+1 to Adam's definition, which I think is closest to the "former" definition in 
Eric's first post.

-Joan

----- Original Message -----
From: "Adam Kocoloski" <kocol...@apache.org>
To: "dev@couchdb.apache.org Developers" <dev@couchdb.apache.org>
Sent: Monday, October 22, 2018 5:13:05 PM
Subject: Re: Exact definition of a database "active size"

I think sizes.active should be a close approximation of the size of the 
database after compaction; i.e. it should be possible to use (sizes.file - 
sizes.active) as a way to estimate the number of bytes that can be reclaimed by 
compacting that database shard.
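
As a rough illustration of that estimate, here is a minimal Python sketch. It 
assumes a dict shaped like the "sizes" object in the database info (file, 
active, external); the helper name and the sample numbers are hypothetical, and 
the result is only as accurate as sizes.active itself.

```python
def reclaimable_bytes(db_info):
    """Estimate the bytes compaction could reclaim as sizes.file - sizes.active."""
    sizes = db_info["sizes"]
    # Clamp at zero in case active is ever reported above file (an assumption,
    # added only to keep the estimate non-negative).
    return max(sizes["file"] - sizes["active"], 0)


# Hypothetical sample values, not taken from a real database.
info = {"sizes": {"file": 1_000_000, "active": 250_000, "external": 400_000}}
print(reclaimable_bytes(info))
```

If sizes.active also counts old revisions that compaction would drop, this 
figure underestimates what compaction actually frees.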

Adam

> On Oct 22, 2018, at 4:32 PM, Eiri <e...@eiri.ca> wrote:
> 
> Dear all,
> 
> I’d like to hear your opinions on how we should interpret the database 
> attribute “active size”.
> 
> As you surely know, we use three different size attributes in the database 
> info: file, the size of the database file on disk; external, the 
> uncompressed size of the database contents; and active, defined as “the size 
> of live data inside the database” or “active byte in the current MVCC snapshot”.
> 
> Some time ago I had a discussion with Paul Davis, and he pointed out an 
> ambiguity in that definition: is it the live data before a compaction or 
> after a compaction? In other words, should we treat as “active” only the 
> documents and attachments on the btree’s leaves, or also include the 
> previous document revisions while they can still be accessed? Code-wise it is 
> the latter, both in the current version of CouchDB and in 1.x, where active 
> size was named data_size, but intuitively it feels like it should be the former.
> 
> Although this sounds academic, it is a practical question: the difference in 
> active size before and after compaction can be quite noticeable, and since it 
> is used as a trigger by the compaction daemon, it could skew disk usage patterns.
> 
> Please share your thoughts. If we conclude that we want to change how active 
> size is calculated, I’m willing to take on the implementation, as I have a 
> recent PR in the same area of code.
> 
> 
> Regards,
> Eric
