Hi,

I agree with all of this.

On "sizes", we should clean up the various places where the different sizes 
are reported. I suggest we keep just the "sizes" object, with two items: 
'external', which will be jiffy's estimate of the body as JSON plus the length 
of all attachments (only those held within fdb), and 'file', which will be the 
sum of the lengths of the keys and values in fdb for the Directory (excluding 
the sum key/value itself). That's the long way of saying I agree with what you 
already said.
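
The two proposed fields can be sketched as below. This is a minimal Python model, not CouchDB code. jiffy is the Erlang JSON library, so compact json.dumps output only approximates its estimate. The attachment records with 'length' and 'in_fdb' keys are hypothetical stand-ins for however attachment metadata is actually stored.

```python
import json

def external_size(body: dict, attachments: list) -> int:
    """'external': encoded JSON body size plus the lengths of attachments
    held within fdb. Compact json.dumps only approximates jiffy's output."""
    encoded = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    body_bytes = len(encoded.encode("utf-8"))
    # 'in_fdb' is a hypothetical flag; attachments stored elsewhere are skipped.
    att_bytes = sum(a["length"] for a in attachments if a["in_fdb"])
    return body_bytes + att_bytes

def file_size(kvs: dict, sum_key: bytes) -> int:
    """'file': total key + value bytes stored for the Directory, excluding
    the key/value pair that holds the sum itself."""
    return sum(len(k) + len(v) for k, v in kvs.items() if k != sum_key)

# A 7-byte body ('{"a":1}') plus one 10-byte attachment held in fdb.
print(external_size({"a": 1}, [{"length": 10, "in_fdb": True},
                               {"length": 99, "in_fdb": False}]))  # 17
```

The "excluding the sum key/value" clause implies the total lives in a key of its own, so in practice 'file' would be maintained incrementally rather than recomputed by a full scan like file_size.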

On "offset", I agree we should remove it. It's of questionable value today, so 
let's call it out as an API change in the appropriate RFC section. The fdb 
release (ostensibly "4.0") is an opportunity to clean up some API cruft. Given 
we know about this one early, we should also remove it in 3.0.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 8 Apr 2019, at 23:33, Adam Kocoloski wrote:
> Hi all, a recent comment from Paul on the revision model RFC reminded 
> me that we should have a discussion on how we maintain aggregate 
> statistics about databases stored in FoundationDB. I’ll ignore the 
> statistics associated with secondary indexes for the moment, assuming 
> that the design we put in place for document data can serve as the 
> basis for an extension there.
> 
> The first class of statistics are the ones we report in GET /<dbname>, 
> which are documented here:
> 
> http://docs.couchdb.org/en/stable/api/database/common.html#get--db
> 
> These fall into a few different classes:
> 
> doc_count, doc_del_count: these should be maintained using 
> FoundationDB’s atomic operations. The revision model RFC enumerated all 
> the possible update paths and showed that we always have enough 
> information to know whether to increment or decrement each of these 
> counters; i.e., we always know when we’re removing the last 
> deleted=false branch, adding a new branch to a previously-deleted 
> document, etc.
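
To illustrate why the direction of each counter update is always known, here is a minimal Python sketch. It assumes, following CouchDB's winning-revision rule, that a document counts toward doc_count when at least one leaf branch is deleted=false and toward doc_del_count when every leaf is deleted. The function names are hypothetical; in FDB the two returned deltas would be applied with atomic ADD mutations in the same transaction as the document write.

```python
from typing import Optional, Sequence, Tuple

def doc_state(leaf_deleted: Sequence[bool]) -> Optional[str]:
    """Classify a document from the deleted flags of its leaf branches."""
    if not leaf_deleted:
        return None  # document does not exist (yet)
    return "deleted" if all(leaf_deleted) else "live"

def count_deltas(before: Sequence[bool], after: Sequence[bool]) -> Tuple[int, int]:
    """Return (doc_count delta, doc_del_count delta) for an update that
    changes a document's leaf branches from `before` to `after`."""
    delta = {"live": 0, "deleted": 0}
    old, new = doc_state(before), doc_state(after)
    if old != new:
        if old is not None:
            delta[old] -= 1
        if new is not None:
            delta[new] += 1
    return delta["live"], delta["deleted"]

print(count_deltas([], [False]))            # (1, 0): new document
print(count_deltas([False], [True]))        # (-1, 1): last live branch deleted
print(count_deltas([True], [True, False]))  # (1, -1): new branch on deleted doc
```
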
> 
> update_seq: this must _not_ be maintained as its own key; attempting to 
> do so would cause every write to the database to conflict with every 
> other write and kill throughput. Rather, we can do a reverse range read 
> with limit=1 on the end of the ?CHANGES space to retrieve the current 
> sequence of the database.
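
A reverse range read with limit=1 returns the greatest key in a range. The sketch below models that over an in-memory sorted list of byte-string keys so it runs without a cluster; with the FoundationDB Python bindings the equivalent read would be tr.get_range(begin, end, limit=1, reverse=True). The key layout, a plain byte prefix standing in for the ?CHANGES subspace, is an assumption.

```python
import bisect
from typing import List, Optional

def prefix_end(prefix: bytes) -> bytes:
    """Smallest key that sorts after every key starting with `prefix`."""
    p = prefix.rstrip(b"\xff")
    if not p:
        raise ValueError("prefix consisting only of 0xff bytes has no end key")
    return p[:-1] + bytes([p[-1] + 1])

def last_key(sorted_keys: List[bytes], prefix: bytes) -> Optional[bytes]:
    """Model of get_range(prefix, prefix_end(prefix), limit=1, reverse=True)."""
    hi = bisect.bisect_left(sorted_keys, prefix_end(prefix))
    if hi == 0 or not sorted_keys[hi - 1].startswith(prefix):
        return None  # no keys in the range: empty changes feed
    return sorted_keys[hi - 1]

keys = sorted([b"db1/changes/\x00\x01", b"db1/changes/\x00\x05",
               b"db1/docs/x", b"db0/changes/\x09"])
print(last_key(keys, b"db1/changes/"))  # b'db1/changes/\x00\x05'
```

Writers only append new keys to the changes space; none of them has to rewrite a shared sequence key, which is what would make a dedicated update_seq key a conflict hotspot.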
> 
> sizes.*: things get a little weird here. Historically we relied on the 
> relationship between sizes.active and sizes.file to know when to 
> trigger a database compaction, but we don’t yet have a need for 
> compaction in the FDB-based data model and it’s not clear how we should 
> define these two quantities. The sizes.external field has also been a 
> little fuzzy. Ignoring the various definitions of “size” for the 
> moment, let’s agree that we’ll want to be tracking some set of byte 
> counts for each database. I think the way we should do this is by 
> extending the information stored in each edit branch in ?REVISIONS to 
> include the size(s) of the current revision. When we update a document 
> we need to compare the size(s) of the new revision with the size(s) of 
> the parent, and update the database-level atomic counter(s) 
> appropriately. This requires an enhancement to RFC 001.
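
The per-branch bookkeeping described above might look like the following minimal Python sketch. It assumes a hypothetical layout in which each edit branch in ?REVISIONS records the size of its leaf revision, and it models FoundationDB's atomic ADD in memory. FDB adds operands as little-endian integers, so a signed delta packed with struct.pack('<q', ...) can shrink a counter through two's-complement wraparound. The model assumes 8-byte values throughout; real ADD also handles operands of differing lengths.

```python
import struct
from typing import Dict, Optional, Tuple

Branches = Dict[Tuple[str, str], int]  # (doc_id, leaf_rev) -> size in bytes

def atomic_add(store: Dict[bytes, bytes], key: bytes, delta: int) -> None:
    """In-memory model of FDB's ADD on 8-byte little-endian values."""
    param = struct.pack("<q", delta)  # two's complement for negative deltas
    old = int.from_bytes(store.get(key, bytes(8)), "little")
    new = (old + int.from_bytes(param, "little")) % (1 << 64)
    store[key] = new.to_bytes(8, "little")

def read_counter(store: Dict[bytes, bytes], key: bytes) -> int:
    return struct.unpack("<q", store.get(key, bytes(8)))[0]

def update_branch(branches: Branches, store: Dict[bytes, bytes], counter: bytes,
                  doc_id: str, parent_rev: Optional[str], new_rev: str,
                  new_size: int) -> None:
    """Replace the parent leaf with the new revision and adjust the
    database-level counter by the size difference."""
    parent_size = branches.pop((doc_id, parent_rev), 0) if parent_rev else 0
    branches[(doc_id, new_rev)] = new_size
    atomic_add(store, counter, new_size - parent_size)

branches: Branches = {}
store: Dict[bytes, bytes] = {}
update_branch(branches, store, b"db/size", "a", None, "1-x", 100)
update_branch(branches, store, b"db/size", "a", "1-x", "2-y", 60)
update_branch(branches, store, b"db/size", "b", None, "1-z", 40)
print(read_counter(store, b"db/size"))  # 100
```
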
> 
> I’d like to further propose that we track byte counts not just at a 
> database level but also across the entire Directory associated with a 
> single CouchDB deployment, so that FoundationDB administrators managing 
> multiple applications on a single cluster can have a better view of 
> per-Directory resource utilization without walking every single 
> database stored inside.
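
One way to picture the Directory-level total: every write that adjusts a database's byte count applies the same delta to a single Directory-wide counter in the same transaction. The class below is a hypothetical in-memory model using plain integers rather than FDB keys. Because FoundationDB's atomic ADD mutations don't add read conflict ranges, concurrent writers updating the shared counter this way would not conflict with each other, unlike the update_seq key ruled out above.

```python
from collections import defaultdict

class SizeCounters:
    """Hypothetical model of per-database and per-Directory byte counters.
    In FDB each `+=` below would be an atomic ADD issued in the same
    transaction as the document write."""

    def __init__(self) -> None:
        self.per_db = defaultdict(int)
        self.directory = 0

    def record_write(self, db_name: str, delta: int) -> None:
        self.per_db[db_name] += delta
        self.directory += delta  # one extra mutation per write

counters = SizeCounters()
counters.record_write("users", 100)
counters.record_write("orders", 50)
counters.record_write("users", -30)
print(counters.directory)  # 120, without visiting each database
```
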
> 
> Looking past the DB info endpoint, one other statistic worth discussing 
> is the “offset” field included with every response to an _all_docs 
> request. This is not something that we get for free in FoundationDB, 
> and I have to confess it seems to be of limited utility. We could 
> support this by implementing a tree structure of additional 
> aggregation keys on top of the keys stored in the _all_docs space, but 
> I’m skeptical that it’s worth baking this extra cost into every 
> database update and _all_docs operation. I’d like to hear others’ 
> thoughts on this one.
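
To make that cost concrete, here is a hypothetical two-level version of the aggregation idea in Python. _all_docs keys are grouped into buckets by their first byte, and a count per bucket is maintained on every insert and delete. The offset of a start key is then the sum of the counts of earlier buckets plus the number of keys in the start key's own bucket that sort before it. The in-memory bisect calls stand in for FDB range reads; a real design would need finer-grained buckets or more tree levels, and every document write would pay for the extra count update.

```python
import bisect
from collections import Counter
from typing import List

class BucketedIndex:
    """Hypothetical aggregation scheme for _all_docs offsets."""

    def __init__(self) -> None:
        self.keys: List[bytes] = []       # sorted stand-in for the _all_docs space
        self.counts: Counter = Counter()  # bucket -> key count (aggregation keys)

    def insert(self, key: bytes) -> None:
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return  # already present; counts unchanged
        self.keys.insert(i, key)
        self.counts[key[:1]] += 1  # the extra write every update would pay

    def delete(self, key: bytes) -> None:
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]
            self.counts[key[:1]] -= 1

    def offset(self, start: bytes) -> int:
        """Number of keys that sort strictly before `start`."""
        bucket = start[:1]
        # Whole buckets before start's bucket: read only their counts.
        before = sum(c for b, c in self.counts.items() if b < bucket)
        # Start's own bucket: count keys in [bucket, start), a range read in FDB.
        lo = bisect.bisect_left(self.keys, bucket)
        hi = bisect.bisect_left(self.keys, start)
        return before + (hi - lo)

idx = BucketedIndex()
for k in [b"apple", b"avocado", b"banana", b"blueberry", b"cherry"]:
    idx.insert(k)
print(idx.offset(b"blueberry"))  # 3
```
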
> 
> I haven’t yet looked closely at _stats and _system to see if any of 
> those metrics require specific support from FDB.
> 
> Adam
