eiri opened a new pull request #1983: More precise calculation of external 
docs' size
URL: https://github.com/apache/couchdb/pull/1983
 
 
   ## Overview
   
   We define a database's external size as:
   > Number of bytes that would be required to represent the contents outside 
of the database (for capacity and backup planning)
   
   Since the REST API serializes data as JSON, I assume that `external` should represent the size of the uncompressed, JSON-serialized data.
   
   However, we currently use `?term_size` to calculate document sizes, which adds a noticeable overhead relative to this definition of `external`.
   
   Here is a demo:
   ```bash
   # movies.json contains simplified IMDb movie info, i.e. a reasonably
   # complex JSON structure
   $ jq '.docs[2000]' movies.json
   {
     "title": "The Love Light",
     "year": 1921,
     "cast": [
       "Mary Pickford",
       "Raymond Bloomer"
     ],
     "genres": [
       "Drama"
     ]
   }
   
   $ stat -f%z movies.json 
   3386172 
   
   $ curl -X PUT http://127.0.0.1:15984/test -G -d n=1 -d q=1
   {"ok":true}
   
   $ curl -X POST -H 'Content-Type: application/json' http://127.0.0.1:15984/test/_bulk_docs --data @movies.json > /dev/null
   
   $ curl http://127.0.0.1:15984/test | jq .sizes.external
   5856324
   ```
   
   So `external` here is ~73% larger than the "pure" JSON data (5856324 vs. 3386172 bytes); put differently, the JSON file is only ~58% of the reported size.
   
   I've switched from `?term_size` to `couch_ejson_size:encoded_size/1` for calculating the external size, which returns a much closer value (within about 1% of the file size). For the same test:
   ```bash
   $ curl http://127.0.0.1:15984/test | jq .sizes.external
   3357139
   ```
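   
   The two readings can be compared against the size of `movies.json` with a bit of arithmetic. This is only a sketch: `overhead_pct` is a hypothetical helper name, not part of CouchDB, and the file size is just a rough reference point, since it includes whitespace and the `docs` wrapper.
   
   ```bash
   # Relative difference, in percent, between a reported `external` size
   # and a reference size such as the raw JSON file.
   # overhead_pct is a hypothetical helper for this illustration.
   overhead_pct() {
     awk -v reported="$1" -v reference="$2" \
       'BEGIN { printf "%.1f\n", (reported - reference) / reference * 100 }'
   }
   
   overhead_pct 5856324 3386172   # ?term_size based value; prints 72.9
   overhead_pct 3357139 3386172   # encoded_size based value; prints -0.9
   ```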
   
   The other change: instead of recalculating the external size during compaction, we now carry it over from the leaf's info, so the `external` size doesn't change after compaction:
   ```bash
   $ curl -X POST -H 'Content-Type: application/json' http://127.0.0.1:15984/test/_compact
   {"ok":true}
   $ curl http://127.0.0.1:15984/test | jq .sizes.external
   3357139
   ```
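   
   Since `_compact` returns before compaction has finished, a manual check may want to wait until the database info no longer reports `compact_running` before comparing sizes. A minimal sketch, assuming `jq` is installed; `DB_URL`, `wait_for_compaction` and `external_size` are hypothetical names used only for this illustration:
   
   ```bash
   # DB_URL is an assumed variable pointing at the test database.
   DB_URL=${DB_URL:-http://127.0.0.1:15984/test}
   
   # Poll the database info until `compact_running` is no longer true.
   wait_for_compaction() {
     while curl -s "$DB_URL" | jq -e '.compact_running' > /dev/null; do
       sleep 1
     done
   }
   
   # Print the current `sizes.external` value.
   external_size() {
     curl -s "$DB_URL" | jq '.sizes.external'
   }
   
   # Usage (needs a running node):
   #   before=$(external_size)
   #   curl -s -X POST -H 'Content-Type: application/json' "$DB_URL/_compact"
   #   wait_for_compaction
   #   [ "$before" = "$(external_size)" ] && echo "external size unchanged"
   ```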
   
   ## Testing recommendations
   
   All existing tests of the `couch` app should pass. Manual testing can be done by following the steps in the example above.
   
   ## Checklist
   
   - [x] Code is written and works correctly;
   - [x] Changes are covered by tests;
   - [ ] Documentation reflects the changes;
   
