eiri opened a new pull request #1983: More precise calculation of external docs' size
URL: https://github.com/apache/couchdb/pull/1983

## Overview

We define a database's external size as:

> Number of bytes that would be required to represent the contents outside of the database (for capacity and backup planning)

Since we use JSON as the data serialization format for the REST API, I assume that `external` should represent the size of the uncompressed, JSON-serialized data. However, we currently use `?term_size` to calculate document sizes, which makes `external` noticeably larger than this definition implies. Here is a demo:

```bash
# movies.json contains simplified IMDb movie info, i.e. a reasonably complex JSON structure
$ cat movies.json | jq '.docs[2000]'
{
  "title": "The Love Light",
  "year": 1921,
  "cast": [
    "Mary Pickford",
    "Raymond Bloomer"
  ],
  "genres": [
    "Drama"
  ]
}
# file size in bytes (BSD/macOS stat; on GNU/Linux use `stat -c%s`)
$ stat -f%z movies.json
3386172
$ curl -X PUT http://127.0.0.1:15984/test -G -d n=1 -d q=1
{"ok":true}
$ curl -X POST http://127.0.0.1:15984/test/_bulk_docs -H 'Content-Type: application/json' --data @movies.json > /dev/null
$ curl http://127.0.0.1:15984/test | jq .sizes.external
5856324
```

So here `external` reports about 73% more than the size of the "pure" JSON data (5856324 vs. 3386172 bytes; the JSON file is only ~58% of the reported size).

I've switched the external size calculation from `?term_size` to `couch_ejson_size:encoded_size/1`, which returns much closer values. For the same test:

```bash
$ curl http://127.0.0.1:15984/test | jq .sizes.external
3357139
```

The other change is that compaction no longer recalculates the external size; instead, it carries the value over from the leaf's info, so `external` doesn't change after compaction:

```bash
$ curl -X POST http://127.0.0.1:15984/test/_compact
{"ok":true}
$ curl http://127.0.0.1:15984/test | jq .sizes.external
3357139
```

## Testing recommendations

All existing tests in the `couch` app should pass. The change can also be tested manually by following the steps in the example above.
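When running the manual test, it helps to express `sizes.external` as a percentage over the raw JSON file size. Below is a minimal sketch that assumes `awk` is available; `overhead_pct` is a hypothetical helper name, not part of CouchDB.

```bash
# Hypothetical helper: percentage by which a reported size exceeds a reference size,
# rounded to one decimal place.
# Usage: overhead_pct REFERENCE_BYTES REPORTED_BYTES
overhead_pct() {
  awk -v ref="$1" -v reported="$2" \
    'BEGIN { printf "%.1f\n", (reported - ref) * 100 / ref }'
}
```

With the figures from the demo, `overhead_pct 3386172 5856324` prints `72.9` for the `?term_size`-based value and `overhead_pct 3386172 3357139` prints `-0.9` for the `encoded_size`-based one. Against a live database you could pass `"$(stat -f%z movies.json)"` and `"$(curl -s http://127.0.0.1:15984/test | jq .sizes.external)"` as the two arguments.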
## Checklist

- [x] Code is written and works correctly;
- [x] Changes are covered by tests;
- [ ] Documentation reflects the changes;
