This is an automated email from the ASF dual-hosted git repository.

wohali pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/couchdb-documentation.git
commit f3e1ce497a6e7a92957b7141c6be5291966201e0
Author: Joan Touzet <jo...@atypical.net>
AuthorDate: Mon Dec 17 17:50:08 2018 -0500

    Migrate stats aggregation howto from MoinMoin
---
 src/best-practices/documents.rst | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/src/best-practices/documents.rst b/src/best-practices/documents.rst
index 7375cff..991731c 100644
--- a/src/best-practices/documents.rst
+++ b/src/best-practices/documents.rst
@@ -48,3 +48,24 @@ to ensure unique identifiers for each row in a database table. CouchDB
 generates unique ids on its own and you can specify your own as well, so
 you don't really need a sequence here. If you use a sequence for something
 else, you will be better off finding another way to express it in CouchDB.
+
+Pre-aggregating your data
+-------------------------
+
+If your intent for CouchDB is a collect-and-report model, not a real-time
+view, you may not need to store a single document for every event you're
+recording. In this case, pre-aggregating your data may be a good idea. You
+probably don't need 1000 documents per second if all you are trying to do is
+track summary statistics about those documents. This reduces the computational
+pressure on CouchDB's MapReduce engine(s), as well as its storage requirements.
+
+Here, using an in-memory store to summarize your statistical information,
+then writing it out to CouchDB every 10 seconds / 1 minute / whatever level
+of granularity you need, greatly reduces the number of documents you'll put
+into your database.
+
+Later, you can further `decimate
+<https://en.wikipedia.org/wiki/Downsampling_(signal_processing)>`_ your data
+by walking the entire database and generating documents to be stored in a new
+database with a lower level of granularity (say, 1 document a day). You can
+then delete the older, more fine-grained database when you're done with it.
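The in-memory-summarize-then-flush pattern the new section describes can be sketched as follows. This is a minimal illustration, not part of the committed documentation: the `StatsAggregator` class, its field names, and the summary document shape are all assumptions. In a real deployment the `save` callback would POST the document to CouchDB over its HTTP API (e.g. `requests.post(f"{server}/{db}", json=doc)`).

```python
import time
from collections import defaultdict


class StatsAggregator:
    """Accumulate per-event counters in memory and emit one summary
    document per flush interval, instead of one document per event.

    Hypothetical sketch: class name, fields, and document shape are
    illustrative assumptions, not CouchDB APIs.
    """

    def __init__(self, flush_interval=10):
        self.flush_interval = flush_interval  # seconds between writes
        self.counts = defaultdict(int)        # event name -> count
        self.last_flush = time.time()

    def record(self, event_name):
        """Count an event in memory; nothing is written to CouchDB yet."""
        self.counts[event_name] += 1

    def maybe_flush(self, save):
        """If the interval has elapsed, build one summary document,
        reset the counters, and hand the document to `save`."""
        now = time.time()
        if now - self.last_flush < self.flush_interval:
            return None
        doc = {
            "type": "stats_summary",
            "window_start": self.last_flush,
            "window_end": now,
            "counts": dict(self.counts),  # snapshot before reset
        }
        self.counts.clear()
        self.last_flush = now
        save(doc)  # in production: POST doc to a CouchDB database
        return doc
```

With a 10-second interval, 1000 events per second become one document per 10,000 events; the per-event cost is a dictionary increment rather than an HTTP round trip and a b-tree update.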
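The decimation step described at the end of the new section — walking the fine-grained database and writing coarser documents into a new one — could look like the sketch below. It assumes input documents shaped like the hypothetical summaries above (`window_start` as a Unix timestamp, `counts` as a name-to-count map); in practice you would stream them from the source database's `_all_docs` endpoint and bulk-write the results to the new database.

```python
from collections import defaultdict
from datetime import datetime, timezone


def decimate_to_daily(docs):
    """Roll fine-grained summary documents up into one document per day.

    Illustrative assumption: each input doc looks like
    {"window_start": <unix ts>, "counts": {name: n, ...}}.
    """
    daily = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        # Bucket each summary by its UTC calendar day.
        day = datetime.fromtimestamp(
            doc["window_start"], tz=timezone.utc
        ).strftime("%Y-%m-%d")
        for name, n in doc["counts"].items():
            daily[day][name] += n
    # One document per day, ready to be written to the coarser database.
    return [
        {"_id": f"daily-{day}", "type": "daily_summary", "counts": dict(c)}
        for day, c in sorted(daily.items())
    ]
```

Once the daily documents are safely written to the new database, the older, more fine-grained database can be deleted, as the section suggests.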