-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Anderson wrote:
>> You might try using CouchDB's builtin sequential "uuids". These should
>> give you some more storage efficiency.

I can't, as various items reference various other items.  The source data is
in a SQLite database - highly normalized.  I then generate couch documents
from it that are effectively denormalized, using a SQLite temporary table to
help with mapping SQL primary keys of the various tables into CouchDB
document ids.
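
A minimal sketch of that kind of mapping, assuming a made-up two-table schema
(author, book); the table names, the doc_ids temporary table and the helper
functions are all hypothetical and not the actual conversion code:

```python
import sqlite3


def build_id_map(conn, tables):
    """Give every (table, primary key) pair a small sequential number."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS doc_ids ("
        " doc_num INTEGER PRIMARY KEY,"
        " tbl TEXT NOT NULL,"
        " pk INTEGER NOT NULL,"
        " UNIQUE (tbl, pk))"
    )
    for tbl in tables:
        # Table names cannot be bound as parameters; they are assumed trusted.
        conn.execute(
            f'INSERT OR IGNORE INTO doc_ids (tbl, pk) '
            f'SELECT ?, rowid FROM "{tbl}" ORDER BY rowid',
            (tbl,),
        )


def doc_id(conn, tbl, pk):
    """Look up the CouchDB _id assigned to one SQL row (None if unmapped)."""
    row = conn.execute(
        "SELECT doc_num FROM doc_ids WHERE tbl = ? AND pk = ?", (tbl, pk)
    ).fetchone()
    return None if row is None else str(row[0])


def book_docs(conn):
    """Yield denormalized documents whose references use the mapped ids."""
    rows = conn.execute(
        "SELECT id, title, author_id FROM book ORDER BY id"
    ).fetchall()
    for pk, title, author_id in rows:
        yield {
            "_id": doc_id(conn, "book", pk),
            "title": title,
            "author": doc_id(conn, "author", author_id),
        }


# Hypothetical source data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)"
)
conn.execute("INSERT INTO author VALUES (10, 'A. Writer')")
conn.execute("INSERT INTO book VALUES (10, 'First', 10)")
conn.execute("INSERT INTO book VALUES (11, 'Second', 10)")

build_id_map(conn, ["author", "book"])
docs = list(book_docs(conn))
```

Because the same primary key value can occur in several tables, the map is
keyed on the (table, key) pair, and every cross reference is resolved through
it before the document is written out.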

>> Thanks for reporting. The last time we tried, we were able to make
>> some major progress in storage size efficiency. I'm not sure how much
>> low-hanging fruit we have left, but if you try sequential uuids that
>> would be a good start.

Using _ids that were at most 4 bytes long resulted in *massively* less
CouchDB storage consumption.  Here are the numbers; the first column is
gigabytes.  There are 9.8 million CouchDB documents.  None of the CouchDB
databases have any views added (yet).

 1.3 SQLite database (53 tables, raw data)
 2.3 SQLite database after adding indices

 2.5 Couch objects, JSON, one per line text file (no _rev, 16 byte _id)
21.4 Those same couch objects after compaction (saved ~2GB doing compaction)

 2.0 Couch objects, JSON, one per line text file (no _rev, 4 byte _id)
 3.8 Those same couch objects after compaction (didn't note pre-compaction)
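
A small sketch of the arithmetic behind those figures, using only the numbers
in the table above.  The per-document estimate assumes 1GB means 10**9 bytes,
which the figures themselves do not state:

```python
# Sizes in gigabytes, copied from the table above.
sizes_gb = {
    "16 byte _id": {"json": 2.5, "couch": 21.4},
    "4 byte _id": {"json": 2.0, "couch": 3.8},
}
DOCS = 9_800_000  # "9.8 million" documents


def overhead(json_gb, couch_gb):
    """Compacted CouchDB size as a multiple of the raw JSON text size."""
    return couch_gb / json_gb


for label, s in sizes_gb.items():
    ratio = overhead(s["json"], s["couch"])
    # Assumption: 1GB = 10**9 bytes.
    per_doc = s["couch"] * 1e9 / DOCS
    print(f"{label}: {ratio:.2f}x the JSON size, ~{per_doc:.0f} bytes/document")
```

With these inputs the compacted database is roughly 8.6 times the JSON text
for 16 byte _ids and 1.9 times for 4 byte _ids.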

Doing compaction on the 21GB database took about 24 hours.  Doing it on the
3.8GB database took no more than about 30 minutes, and probably far less.
The machine has 6GB of RAM.  (It also doesn't help that compaction does
frequent fsyncs - they really are not needed.)

What this shows is that CouchDB storage efficiency is *highly* correlated
with _id size.  Rather than using 16 hex digits you could use base 62
(10 digits, 26 lower, 26 upper): 4 of those "digits" give 62^4 (about 14.8
million) values, which covers the 9.8 million documents here, although not
the full range of 16 hex digits.
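
A minimal sketch of such an encoding, assuming documents are simply numbered
from 0 upwards; the alphabet order (digits, then lower case, then upper case)
and the function name are choices made for this example:

```python
import string

# 10 digits + 26 lower case + 26 upper case = 62 symbols.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62(n):
    """Encode a non-negative integer as a base 62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


# Largest count that still fits in 4 base 62 "digits".
capacity_4 = 62 ** 4

# Length needed to cover every value of 16 hex digits.
digits_for_16_hex = len(base62(16 ** 16 - 1))
```

Four base 62 characters cover 14,776,336 ids, so 9.8 million documents fit.
Covering the whole range of 16 hex digits would take 11 characters, so the
saving comes from numbering densely rather than from the alphabet alone.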

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAktGUZcACgkQmOOfHg372QTRjwCdHqiFHqBzNCYAR/DFbsNNq4PV
3aIAn2/AOBTAWrejsYa8XIyiBoUfe8/R
=m7Pn
-----END PGP SIGNATURE-----
