Chris Anderson wrote:
>> You might try using CouchDB's builtin sequential "uuids". These should
>> give you some more storage efficiency.

I can't, as various items reference various other items. The source data
is in a highly normalized SQLite database. I then generate CouchDB
documents from it that are effectively denormalized, using a SQLite
temporary table to help map the SQL primary keys of the various tables
to CouchDB document ids.

>> Thanks for reporting. The last time we tried, we were able to make
>> some major progress in storage size efficiency. I'm not sure how much
>> low-hanging fruit we have left, but if you try sequential uuids that
>> would be a good start.

Using _ids that were a maximum of 4 bytes long resulted in *massively*
less CouchDB storage consumption. Here are the numbers; the first column
is gigabytes. There are 9.8 million CouchDB documents. None of the
CouchDB databases have any views added (yet).

   GB   Description
  1.3   SQLite database (53 tables, raw data)
  2.3   SQLite database after adding indices
  2.5   Couch objects, JSON, one per line text file (no _rev, 16 byte _id)
 21.4   Those same couch objects after compaction (saved ~2GB doing
        compaction)
  2.0   Couch objects, JSON, one per line text file (no _rev, 4 byte _id)
  3.8   Those same couch objects after compaction (didn't note
        pre-compaction)

Compacting the 21GB database took about 24 hours. Compacting the 3.8GB
database took no more than about 30 minutes, and probably far less. The
machine has 6GB of RAM. (It also doesn't help that compaction does
frequent fsyncs. They really are not needed.)

What this shows is that CouchDB storage efficiency is *highly*
correlated with _id size. Rather than using 16 hex digits you could use
base 62 (10 digits, 26 lower, 26 upper). About 11 of those "digits"
cover the same number of values as 16 hex digits, and 4 of them (62^4,
roughly 14.8 million values) are already enough for the 9.8 million
documents here.

Roger
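
A minimal sketch of the two ideas above: a SQLite temporary table that maps (table, primary key) pairs to a running sequence number, and a base 62 encoding of that number as a short _id. This is an illustration under stated assumptions, not the script used to produce the numbers in this message. The names docid_map, make_id_map, doc_id, encode_base62 and decode_base62, the example table names, and the ordering of the alphabet are all hypothetical.

```python
import sqlite3
import string

# 10 digits + 26 lowercase + 26 uppercase = 62 symbols.
# The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n):
    """Encode a non-negative integer as a base 62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(s):
    """Inverse of encode_base62."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def make_id_map(conn):
    """Create the (hypothetical) temporary mapping table."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS docid_map ("
        " seq INTEGER PRIMARY KEY,"   # assigned by SQLite, starts at 1
        " tbl TEXT NOT NULL,"         # source table name
        " pk INTEGER NOT NULL,"       # primary key within that table
        " UNIQUE (tbl, pk))"
    )


def doc_id(conn, tbl, pk):
    """Return a short, stable _id for row `pk` of source table `tbl`."""
    # A repeated (tbl, pk) pair hits the UNIQUE constraint and is ignored,
    # so the same row always maps to the same sequence number.
    conn.execute(
        "INSERT OR IGNORE INTO docid_map (tbl, pk) VALUES (?, ?)", (tbl, pk)
    )
    (seq,) = conn.execute(
        "SELECT seq FROM docid_map WHERE tbl = ? AND pk = ?", (tbl, pk)
    ).fetchone()
    return encode_base62(seq)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_id_map(conn)
    # "artist" and "album" are made-up table names for the example.
    print(doc_id(conn, "artist", 7))
    print(doc_id(conn, "album", 7))
    print(doc_id(conn, "artist", 7))  # same _id as the first call
```

Because the sequence numbers are handed out densely, the ids stay as short as the document count allows: 62**4 is 14,776,336, so any counter value up to that fits in 4 characters. Documents that reference other documents can call doc_id() with the referenced table and key to get the matching _id.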