Roger Binns wrote:
> I'll do the requisite experiments this weekend trying to see what has the
> most effect on file size

And the answer is length.

It is quicker to add documents with sequential (sorted) _ids. The length of the _id field affects the final file size, and the effect appears to be more than the simple multiple of the _id size suggested in earlier messages. Somewhat amusingly, compaction increased file sizes, and not by a trivial amount either.

To measure this, I wrote a simple Python script that created 65536 documents with a 4 byte hex _id, and then tried again padding the _id with zeros to get 8 and 16 bytes, plus various other permutations. It is an embarrassingly small script (and would likely be just as small in other languages).

[Sorry for not publishing the script - BitBucket and I are having some mutual hatred issues at the moment.]

The relationship between _id size, sparseness, file size and performance is now better approached by someone with an understanding of the file format. I've also started this page to help:

http://wiki.apache.org/couchdb/Performance

Roger
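
For anyone wanting to repeat the measurement, the _id generation described above could be sketched roughly as follows. This is not the original script (which was not published); the function names make_ids and bulk_payload are made up for illustration, and building a _bulk_docs style body is an assumption about how the documents might be submitted. It only builds the ids and the JSON body, and does not talk to a server.

```python
import json


def make_ids(count=65536, width=4):
    """Return `count` sequential lowercase hex ids, zero-padded to `width` characters."""
    return [format(i, "x").zfill(width) for i in range(count)]


def bulk_payload(ids):
    """Build a JSON body of the form {"docs": [...]} with one empty document per id.

    Assumption: the documents are sent in a single bulk request; the original
    post does not say how they were inserted.
    """
    return json.dumps({"docs": [{"_id": doc_id} for doc_id in ids]})


if __name__ == "__main__":
    # Same 65536 ids each time, only the zero padding (and so the _id length) changes.
    for width in (4, 8, 16):
        ids = make_ids(width=width)
        payload = bulk_payload(ids)
        print(width, ids[0], ids[-1], len(payload))
```

Because the ids are fixed-width lowercase hex, generating them in numeric order also gives them in sorted order, which matches the sequential case above. Shuffling the list before building the payload would give one of the non-sequential permutations.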