-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Roger Binns wrote:
> I'll do the requisite experiments this weekend trying to see what has the
> most effect on file size 

And the answer is _id length.  It is also quicker to add documents with
sequential (sorted) _ids.  The length of the _id field has an effect on the
final file size, and the growth appears to be more than the simple multiple
of the _id size suggested in earlier messages.  Somewhat amusingly,
compaction increased file sizes, and not by a trivial amount either.

To measure this, I wrote a simple Python script that created 65536 documents
with a 4-byte hex _id, and then tried again, padding the _id with zeros to
get 8 and 16 bytes, plus various other permutations.  It is an
embarrassingly small script (and likely just as small in other languages).
[Sorry for not publishing the script - BitBucket and I are having some
mutual hatred issues at the moment.]
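
For anyone wanting to repeat the experiment, a minimal sketch along these
lines might look like the following.  It is not the original script (which
was not published), only a guess at its shape.  Assumptions: CouchDB is
listening on 127.0.0.1:5984 without authentication, zero padding is applied
on the left, documents carry nothing but an _id, and the helper names
(make_ids, request, load) and the batch size are made up for illustration.

```python
import json
import random
import time
import urllib.request

COUCH = "http://127.0.0.1:5984"  # assumed local server, no authentication


def make_ids(count=65536, width=4, shuffle=False, seed=0):
    """Return `count` hex _ids, left-padded with zeros to `width` characters."""
    ids = [format(i, "x").zfill(width) for i in range(count)]
    if shuffle:
        # Fixed seed so the "unsorted" run is repeatable.
        random.Random(seed).shuffle(ids)
    return ids


def request(method, path, body=None):
    """Send one JSON request to CouchDB and return the decoded JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        COUCH + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def load(db, ids, batch=1000):
    """Create `db`, bulk-insert one empty document per _id, and return
    (seconds spent inserting, database info as reported by GET /db)."""
    request("PUT", "/" + db)
    started = time.perf_counter()
    for start in range(0, len(ids), batch):
        docs = [{"_id": doc_id} for doc_id in ids[start:start + batch]]
        request("POST", "/" + db + "/_bulk_docs", {"docs": docs})
    elapsed = time.perf_counter() - started
    return elapsed, request("GET", "/" + db)
```

Calling load("idtest4", make_ids(width=4)) and then
load("idtest16", make_ids(width=16)) against a scratch server would give two
database info documents to compare; the name of the file size field in that
document differs between CouchDB versions, so check what your server
returns.  Passing shuffle=True gives the unsorted case, and a POST to
/dbname/_compact followed by another GET shows the effect of compaction.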

The relationship between _id size, sparseness, file size and performance is
now better approached by someone with an understanding of the file format.

I've also started this page to help:

  http://wiki.apache.org/couchdb/Performance

Roger

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAktKhfMACgkQmOOfHg372QTlTQCdEawiNcqJVtHOjK61OsQNhtd+
P2gAn1gVXeknm4mfU74RlZid1+kI59dh
=RPB7
-----END PGP SIGNATURE-----
