Zack Weinberg wrote:
> It occurred to me that we store a lot of SHA1 hashes in our databases
> and they're all twice as big as they need to be because they're in
> hex.

I added a project like this to the summit projects and really second the move (I coded the base64->binary move in the past).

> I'm not sure whether this means we actually want to *do* this for
> real. It will make manual database queries have more binary garbage
> in them; there are a lot of places in the code that will have to
> change; we'll have to jump through hoops in a few places to get the
> hashes to stay the same; we probably don't want to do this to the
> netsync protocol, so there will be more conversions to do. Still,
> nearly 10% disk space savings is not to sneeze at, and I bet there
> would be speed gains too, just from not having to read so much off the
> disk.

Having to write x'abcdef' instead of 'abcdef' is not that much overhead IMHO. Having to write quote(id) hurts a bit; perhaps mtn exec sql should default to outputting BLOBs quoted.

> There is another factor to consider. There are 217,055 hashes in the
> "mtn.ids" file; however, there are only 91,223 *unique* hashes. (This
> is because many of the hashes are used as pointers between tables.)
> The ratio is similar for OE.ids. Thus, it might be worthwhile to yank
> all the hashes out into a separate table and reference them by row
> number from the rest of the database. Depending on how sqlite decides
> to do things, this might be a *lot* better, as we could use INTEGER
> PRIMARY KEYs in a whole bunch of tables where we currently have string
> keys. Technically this is orthogonal to the idea of storing the
> hashes as raw data, but it might be enough of a gain by itself that we
> don't want to bother with the de-hex-ification too (and, while the
> code changes for it would be substantial, I think they'd also be in
> fewer places).

A good thing to talk about at the summit. E.g. revision_certs could easily refer to revision[_delta]s. Storing delta and plain objects in one table (plain indicated by a NULL base) might be a good idea to disambiguate the key and simplify queries.

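To make the two ideas a bit more concrete, here is a minimal sketch using Python's sqlite3 module against an in-memory database. The table names (hex_ids, raw_ids, ids, demo_certs) and the intern_hash helper are made up for illustration and are not monotone's actual schema or code. It only shows that a raw SHA1 takes 20 bytes instead of 40, what a hand-written query with x'...' and quote() looks like, and how a separate hash table with an INTEGER PRIMARY KEY would store each unique hash once.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hex_ids (id TEXT PRIMARY KEY);  -- 40 hex characters per hash
    CREATE TABLE raw_ids (id BLOB PRIMARY KEY);  -- 20 raw bytes per hash
    -- Hypothetical interning table: each unique hash is stored once and
    -- other tables point at it by its integer key.
    CREATE TABLE ids (n INTEGER PRIMARY KEY, hash BLOB NOT NULL UNIQUE);
    CREATE TABLE demo_certs (rev INTEGER NOT NULL REFERENCES ids(n), name TEXT);
""")

digest = hashlib.sha1(b"example revision text").digest()  # 20 raw bytes
hex_digest = digest.hex()                                  # 40 hex characters

conn.execute("INSERT INTO hex_ids VALUES (?)", (hex_digest,))
conn.execute("INSERT INTO raw_ids VALUES (?)", (digest,))

# length() counts characters for TEXT and bytes for BLOB.
hex_len = conn.execute("SELECT length(id) FROM hex_ids").fetchone()[0]
raw_len = conn.execute("SELECT length(id) FROM raw_ids").fetchone()[0]

# A hand-typed query needs the x'...' BLOB literal instead of a plain string.
# The hash is pasted into the SQL text here only to mimic manual typing.
literal_sql = f"SELECT count(*) FROM raw_ids WHERE id = x'{hex_digest}'"
literal_hits = conn.execute(literal_sql).fetchone()[0]

# quote() turns the BLOB back into a readable X'...' literal for output.
quoted = conn.execute("SELECT quote(id) FROM raw_ids").fetchone()[0]


def intern_hash(db, raw):
    """Return the integer key for a raw hash, inserting it if it is new."""
    db.execute("INSERT OR IGNORE INTO ids (hash) VALUES (?)", (raw,))
    return db.execute("SELECT n FROM ids WHERE hash = ?", (raw,)).fetchone()[0]


first = intern_hash(conn, digest)
second = intern_hash(conn, digest)  # same hash again, so the same key
conn.execute("INSERT INTO demo_certs VALUES (?, ?)", (first, "branch"))
unique_hashes = conn.execute("SELECT count(*) FROM ids").fetchone()[0]
```

Whether the integer keys really pay off depends on how sqlite lays out the indexes, as Zack says, so this is only meant as something to poke at before the summit.
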
Christof

_______________________________________________
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel