On 8/19/07, Arun Raghavan <[EMAIL PROTECTED]> wrote:
> Hello All,
> This week I've been working on the new TextCache implementation that
> I'd mentioned the last time (replacing the bunch of files with an
> Sqlite db).
>
> Making an Sqlite db with just the uri and raw text caused an almost 3x
> increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
> my test case). This despite the fact that the size of the raw text was
> only 7.9 MB. I need to figure out why this happens. In the mean time,
> I also implemented another version of this which stores (uri, gzipped
> text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
> this actually seems to work very well (the db for the test case
> mentioned shrunk down to 2.6 MB, which is just a little more than the
> actual size of the compressed data itself).

My first impression is that Sqlite is probably building an index for the
raw text data, whereas the compressed data is simply treated as a binary
blob. I'm not 100% sure of the table definitions that you're using, or
exactly how much (in terms of indexes) Sqlite does automatically, but
that seems like the most likely culprit. As we already have our own
system for searching text ;) if you could find a way to force Sqlite not
to index the table's raw text column, you could probably get more sane
numbers for the database size. However, it's possible that this is just
how Sqlite handles text content, and that the gzipped text is the best
way to go.

The other thing to test is how this behaves with far larger data sets.
Is it possible that the first 1000 rows are very expensive, but when we
scale to 50000 rows, we see only a minute increase in size?
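Both questions (is there an index on the table, and how does the size
scale with the row count) can be checked with a small experiment. What
follows is only a rough sketch in Python, not the actual TextCache code:
the table name, column names, sample text and helper functions are all
made up for illustration. As far as I know, Sqlite only creates indexes
on its own for PRIMARY KEY and UNIQUE constraints, and PRAGMA index_list
should show whatever it keeps for a given table.

```python
import gzip
import os
import sqlite3
import tempfile


def build_cache(path, rows, compress):
    """Write (uri, text) rows to a new db; return (file size, index list)."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: no PRIMARY KEY or UNIQUE constraint is declared.
    conn.execute("CREATE TABLE textcache (uri TEXT, data BLOB)")
    for uri, text in rows:
        # Store either the raw string or its gzipped bytes.
        payload = gzip.compress(text.encode("utf-8")) if compress else text
        conn.execute("INSERT INTO textcache VALUES (?, ?)", (uri, payload))
    conn.commit()
    # One row per index that Sqlite maintains for this table.
    indexes = conn.execute("PRAGMA index_list(textcache)").fetchall()
    conn.close()
    return os.path.getsize(path), indexes


def read_text(path, uri, compressed):
    """Fetch the text for one uri, decompressing it if needed."""
    conn = sqlite3.connect(path)
    row = conn.execute(
        "SELECT data FROM textcache WHERE uri = ?", (uri,)
    ).fetchone()
    conn.close()
    return gzip.decompress(row[0]).decode("utf-8") if compressed else row[0]


def compare(n_rows):
    """Build a raw and a gzipped db with n_rows synthetic documents."""
    rows = [
        (f"file:///doc/{i}", f"document {i} " + "some sample text " * 50)
        for i in range(n_rows)
    ]
    with tempfile.TemporaryDirectory() as tmp:
        raw_size, raw_idx = build_cache(os.path.join(tmp, "raw.db"), rows, False)
        gz_size, gz_idx = build_cache(os.path.join(tmp, "gz.db"), rows, True)
    return raw_size, gz_size, raw_idx, gz_idx


if __name__ == "__main__":
    for n in (1000, 50000):
        raw_size, gz_size, raw_idx, gz_idx = compare(n)
        print(n, "rows: raw", raw_size, "bytes, gzipped", gz_size, "bytes")
        print("  indexes on raw table:", raw_idx)
```

Running this at 1000 and 50000 rows shows whether the per-row overhead
flattens out as the table grows. The synthetic text is very repetitive,
so the compression ratio will look far better than it would on real
documents; only the trend and the index list are worth reading from it.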
>
> Performance numbers on a search which returns 1205 results are below.
> I basically ran the measurements twice -- once after flushing the
> inode, dentry and page cache, and another time taking advantage of the
> disk caches.
>
> Current TextCache:
>   no-disk-cache: ~1m
>   with-disk-cache: ~9s
>
> New TextCache (raw and gzipped versions had similar numbers):
>   no-disk-cache: ~42s
>   with-disk-cache: ~10s
>

Very cool/interesting. One of the important cases to test here is
multiple successive queries. Think of deskbar running completion as a
user types: how does such a system fare when it gets 15 or 20 queries
back to back? Does the compression difference factor in then?

> One very important factor remains to be seen -- memory usage. I am
> working on figuring out what the impact of the new code on memory
> usage is. Numbers should be available soon.
>
> On the Xesam front, I will be updating the code tomorrow,day-after to
> reflect the latest changes to the spec.

I know the Google SoC is over, and it's completely ok if you're too busy
to complete these tests, but it would be awesome if you could provide a
patch to the list, so that we can not only see exactly what you were
doing, but also so that someone else might finish up your work and/or
get it merged in and ready for 0.3.0.

> --
> Arun Raghavan

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
_______________________________________________
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers