A quick followup. Some reading here:
http://www.sqlite.org/datatype3.html
provides some insight into exactly how sqlite3 stores values. I'm not
completely sure that such a loose typing system will greatly benefit
us when working with TEXT/STRING types. However, the gzipped blobs
might benefit from less disk usage thanks to being stored in a single
file. In addition, I know that incremental I/O is a possibility with
blobs in sqlite 3.4, which could potentially be used to optimize
work like this.

Anyway, please send a patch to the list if that's not too much to ask,
or just give us an update on how things are going.

Cheers,
Kevin Kubasik

On 10/1/07, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> On 8/19/07, Arun Raghavan <[EMAIL PROTECTED]> wrote:
> > Hello All,
> > This week I've been working on the new TextCache implementation that
> > I'd mentioned the last time (replacing the bunch of files with an
> > Sqlite db).
> >
> > Making an Sqlite db with just the uri and raw text caused an almost 3x
> > increase in the text cache size (3.6 MB (on-disk) vs. almost 15 MB in
> > my test case). This despite the fact that the size of the raw text was
> > only 7.9 MB. I need to figure out why this happens. In the meantime,
> > I also implemented another version of this which stores (uri, gzipped
> > text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
> > this actually seems to work very well (the db for the test case
> > mentioned shrunk down to 2.6 MB, which is just a little more than the
> > actual size of the compressed data itself).
>
> My first impression on this is that Sqlite is probably building an
> index for the raw text data, whereas the compressed data is simply
> treated as a binary 'blob'. I'm not 100% sure of the table definitions
> that you're using, or exactly how much (in terms of indexes) sqlite does
> automatically, but that seems like the most likely culprit.
> As we already have our own system for searching text ;) if you could
> find a way to force sqlite to not index the table's raw text column, you
> could probably get more sane numbers regarding the database size.
> However, it's possible that this is just how sqlite handles text
> content, and the gzipped text is the best way to go. The other thing to
> test is how this is handled in far larger situations. Is it possible
> that the first 1000 rows are very expensive, but when we scale to 50000
> rows, we see only a minute increase in size?
>
> >
> > Performance numbers on a search which returns 1205 results are below.
> > I basically ran the measurements twice -- once after flushing the
> > inode, dentry and page cache, and another time taking advantage of the
> > disk caches.
> >
> > Current TextCache:
> > no-disk-cache: ~1m
> > with-disk-cache: ~9s
> >
> > New TextCache (raw and gzipped versions had similar numbers):
> > no-disk-cache: ~42s
> > with-disk-cache: ~10s
> >
>
> Very cool/interesting. One of the important cases to test here is
> multiple successive queries. Think of something like deskbar completing
> as a user types: how does such a system fare when it gets 15 or 20
> queries back to back? Does the compression difference factor in then?
>
> > One very important factor remains to be seen -- memory usage. I am
> > working on figuring out what the impact of the new code on memory
> > usage is. Numbers should be available soon.
> >
> > On the Xesam front, I will be updating the code tomorrow/day-after to
> > reflect the latest changes to the spec.
> >
>
> I know the Google SoC is over, and it's completely OK if you're too busy
> to complete these tests, but it would be awesome if you could provide
> a patch to the list, not only so we can see exactly what you were
> doing, but also so that someone else might finish up your work and/or
> get it merged in and ready for 0.3.0.
> >
> > --
> > Arun Raghavan
> >
> --
> Cheers,
> Kevin Kubasik
> http://kubasik.net/blog
>

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
_______________________________________________
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
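
For anyone who wants to poke at the index question raised in the thread,
here is a rough sketch using Python's built-in sqlite3 and gzip modules.
It is not the actual TextCache code: the table name `textcache`, the
schema, and all helper names are assumptions made up for illustration.
It stores either (uri, raw text) or (uri, gzipped text) pairs, reports
which columns are covered by an index, and estimates the database size
from the page count. As far as I know, SQLite only creates an index on
its own for PRIMARY KEY and UNIQUE constraints, so with this schema only
the uri column should show up as indexed.

```python
import gzip
import sqlite3


def build_cache(conn, rows, compress):
    """Create a (uri, content) table; store gzipped BLOBs when compress is True."""
    # Hypothetical schema, not taken from the real TextCache implementation.
    col_type = "BLOB" if compress else "TEXT"
    conn.execute(f"CREATE TABLE textcache (uri TEXT PRIMARY KEY, content {col_type})")
    for uri, text in rows:
        value = gzip.compress(text.encode("utf-8")) if compress else text
        conn.execute("INSERT INTO textcache (uri, content) VALUES (?, ?)", (uri, value))
    conn.commit()


def lookup(conn, uri, compress):
    """Return the cached text for uri, or None if it is not in the cache."""
    row = conn.execute("SELECT content FROM textcache WHERE uri = ?", (uri,)).fetchone()
    if row is None:
        return None
    return gzip.decompress(row[0]).decode("utf-8") if compress else row[0]


def indexed_columns(conn, table="textcache"):
    """Collect the names of all columns covered by any index on the table."""
    columns = set()
    # index_list rows are (seq, name, ...); index_info rows are (seqno, cid, name).
    index_names = [r[1] for r in conn.execute(f"PRAGMA index_list({table})")]
    for name in index_names:
        for info in conn.execute(f"PRAGMA index_info({name})"):
            columns.add(info[2])
    return columns


def db_size_bytes(conn):
    """Database size computed as page_count * page_size."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return pages * page_size
```

Building one database with compress=False and another with compress=True
from the same rows, then comparing db_size_bytes() and indexed_columns()
for each, should show whether the size gap comes from an index or simply
from how the content is stored.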