Hi,
First the context of this discussion: better storing of cached data (aka
textcache).
Very cool, and good to hear. If Arun could share a patch for his
implementation, that would be awesome in terms of preventing wheel
reinvention ;) If Arun is unable, or doesn't have the time, to look
into a hybrid solution, I wouldn't mind doing some investigative work.
I think the biggest decision comes when it's time to determine what
our size cutoff is. While a hybrid system introduces a little extra
complication, I don't see it being a major issue to implement. My
thought would just be to have a table in TextCache.db which denotes
whether a URI is stored in the db or on disk. The major concern is the
cost of 2 sqlite queries per cache item.
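
To make the lookup-table idea concrete, here is a rough sketch in Python
(not Beagle's C#, and not anyone's actual patch). The table name
`textcache`, its columns, and the helper names `open_index` and `locate`
are all made up for illustration. It assumes a NULL blob column can stand
in for "this entry lives on disk", so that one query answers both "where
is it?" and "what is it?" for the small entries:

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the hypothetical lookup table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS textcache ("
        " uri TEXT PRIMARY KEY,"
        " data BLOB)"  # assumption: NULL data means 'stored on disk'
    )
    return conn


def locate(conn, uri):
    """Return (location, payload) using a single SELECT per cache item."""
    row = conn.execute(
        "SELECT data FROM textcache WHERE uri = ?", (uri,)
    ).fetchone()
    if row is None:
        return ("missing", None)
    if row[0] is None:
        return ("disk", None)  # caller reads the file from the cache dir
    return ("db", bytes(row[0]))
```

With a layout like this, entries kept in the db cost one query, and
entries on disk cost one query plus a file read, which is one way the
two-queries concern might be avoided.
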
Just my thoughts on the subject. DBera: are you saying that you want
to work on just the language stemming, or both the language stemming
and the text cache? Depending on what you want to work on, I can help
out with this, if it's something we really want to see in 0.3.0.
Lemme know.
completely sure that such a loose typing system will greatly benefit
us when working with TEXT/STRING types. However, the gzipped blobs
might benefit from less disk usage thanks to being stored in a single
file. In addition, I know that incremental I/O is a possibility with
blobs in sqlite 3.4, which could potentially be utilized to optimize
work like this.
Anyways, please send a patch to the list if that's not too much to ask,
or just give us an update as to how things are going.
Arun and I had some discussion about this, and we were trying to balance
the performance and size issues. He already has the sqlite idea
implemented; however, I would also like to see how a hybrid idea works,
i.e. store the huge number of extremely small files in sqlite and store
the really large ones on the disk. Implementing this is tricky.
I just checked in some changes implementing the above hybrid idea. Currently,
any file less than 4K gzipped is an extremely small file (stored in db) and
anything more is a really large one (stored on disk). The cutoff is
hardcoded in TextCache.cs/BLOB_SIZE_LIMIT. The number of files and the disk
size of .beagle/TextCache drop significantly. Performance and memory
should not suffer noticeably unless I did something stupid.
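
For anyone who wants to play with the cutoff idea outside of Beagle, here
is a minimal sketch of it in Python. This is not the code in TextCache.cs.
Only the name BLOB_SIZE_LIMIT and the 4K gzipped cutoff come from the
description above; the table layout, the SHA-1 based file naming, and the
helpers `open_cache`, `store` and `load` are assumptions for illustration:

```python
import gzip
import hashlib
import os
import sqlite3

BLOB_SIZE_LIMIT = 4 * 1024  # 4K gzipped, per the description above


def open_cache(db_path):
    """Create the hypothetical index: one row per URI."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS textcache ("
        " uri TEXT PRIMARY KEY, filename TEXT, data BLOB)"
    )
    return conn


def store(conn, cache_dir, uri, text):
    """Gzip the text; keep small payloads in the db, large ones on disk."""
    payload = gzip.compress(text.encode("utf-8"))
    if len(payload) < BLOB_SIZE_LIMIT:
        filename, blob = None, payload
    else:
        # Assumed naming scheme: hash of the URI, purely illustrative.
        filename = hashlib.sha1(uri.encode("utf-8")).hexdigest() + ".gz"
        with open(os.path.join(cache_dir, filename), "wb") as f:
            f.write(payload)
        blob = None
    conn.execute(
        "INSERT OR REPLACE INTO textcache (uri, filename, data)"
        " VALUES (?, ?, ?)",
        (uri, filename, blob),
    )
    conn.commit()


def load(conn, cache_dir, uri):
    """Return the cached text for uri, or None if it is not cached."""
    row = conn.execute(
        "SELECT filename, data FROM textcache WHERE uri = ?", (uri,)
    ).fetchone()
    if row is None:
        return None
    filename, data = row
    if data is None:  # large entry: the payload lives on disk
        with open(os.path.join(cache_dir, filename), "rb") as f:
            data = f.read()
    return gzip.decompress(data).decode("utf-8")
```

Since the comparison is made on the gzipped size, a fairly large but
highly compressible text can still end up in the db under this sketch.
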
One thing I forgot to test was support for sqlite-2. Could anyone with
sqlite-2 sync svn trunk and see if things work as expected? .beagle/ might
need to be deleted and files/emails re-indexed.
In the past, I emailed about how this feature relates to language
determination. It still does, but that would require some more work (hint:
somehow merge TextCacheWriteStream and PullingReader) and a significant bit
of testing. I have no plans to work on it now.
- dBera
--
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers