Very cool, and good to hear. If Arun could share a patch for his implementation, that would be awesome in terms of preventing wheel reinvention ;) If Arun is unable, or doesn't have the time to look into a hybrid solution, I wouldn't mind doing some investigative work. I think the biggest decision comes when it's time to determine what our cutoff is, size-wise.

While a hybrid system introduces a little extra complication, I don't see it being a major issue to implement. My thought would just be to have a table in the TextCache.db which denotes whether a URI is stored in the db or on disk. The major concern is the cost of two sqlite queries per cache item.
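To make the table idea a bit more concrete, here is a rough sketch in Python (purely for illustration; it is not Arun's patch or anything that exists in Beagle). The class name, the table layout, and the 4096-byte cutoff are all made up for the example. It keeps the "where is it stored" marker and the inline blob in the same row, so a lookup costs one sqlite query instead of two.

```python
import gzip
import hashlib
import os
import sqlite3

# Hypothetical cutoff; the real value is exactly the open question above.
CUTOFF_BYTES = 4096


class HybridTextCache:
    """Illustrative sketch: small texts live in sqlite, large ones on disk."""

    def __init__(self, db_path, file_dir, cutoff=CUTOFF_BYTES):
        self.file_dir = file_dir
        self.cutoff = cutoff
        os.makedirs(file_dir, exist_ok=True)
        self.db = sqlite3.connect(db_path)
        # One row per URI: either `filename` (on disk) or `data` (inline) is set.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS textcache ("
            " uri TEXT PRIMARY KEY,"
            " filename TEXT,"
            " data BLOB)"
        )

    def _path_for(self, uri):
        # Derive a stable file name from the URI (an assumption of this sketch).
        name = hashlib.sha1(uri.encode("utf-8")).hexdigest() + ".gz"
        return os.path.join(self.file_dir, name)

    def put(self, uri, text):
        blob = gzip.compress(text.encode("utf-8"))
        path = self._path_for(uri)
        # The cutoff is applied to the compressed size here; that is a choice,
        # not something the thread has settled.
        if len(blob) > self.cutoff:
            with open(path, "wb") as f:
                f.write(blob)
            row = (uri, os.path.basename(path), None)
        else:
            # Drop a stale on-disk copy if this URI used to be large.
            if os.path.exists(path):
                os.remove(path)
            row = (uri, None, blob)
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO textcache (uri, filename, data)"
                " VALUES (?, ?, ?)",
                row,
            )

    def get(self, uri):
        # Single query: the row tells us both where the text is and, for
        # small entries, what it is.
        row = self.db.execute(
            "SELECT filename, data FROM textcache WHERE uri = ?", (uri,)
        ).fetchone()
        if row is None:
            return None
        filename, data = row
        if filename is not None:
            with open(os.path.join(self.file_dir, filename), "rb") as f:
                data = f.read()
        return gzip.decompress(data).decode("utf-8")
```

With a layout like this, small entries need one query and no file access, and large entries need one query plus one file read, so the two-query cost would only appear if the location marker lived in a separate table.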
Just my thoughts on the subject.

DBera: are you saying that you want to just work on/look into the language stemming, or both the language stemming and the text cache? Depending on what you want to work on, I can help out with this, if it's something we really want to see in 0.3.0. Lemme know.

Cheers,
Kevin Kubasik

On 10/2/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> > completely sure that such a loose typing system will greatly benefit
> > us when working with TEXT/STRING types, however, the gzipped blobs
> > might benefit from less disk usage thanks to being stored in a single
> > file, in addition, I know that incremental i/o is a possibility with
> > blobs in sqlite 3.4, which could potentially be utilized to optimize
> > work like this.
> >
> > Anyways, please send a patch to the list if that's not too much to ask,
> > or just give us an update as to how things are going.
>
> I and Arun had some discussion about this and we were trying to balance the
> performance and size issues. He already has the sqlite-idea implemented;
> however I would also like to see how a hybrid idea works i.e. store the huge
> number of extremely small files in sqlite and store the really large ones on
> the disk. Implementing this is tricky (*).
>
> - dBera
>
> (*) One of my recent efforts has been to add language detection support (based
> on a patch in bugzilla). This will enable us to use the right stemmers and
> analyzers depending on the language. The hard part is stealing some initial
> text for language detection and doing it in a transparent way. Incidentally,
> one implementation of the hybrid approach mentioned above and the language
> detection cross paths. I am waiting for some free time to get going after
> them.
>
> --
> -----------------------------------------------------
> Debajyoti Bera @ http://dtecht.blogspot.com
> beagle / KDE fan
> Mandriva / Inspiron-1100 user
>

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
_______________________________________________
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers