Re: beagle r4019 - in trunk/beagle: Util beagled beagled/DownloadsMetadataQueryable

2007-10-08 Thread Kevin Kubasik
Good call, I should have done this myself before leaving for school
(since I knew I would be disconnected for some time) but I got
lazy/busy/forgot. Anyways, thanks much, and sounds good. (some of the
other RDF-type discussion is looking promising and when I get a good
chunk of free time in the next 2 or 3 weeks, I wanna give more time to
that.) Anyways, I should be getting my Comcast connection installed
Wed (but again, no promises); if that happens, I'll be more
in-touch/active.

Cheers,
Kevin Kubasik

On 10/7/07, Debajyoti Bera [EMAIL PROTECTED] wrote:
 Hi Kevin,

   Just a passing comment: for some reason I don't feel very comfortable with
   this backend. It's an uneasy feeling, probably because the metadata
   (i.e. the origin of a file) is not available in the file itself, nor is
   there a guarantee that the metadata is always correct.
 
  And, again, is something that probably should have been prototyped
  first on a branch, or a bugzilla patch.

 I agree with Joe on this one - I moved the changes to
 beagle-external-metadata-tags branch. The problem has significant overlap
 with that of implementing any external-metadata backend, including tags. Let's
 think about the problem and get it fixed in the branch.

 - dBera

 --
 -
 Debajyoti Bera @ http://dtecht.blogspot.com
 beagle / KDE fan
 Mandriva / Inspiron-1100 user



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


Re: GSoC Weekly Report

2007-10-08 Thread Debajyoti Bera
Hi,

First the context of this discussion: better storing of cached data (aka 
textcache).

 Very cool, and good to hear. If Arun could share a patch for his
 implementation, that would be awesome in terms of preventing wheel
 reinvention ;) If Arun is unable, or doesn't have the time to look
 into a hybrid solution, I wouldn't mind doing some investigative work.
 I think the biggest decision comes when it's time to determine what
 our cutoff is (size-wise). While there is a little extra complication
 introduced by a hybrid system, I don't see it being a major issue to
 implement. My thought would just be to have a table in the
 TextCache.db which denotes if a uri is stored in db or on disk. The
 major concern is the cost of 2 sqlite queries per cache item.

 Just my thoughts on the subject. DBera: are you saying that you want
 to just work/look into the language stemming, or both the language
 stemming and the text cache? Depending on what you want to work on, I
 can help out with this, if it's something we really want to see in
 0.3.0. Lemme know.
   completely sure that such a loose typing system will greatly benefit
   us when working with TEXT/STRING types, however, the gzipped blobs
   might benefit from less disk usage thanks to being stored in a single
   file, in addition, I know that incremental i/o is a possibility with
   blobs in sqlite 3.4, which could potentially be utilized to optimize
   work like this.
  
   Anyways, please send a patch to the list if that's not too much to ask,
   or just give us an update as to how things are going.
 
  Arun and I had some discussion about this and we were trying to balance
  the performance and size issues. He already has the sqlite-idea
  implemented; however I would also like to see how a hybrid idea works
  i.e. store the huge number of extremely small files in sqlite and store
  the really large ones on the disk. Implementing this is tricky.

I just checked in some changes implementing the above hybrid idea. Currently, 
any file smaller than 4K gzipped counts as an extremely small file (stored in 
the db) and anything larger as a really large one (stored on disk). The cutoff 
is hardcoded in TextCache.cs/BLOB_SIZE_LIMIT. The number of files in 
.beagle/TextCache and its size on disk are both reduced significantly. 
Performance and memory should not suffer noticeably unless I did something stupid.
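
For anyone who wants to play with the idea outside of beagle, here is a rough 
Python sketch of the hybrid scheme. It is not the code in TextCache.cs: the 
class name, the table layout and the file naming are all made up for 
illustration, and only the 4K gzipped cutoff comes from the description above. 
It assumes one table whose row holds either the gzipped blob or the name of 
an on-disk file, so a lookup needs a single query.

```python
import gzip
import hashlib
import os
import sqlite3

# Cutoff in gzipped bytes; mirrors the 4K figure above, value is illustrative.
BLOB_SIZE_LIMIT = 4 * 1024


class HybridTextCache:
    """Hypothetical hybrid cache: small entries in sqlite, large ones on disk."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
        self.db = sqlite3.connect(os.path.join(cache_dir, "TextCache.db"))
        # Assumed schema: exactly one of data / filename is non-NULL per row.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS textcache ("
            "uri TEXT PRIMARY KEY, data BLOB, filename TEXT)"
        )

    def _path_for(self, uri):
        # File name derived from the uri; a made-up convention for this sketch.
        name = hashlib.sha1(uri.encode("utf-8")).hexdigest() + ".gz"
        return name, os.path.join(self.cache_dir, name)

    def store(self, uri, text):
        packed = gzip.compress(text.encode("utf-8"))
        name, path = self._path_for(uri)
        if len(packed) < BLOB_SIZE_LIMIT:
            # Small entry: keep the blob in the db, drop any stale disk copy.
            if os.path.exists(path):
                os.remove(path)
            row = (uri, packed, None)
        else:
            # Large entry: write the gzipped text to its own file.
            with open(path, "wb") as f:
                f.write(packed)
            row = (uri, None, name)
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO textcache VALUES (?, ?, ?)", row
            )

    def location(self, uri):
        """Return 'db', 'disk', or None if the uri is not cached."""
        row = self.db.execute(
            "SELECT data IS NOT NULL FROM textcache WHERE uri = ?", (uri,)
        ).fetchone()
        if row is None:
            return None
        return "db" if row[0] else "disk"

    def load(self, uri):
        row = self.db.execute(
            "SELECT data, filename FROM textcache WHERE uri = ?", (uri,)
        ).fetchone()
        if row is None:
            return None
        data, filename = row
        if data is None:
            with open(os.path.join(self.cache_dir, filename), "rb") as f:
                data = f.read()
        return gzip.decompress(data).decode("utf-8")
```

With this layout the small-file case costs one query and no file open, and 
the large-file case costs one query plus one file read. Whether that matches 
the real trade-offs in beagle would need measuring.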

One thing I forgot to test was support for sqlite-2. Could anyone with 
sqlite-2 sync svn trunk and see if things work as expected? .beagle/ might 
need to be deleted and files/emails re-indexed.

In the past, I emailed about how this feature relates to language determination. It 
still does but that would require some more work (hint: somehow merge 
TextCacheWriteStream and PullingReader) and a significant bit of testing. I 
have no plans on working on it now.

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user