Migrate to Mono.Data.Sqlite (Was: Re: GSoC Weekly Report)
> Ignore my previous email ... I was looking at the wrong place :(
> This is the right place for the new M.D.Sqlite
> http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite_2.0/SQLiteDataReader.cs

Migration from Mono.Data.SqliteClient to Mono.Data.Sqlite completed (rev 4061).

--
- Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user

___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoC Weekly Report
Ignore my previous email ... I was looking at the wrong place :( This is the right place for the new M.D.Sqlite:
http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite_2.0/SQLiteDataReader.cs

- dBera
Re: GSoC Weekly Report
> > A followup question, I did not find any API documentation of
> > Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
> > there.
>
> My understanding is that both M.D.SqliteClient and M.D.Sqlite follow
> the general ADO.Net API patterns and that the latter is more or less a
> drop-in replacement for the former. A few things may need to be
> tweaked, but in general just changing the "using" statements at the
> top of each source file should be all that's needed.

I was more looking for some method for row-by-row retrieval, on demand. Real on-demand, where the implementation does not retrieve all the rows at once but returns them one by one.

> You've always been able to get rows on demand via ADO.Net, it's just a
> matter of the implementation underneath. The old one (not modified by
> us) would load all of them into memory. I'm not sure how the new one
> performs memory-wise. If the Mono guys don't have any idea, the right

I checked the source out of curiosity:
http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite/
and the code for the DataReader looks exactly the same (didn't do a diff, just visually) as the one in Mono.Data.SqliteClient. So even if we migrate (the migration would be easy), we still have to ship a modified in-house M.D.Sqlite and keep syncing with upstream. *sigh*

- dBera
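[Editor's sketch of the behaviour under discussion. This is Python's stdlib sqlite3, not Mono's ADO.Net bindings; it only illustrates what "real on-demand" retrieval means: iterating a cursor steps through the result set one row at a time (sqlite3_step underneath) instead of materializing every row up front, which is what the old reader reportedly did.]

```python
import sqlite3

# Build a small table, then fetch rows one at a time from a cursor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE textcache (uri TEXT, body TEXT)")
conn.executemany("INSERT INTO textcache VALUES (?, ?)",
                 [("file:///doc%d" % i, "x" * 100) for i in range(1000)])

cur = conn.execute("SELECT uri FROM textcache")
first = cur.fetchone()        # pulls a single row from sqlite
rest = sum(1 for _ in cur)    # the remaining rows stream one by one
print(first[0], rest)
```

The same SELECT with fetchall() would instead allocate all 1000 rows at once, which is the memory behaviour being objected to above.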
Re: GSoC Weekly Report
Hi,

On 10/16/07, D Bera <[EMAIL PROTECTED]> wrote:
> A followup question, I did not find any API documentation of
> Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
> there.

My understanding is that both M.D.SqliteClient and M.D.Sqlite follow the general ADO.Net API patterns and that the latter is more or less a drop-in replacement for the former. A few things may need to be tweaked, but in general just changing the "using" statements at the top of each source file should be all that's needed.

> If M.D.Sqlite does not have a way to return rows on demand, I
> am against the migration. In the worst case, we can ship with a
> modified copy of M.D.Sqlite but I am not sure what that will buy us.

You've always been able to get rows on demand via ADO.Net; it's just a matter of the implementation underneath. The old one (not modified by us) would load all of them into memory. I'm not sure how the new one performs memory-wise. If the Mono guys don't have any idea, the right thing to do here would be to create a large test database (or use an existing TextCache or FAStore db), do a "SELECT *" using the three implementations and walk the results, using heap-buddy and/or heap-shot to analyze their memory usage.

> In the same breath, what is the benefit of M.D.Sqlite over
> M.D.SqliteClient for beagle? I figured out there are some ADO.Net
> advantages but other than that ... ?

It's maintained, for one, which our modified one essentially isn't. It has the backing of the Mono team. The code is much cleaner and easier to understand, largely because it doesn't have two separate codepaths (one for v2 and one for v3). I am sure the Mono guys have other good reasons too. :)

Joe
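[Editor's sketch of the proposed memory test, in Python rather than C#/heap-buddy: build a large test database, "SELECT *", and walk the results while watching allocations. tracemalloc stands in for heap-buddy/heap-shot, and the table name and sizes are invented for the demo; the point is that walking rows on demand keeps peak memory far below the full data size.]

```python
import os
import sqlite3
import tempfile
import tracemalloc

# Create a test database with ~8 MB of blob data.
path = os.path.join(tempfile.mkdtemp(), "test.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE cache (uri TEXT, body BLOB)")
conn.executemany("INSERT INTO cache VALUES (?, ?)",
                 (("uri-%d" % i, b"y" * 4096) for i in range(2000)))
conn.commit()

tracemalloc.start()
rows = 0
for _ in conn.execute("SELECT * FROM cache"):
    rows += 1                 # walk, but never hold all rows at once
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(rows, peak < 2000 * 4096)
```

An implementation that loaded the whole result set up front (as the old bindings reportedly did) would instead peak near the full data size.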
Re: GSoC Weekly Report
> Indeed you're right, but those changes did get merged upstream. So
> the memory usage I believe is the only outstanding reason.

Sweet. A followup question: I did not find any API documentation of Mono.Data.Sqlite :( #mono was also sleeping when I asked the question there.

If M.D.Sqlite does not have a way to return rows on demand, I am against the migration. In the worst case, we can ship with a modified copy of M.D.Sqlite but I am not sure what that will buy us.

In the same breath, what is the benefit of M.D.Sqlite over M.D.SqliteClient for beagle? I figured out there are some ADO.Net advantages but other than that ... ?

- dBera
Re: GSoC Weekly Report
Hi,

On 10/16/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> > > What to do with our local changes to Mono.Data.SqliteClient? I always
> > > get confused with them. Don't even know what those changes are and why
> > > they are there :-/ (it has something to do with threading and locking)?
> >
> > The work done locally was mainly for memory usage reasons. IIRC, the
> > upstream bindings pull all of the results into memory at once, whereas
> > our locally modified ones do so only as needed. I don't think
> > threading/locking was ever an issue -- you might be confusing it with
> > the fact that we couldn't use early sqlite 3.x versions because of
> > broken policy in the library to that effect.
>
> Probably you are right. I still had to verify ...
> beagle:/source=mind?query=sqlite+beagle+lock
> returned nothing :-D
> but google returned
> http://lists.ximian.com/pipermail/mono-devel-list/2005-November/015977.html
> which mentions "Lock" ... yay! My faith in my memory is restored ;-)

Indeed you're right, but those changes did get merged upstream. So memory usage, I believe, is the only outstanding reason.

Joe
Re: GSoC Weekly Report
> > What to do with our local changes to Mono.Data.SqliteClient? I always
> > get confused with them. Don't even know what those changes are and why
> > they are there :-/ (it has something to do with threading and locking)?
>
> The work done locally was mainly for memory usage reasons. IIRC, the
> upstream bindings pull all of the results into memory at once, whereas
> our locally modified ones do so only as needed. I don't think
> threading/locking was ever an issue -- you might be confusing it with
> the fact that we couldn't use early sqlite 3.x versions because of
> broken policy in the library to that effect.

Probably you are right. I still had to verify ...
beagle:/source=mind?query=sqlite+beagle+lock
returned nothing :-D
but google returned
http://lists.ximian.com/pipermail/mono-devel-list/2005-November/015977.html
which mentions "Lock" ... yay! My faith in my memory is restored ;-)

- dBera
Re: GSoC Weekly Report
Hi,

On 10/13/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> What to do with our local changes to Mono.Data.SqliteClient? I always get
> confused with them. Don't even know what those changes are and why they
> are there :-/ (it has something to do with threading and locking)?

The work done locally was mainly for memory usage reasons. IIRC, the upstream bindings pull all of the results into memory at once, whereas our locally modified ones do so only as needed. I don't think threading/locking was ever an issue -- you might be confusing it with the fact that we couldn't use early sqlite 3.x versions because of broken policy in the library to that effect.

I'm not sure what the memory side effects of the newer upstream bindings are.

Joe
Re: GSoC Weekly Report
> Sorry, I was unclear. By "removing sqlite2" I meant simply removing
> it as an option from configure.in and requiring only sqlite3, not
> removing the codepaths from the cut-and-pasted code. Then, at some
> point in the future, porting over to Mono's own Mono.Data.Sqlite.

What to do with our local changes to Mono.Data.SqliteClient? I always get confused with them. Don't even know what those changes are and why they are there :-/ (it has something to do with threading and locking)?

- dBera

PS: Mannn... I love these Liberation fonts... can't stop reading the same mail ten times :P
Re: GSoC Weekly Report
Hi,

On 10/9/07, D Bera <[EMAIL PROTECTED]> wrote:
> > At this point, I'm in favor of dropping support for sqlite2 entirely
> > anyway. That will make a migration to the new Mono sqlite bindings
> > smoother, and drop a nasty chunk of cut-and-paste-and-patch code in
> > the tree.
>
> Me too, me too ...
> But I see no point in the double effort of first removing sqlite-2
> support and then changing the code to use Mono.Data.Sqlite. Any
> volunteers for the cleanup?

Sorry, I was unclear. By "removing sqlite2" I meant simply removing it as an option from configure.in and requiring only sqlite3, not removing the codepaths from the cut-and-pasted code. Then, at some point in the future, porting over to Mono's own Mono.Data.Sqlite.

Joe
Re: GSoC Weekly Report
> > One thing I forgot to test was support for sqlite-2. Could anyone with
> > sqlite-2 sync svn trunk and see if things work as expected? .beagle/ might
> > need to be deleted and files/emails re-indexed.
>
> At this point, I'm in favor of dropping support for sqlite2 entirely
> anyway. That will make a migration to the new Mono sqlite bindings
> smoother, and drop a nasty chunk of cut-and-paste-and-patch code in
> the tree.

Me too, me too ...
But I see no point in the double effort of first removing sqlite-2 support and then changing the code to use Mono.Data.Sqlite. Any volunteers for the cleanup?

- dBera
Re: GSoC Weekly Report
Hi,

On 10/8/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> One thing I forgot to test was support for sqlite-2. Could anyone with
> sqlite-2 sync svn trunk and see if things work as expected? .beagle/ might
> need to be deleted and files/emails re-indexed.

At this point, I'm in favor of dropping support for sqlite2 entirely anyway. That will make a migration to the new Mono sqlite bindings smoother, and drop a nasty chunk of cut-and-paste-and-patch code in the tree.

Joe
Re: GSoC Weekly Report
Hi,

First, the context of this discussion: better storing of cached data (aka textcache).

> Very cool, and good to hear. If Arun could share a patch for his
> implementation, that would be awesome in terms of preventing wheel
> reinvention ;) If Arun is unable, or doesn't have the time to look
> into a hybrid solution, I wouldn't mind doing some investigative work.
> I think the biggest decision comes when it's time to determine what
> our cutoff is (size wise). While there is a little extra complication
> introduced by a hybrid system, I don't see it being a major issue to
> implement. My thought would just be to have a table in the
> TextCache.db which denotes if a uri is stored in db or on disk. The
> major concern is the cost of 2 sqlite queries per cache item.
>
> Just my thoughts on the subject. DBera: are you saying that you want
> to just work/look into the language stemming, or both the language
> stemming and the text cache? Depending on what you want to work on, I
> can help out with this, if it's something we really want to see in
> 0.3.0. Lemme know.
>
> > > completely sure that such a loose typing system will greatly benefit
> > > us when working with TEXT/STRING types, however, the gzipped blobs
> > > might benefit from less disk usage thanks to being stored in a single
> > > file, in addition, I know that incremental i/o is a possibility with
> > > blobs in sqlite 3.4, which could potentially be utilized to optimize
> > > work like this.
> > >
> > > Anyways, please send a patch to the list if that's not too much to ask,
> > > or just give us an update as to how things are going.
> >
> > Arun and I had some discussion about this and we were trying to balance
> > the performance and size issues. He already has the sqlite idea
> > implemented; however I would also like to see how a hybrid idea works,
> > i.e. store the huge number of extremely small files in sqlite and store
> > the really large ones on the disk. Implementing this is tricky.

I just checked in some changes implementing the above hybrid idea. Currently, any file less than 4K gzipped is "an extremely small file" (stored in db) and anything more is "a really large one" (stored on disk). The cutoff is hardcoded in TextCache.cs/BLOB_SIZE_LIMIT. The number of files and the disk size of .beagle/TextCache reduces significantly. Performance and memory should not suffer noticeably unless I did something stupid.

One thing I forgot to test was support for sqlite-2. Could anyone with sqlite-2 sync svn trunk and see if things work as expected? .beagle/ might need to be deleted and files/emails re-indexed.

In the past, I emailed how this feature relates to language determination. It still does, but that would require some more work (hint: somehow merge TextCacheWriteStream and PullingReader) and a significant bit of testing. I have no plans on working on it now.

- dBera
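[Editor's sketch of the hybrid rule just described. This is Python, not the actual C# in TextCache.cs, and the table schema and on-disk file naming are invented; it only demonstrates the cutoff: gzip the text, keep it in the db if the compressed size is at most 4K (mirroring BLOB_SIZE_LIMIT), otherwise write it to its own file and leave a NULL marker row in the db.]

```python
import gzip
import os
import sqlite3
import tempfile

BLOB_SIZE_LIMIT = 4096  # matches the 4K cutoff described above

cache_dir = tempfile.mkdtemp()
db = sqlite3.connect(os.path.join(cache_dir, "TextCache.db"))
db.execute("CREATE TABLE textcache (uri TEXT PRIMARY KEY, data BLOB)")

def store(uri, text):
    blob = gzip.compress(text.encode())
    if len(blob) <= BLOB_SIZE_LIMIT:
        # "extremely small file": gzipped blob lives in the db row
        db.execute("INSERT INTO textcache VALUES (?, ?)", (uri, blob))
        return "db"
    # "really large one": gzipped file on disk, NULL marks it in the db
    with open(os.path.join(cache_dir, uri.replace("/", "_")), "wb") as f:
        f.write(blob)
    db.execute("INSERT INTO textcache VALUES (?, NULL)", (uri,))
    return "disk"

small = store("file:///tiny", "hello world " * 50)    # compresses well under 4K
big = store("file:///huge", os.urandom(20000).hex())  # ~40K of incompressible hex
print(small, big)
```

Note the decision can only be made after compressing (up to) the first 4K of output, which is why the streaming question discussed later in the thread matters.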
Re: GSoC Weekly Report
On Tuesday 02 October 2007 19:13, you wrote:
> Thinking quickly, one way to do this would be to add an option to
> query to specify the language.

That's a nice option, but the default should be to search all languages, I think. People are used to just typing a word without setting another option.

Regards
 Daniel

--
http://www.danielnaber.de
Re: GSoC Weekly Report
> > (*) One of my recent efforts has been to add language detection support
> > (based on a patch in bugzilla).
>
> Could you describe how this is going to work? I see that language detection
> is quite simple if you have enough text, but basically impossible for
> short texts like queries. So will the queries be sent through all
> analyzers and then OR'ed, for example?

Bummer! Didn't think about that :( The bugzilla contributor and I were more focused on how to detect the language. Thinking quickly, one way to do this would be to add an option to query to specify the language. Then that analyzer will be used. People who requested this feature mostly wanted some way to query only, say, German documents. So they know a priori what language docs they want to query. Then they can simply specify their choice (somehow, using the Query API) and we search only in documents of that language (as well as use the right analyzer).

- dBera
Re: GSoC Weekly Report
On Tuesday 02 October 2007 06:24, Debajyoti Bera wrote:
> (*) One of my recent efforts has been to add language detection support
> (based on a patch in bugzilla).

Could you describe how this is going to work? I see that language detection is quite simple if you have enough text, but basically impossible for short texts like queries. So will the queries be sent through all analyzers and then OR'ed, for example?

Regards
 Daniel
Re: GSoC Weekly Report
Updated patch attached -- some of the older code was not building.

Cheers,
Arun

On 02/10/2007, Arun Raghavan <[EMAIL PROTECTED]> wrote:
> On 02/10/2007, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> > Very cool, and good to hear. If Arun could share a patch for his
> > implementation, that would be awesome in terms of preventing wheel
> > reinvention ;) If Arun is unable, or doesn't have the time to look
> > into a hybrid solution, I wouldn't mind doing some investigative work.
>
> I've been completely swamped with work here in the first half of the
> semester, and I spent a little time getting the xesam-adaptor updated
> to the latest spec. Do let me know if you're taking this up, so
> there's no duplication of effort. The patch against r4013 is attached.
>
> > I think the biggest decision comes when it's time to determine what
> > our cutoff is (size wise). While there is a little extra complication
> > introduced by a hybrid system, I don't see it being a major issue to
> > implement. My thought would just be to have a table in the
> > TextCache.db which denotes if a uri is stored in db or on disk. The
> > major concern is the cost of 2 sqlite queries per cache item.
>
> Might it not be easier to have a boolean field denoting whether the
> field is an on-disk URI or the blob itself? Or better, if this is
> possible, to just examine the first few bytes to see if they are some
> ASCII text (or !(the Zip magic bytes))?
>
> Best,
> --
> Arun Raghavan
> (http://nemesis.accosted.net)
> v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
> e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com

Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===================================================================
--- beagled/FileSystemQueryable/FileSystemQueryable.cs  (revision 4016)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs  (working copy)
@@ -1810,17 +1810,12 @@
                        // is stored in a property.
                        Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);

-                       string path = TextCache.UserCache.LookupPathRaw (uri);
+                       Stream text = TextCache.UserCache.LookupText(uri, hit.Uri.LocalPath);

-                       if (path == null)
+                       if (text == null)
                                return null;

-                       // If this is self-cached, use the remapped Uri
-                       if (path == TextCache.SELF_CACHE_TAG)
-                               return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-                       path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-                       return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+                       return SnippetFu.GetSnippet(query_terms, new StreamReader(text), full_text);
                }

                override public void Start ()
Index: beagled/TextCache.cs
===================================================================
--- beagled/TextCache.cs        (revision 4016)
+++ beagled/TextCache.cs        (working copy)
@@ -37,6 +37,53 @@

 namespace Beagle.Daemon {

+       // We only have this class because GZipOutputStream doesn't let us
+       // retrieve the baseStream
+       public class TextCacheStream : GZipOutputStream {
+               private Stream stream;
+
+               public Stream BaseStream {
+                       get { return stream; }
+               }
+
+               public TextCacheStream() : this(new MemoryStream())
+               {
+               }
+
+               public TextCacheStream(Stream stream) : base(stream)
+               {
+                       this.stream = stream;
+                       this.IsStreamOwner = false;
+               }
+       }
+
+       public class TextCacheWriter : StreamWriter {
+               private Uri uri;
+               private TextCache parent_cache;
+               private TextCacheStream tcStream;
+
+               public TextCacheWriter(TextCache cache, Uri uri, TextCacheStream tcStream) : base(tcStream)
+               {
+                       parent_cache = cache;
+                       this.uri = uri;
+                       this.tcStream = tcStream;
+               }
+
+               override public void Close()
+               {
+                       base.Close();
+
+                       Stream stream = tcStream.BaseStream;
+
+                       byte[] text = new byte[stream.Length];
+                       stream.Seek(0, SeekOrigin.Begin);
+                       stream.Read(text, 0, (int)stream.Length);
+
+                       parent_cache.Insert(uri, text);
+                       tcStream.BaseStream.Close();
+               }
+       }
+
        // FIXME: This class isn't multithread safe! This class does not
        // ensure that different threads don't utilize a transaction started
        // in a certain thread at the same t
Re: GSoC Weekly Report
On 02/10/2007, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> A quick followup, some reading here:
>
> http://www.sqlite.org/datatype3.html
>
> provides some insight into how exactly sqlite3 stores values. I'm not
> completely sure that such a loose typing system will greatly benefit
> us when working with TEXT/STRING types; however, the gzipped blobs
> might benefit from less disk usage thanks to being stored in a single
> file. In addition, I know that incremental i/o is a possibility with
> blobs in sqlite 3.4, which could potentially be utilized to optimize
> work like this.

If the bindings wrap a Stream around this, this would be ideal. There doesn't seem to be much documentation on the new bindings. From what I can see in the mono-1.2.5.1 code, the new bindings (like the old bindings) just return the entire contents of the field. Maybe we should make a feature request?

--
Arun Raghavan
(http://nemesis.accosted.net)
Re: GSoC Weekly Report
On 02/10/2007, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> Very cool, and good to hear. If Arun could share a patch for his
> implementation, that would be awesome in terms of preventing wheel
> reinvention ;) If Arun is unable, or doesn't have the time to look
> into a hybrid solution, I wouldn't mind doing some investigative work.

I've been completely swamped with work here in the first half of the semester, and I spent a little time getting the xesam-adaptor updated to the latest spec. Do let me know if you're taking this up, so there's no duplication of effort. The patch against r4013 is attached.

> I think the biggest decision comes when it's time to determine what
> our cutoff is (size wise). While there is a little extra complication
> introduced by a hybrid system, I don't see it being a major issue to
> implement. My thought would just be to have a table in the
> TextCache.db which denotes if a uri is stored in db or on disk. The
> major concern is the cost of 2 sqlite queries per cache item.

Might it not be easier to have a boolean field denoting whether the field is an on-disk URI or the blob itself? Or better, if this is possible, to just examine the first few bytes to see if they are some ASCII text (or !(the Zip magic bytes))?

Best,
--
Arun Raghavan
(http://nemesis.accosted.net)

Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===================================================================
--- beagled/FileSystemQueryable/FileSystemQueryable.cs  (revision 4013)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs  (working copy)
@@ -1810,17 +1810,12 @@
                        // is stored in a property.
                        Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);

-                       string path = TextCache.UserCache.LookupPathRaw (uri);
+                       Stream text = TextCache.UserCache.LookupText(uri, hit.Uri.LocalPath);

-                       if (path == null)
+                       if (text == null)
                                return null;

-                       // If this is self-cached, use the remapped Uri
-                       if (path == TextCache.SELF_CACHE_TAG)
-                               return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-                       path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-                       return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+                       return SnippetFu.GetSnippet(query_terms, new StreamReader(text), full_text);
                }

                override public void Start ()
Index: beagled/TextCache.cs
===================================================================
--- beagled/TextCache.cs        (revision 4013)
+++ beagled/TextCache.cs        (working copy)
@@ -37,6 +37,53 @@

 namespace Beagle.Daemon {

+       // We only have this class because GZipOutputStream doesn't let us
+       // retrieve the baseStream
+       public class TextCacheStream : GZipOutputStream {
+               private Stream stream;
+
+               public Stream BaseStream {
+                       get { return stream; }
+               }
+
+               public TextCacheStream() : this(new MemoryStream())
+               {
+               }
+
+               public TextCacheStream(Stream stream) : base(stream)
+               {
+                       this.stream = stream;
+                       this.IsStreamOwner = false;
+               }
+       }
+
+       public class TextCacheWriter : StreamWriter {
+               private Uri uri;
+               private TextCache parent_cache;
+               private TextCacheStream tcStream;
+
+               public TextCacheWriter(TextCache cache, Uri uri, TextCacheStream tcStream) : base(tcStream)
+               {
+                       parent_cache = cache;
+                       this.uri = uri;
+                       this.tcStream = tcStream;
+               }
+
+               override public void Close()
+               {
+                       base.Close();
+
+                       Stream stream = tcStream.BaseStream;
+
+                       byte[] text = new byte[stream.Length];
+                       stream.Seek(0, SeekOrigin.Begin);
+                       stream.Read(text, 0, (int)stream.Length);
+
+                       parent_cache.Insert(uri, text);
+                       tcStream.BaseStream.Close();
+               }
+       }
+
        // FIXME: This class isn't multithread safe! This class does not
        // ensure that different threads don't utilize a transaction started
        // in a certain thread at the same time. However, since all the
@@ -50,7 +97,7 @@

        static public bool Debug = false;

-       public const string SELF_CACHE_TAG = "*self*";
+       private con
Re: GSoC Weekly Report
> Just my thoughts on the subject. DBera: are you saying that you want
> to just work/look into the language stemming, or both the language
> stemming and the text cache? Depending on what you want to work on, I
> can help out with this, if it's something we really want to see in
> 0.3.0. Lemme know.

1. I definitely don't have the time, lest it would have been done by now :)

2. I will locate Arun's patch and send it out; it's a good implementation and can act as a reference.

3. The problem is less about the number of queries. It is more about sending the data to the textcache (which can either store it gzipped in sqlite or gzipped on disk), and to the language determination class, and to lucene, without (repeat: without) storing all the data in a huge store/string in memory. I thought a cutoff size of disk_block_size would be a good starting point; it will reduce external fragmentation to a good degree since most textcache files are less than 1 block. So the decision to store on disk or in sqlite can only come after we have read, say, 4KB of data. The language determination, I think, requires 1K of text. In our filter/lucene interface, lucene asks for data and then the filters go and extract a little more data from the file and send it back; this goes in a loop till there is no more data to extract. There is no storing of data in memory! So to do the whole thing correctly, as lucene asks for more data the filters return the data and transparently someone in the middle decides whether to store the data in sqlite or on disk (and does so); furthermore, even before lucene asks for data, about 1K of data is extracted from the file, the language detected and the appropriate stemmer hooked up, and that data is kept around till lucene asks for it. The obvious approach is extracting all the data in advance, storing it in memory, deciding where to store the textcache, deciding the language and then comfortably feeding lucene from the stored data. That's not desired.

I hope you also see where the connection between language determination and text-cache comes in. Go for them if you or anyone wants to. Just let others know so there is no duplication of effort.

N. Let's not target a release and cram features in :) Instead, if you want to work on something, work on it. If it is done and release-ready by 0.3, it will be included. Otherwise there is always another release. There is little sense in including lots of half-complete, poorly implemented features just to make the release notes look yummy :-) Of course I am restating the obvious. (*)

- dBera

(*) When I sent out a to-come feature list in one of my earlier emails, I was stressing more the fact that testing is becoming very important and difficult with all these different features, and less the fact that "Wow! Now we can do XXX too". Now I think I was misread.
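[Editor's sketch of the streaming arrangement described in point 3. All names here are invented (this is not Beagle's filter API) and the "detector" is a toy: a filter yields text chunks on demand; a middleman buffers roughly the first 1K for language detection without losing it, then replays the buffered chunks and tees everything to a gzipped cache as the indexer pulls, so the full document is never held in memory at once.]

```python
import gzip
import io

def filter_chunks(text):
    # Stand-in for a filter extracting a document a little at a time.
    for i in range(0, len(text), 256):
        yield text[i:i + 256]

def index_stream(chunks, detect_at=1024):
    chunks = iter(chunks)
    buffered, size = [], 0
    for chunk in chunks:           # "steal" ~1K up front for detection
        buffered.append(chunk)
        size += len(chunk)
        if size >= detect_at:
            break
    sample = "".join(buffered)
    language = "en" if " the " in sample else "unknown"  # toy detector

    cache = io.BytesIO()
    gz = gzip.GzipFile(fileobj=cache, mode="wb")

    def replay():                  # buffered chunks first, then the rest
        yield from buffered
        yield from chunks

    indexed = 0
    for chunk in replay():         # lucene-style pull loop
        gz.write(chunk.encode())   # tee to the text cache...
        indexed += len(chunk)      # ...while the indexer consumes it
    gz.close()
    return language, indexed, cache.tell()

lang, n, cached = index_stream(filter_chunks("the quick brown fox " * 500))
print(lang, n, cached > 0)
```

The replay generator is the sketch's answer to "the data is kept around till lucene asks for it": only the ~1K detection sample is buffered, never the whole document.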
Re: GSoC Weekly Report
Very cool, and good to hear. If Arun could share a patch for his implementation, that would be awesome in terms of preventing wheel reinvention ;) If Arun is unable, or doesn't have the time to look into a hybrid solution, I wouldn't mind doing some investigative work. I think the biggest decision comes when it's time to determine what our cutoff is (size wise). While there is a little extra complication introduced by a hybrid system, I don't see it being a major issue to implement. My thought would just be to have a table in the TextCache.db which denotes if a uri is stored in db or on disk. The major concern is the cost of 2 sqlite queries per cache item.

Just my thoughts on the subject. DBera: are you saying that you want to just work/look into the language stemming, or both the language stemming and the text cache? Depending on what you want to work on, I can help out with this, if it's something we really want to see in 0.3.0. Lemme know.

Cheers,
Kevin Kubasik

On 10/2/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> > completely sure that such a loose typing system will greatly benefit
> > us when working with TEXT/STRING types, however, the gzipped blobs
> > might benefit from less disk usage thanks to being stored in a single
> > file, in addition, I know that incremental i/o is a possibility with
> > blobs in sqlite 3.4, which could potentially be utilized to optimize
> > work like this.
> >
> > Anyways, please send a patch to the list if that's not too much to ask,
> > or just give us an update as to how things are going.
>
> Arun and I had some discussion about this and we were trying to balance the
> performance and size issues. He already has the sqlite idea implemented;
> however I would also like to see how a hybrid idea works, i.e. store the
> huge number of extremely small files in sqlite and store the really large
> ones on the disk. Implementing this is tricky (*).
>
> - dBera
>
> (*) One of my recent efforts has been to add language detection support
> (based on a patch in bugzilla). This will enable us to use the right
> stemmers and analyzers depending on the language. The hard part is stealing
> some initial text for language detection and doing it in a transparent way.
> Incidentally, one implementation of the hybrid approach mentioned above and
> the language detection cross paths. I am waiting for some free time to get
> going after them.

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
Re: GSoC Weekly Report
> completely sure that such a loose typing system will greatly benefit
> us when working with TEXT/STRING types, however, the gzipped blobs
> might benefit from less disk usage thanks to being stored in a single
> file, in addition, I know that incremental i/o is a possibility with
> blobs in sqlite 3.4, which could potentially be utilized to optimize
> work like this.
>
> Anyways, please send a patch to the list if that's not too much to ask,
> or just give us an update as to how things are going.

Arun and I had some discussion about this and we were trying to balance the performance and size issues. He already has the sqlite idea implemented; however I would also like to see how a hybrid idea works, i.e. store the huge number of extremely small files in sqlite and store the really large ones on the disk. Implementing this is tricky (*).

- dBera

(*) One of my recent efforts has been to add language detection support (based on a patch in bugzilla). This will enable us to use the right stemmers and analyzers depending on the language. The hard part is stealing some initial text for language detection and doing it in a transparent way. Incidentally, one implementation of the hybrid approach mentioned above and the language detection cross paths. I am waiting for some free time to get going after them.
Re: GSoC Weekly Report
A quick followup: some reading here: http://www.sqlite.org/datatype3.html
provides some insight into how exactly sqlite3 stores values. I'm not
completely sure that such a loose typing system will greatly benefit us
when working with TEXT/STRING types; however, the gzipped blobs might
benefit from less disk usage thanks to being stored in a single file. In
addition, I know that incremental I/O is a possibility with blobs in
sqlite 3.4, which could potentially be utilized to optimize work like
this.

Anyway, please send a patch to the list if that's not too much to ask,
or just give us an update as to how things are going.

Cheers,
Kevin Kubasik

On 10/1/07, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> On 8/19/07, Arun Raghavan <[EMAIL PROTECTED]> wrote:
> > Hello All,
> > This week I've been working on the new TextCache implementation that
> > I'd mentioned the last time (replacing the bunch of files with an
> > Sqlite db).
> >
> > Making an Sqlite db with just the uri and raw text caused an almost 3x
> > increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
> > my test case). This despite the fact that the size of the raw text was
> > only 7.9 MB. I need to figure out why this happens. In the meantime,
> > I also implemented another version of this which stores (uri, gzipped
> > text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
> > this actually seems to work very well (the db for the test case
> > mentioned shrunk down to 2.6 MB, which is just a little more than the
> > actual size of the compressed data itself).
>
> My first impression is that Sqlite is probably building an index for
> the raw text data, whereas the compressed data is simply treated as a
> binary 'blob'. I'm not 100% sure of the table definitions that you're
> using, or exactly how much (in terms of indexes) sqlite does
> automatically, but that seems like the most likely culprit.
> As we already have our own system for searching text ;) if you could
> find a way to force sqlite to not index the table's raw text column,
> you could probably get more sane numbers regarding the database size.
> However, it's possible it's just how sqlite handles text content, and
> the gzipped text is the best way to go. The other thing to test is how
> this is handled in far larger situations. Is it possible that the
> first 1000 rows are very expensive, but when we scale to 5 rows,
> we see only a minute increase in size?
>
> > Performance numbers on a search which returns 1205 results are below.
> > I basically ran the measurements twice -- once after flushing the
> > inode, dentry and page cache, and another time taking advantage of the
> > disk caches.
> >
> > Current TextCache:
> > no-disk-cache: ~1m
> > with-disk-cache: ~9s
> >
> > New TextCache (raw and gzipped versions had similar numbers):
> > no-disk-cache: ~42s
> > with-disk-cache: ~10s
>
> Very cool / interesting. One of the important cases to test here is
> multiple successive queries. Think of something like deskbar's
> as-you-type completion: how does such a system fare when it gets 15 or
> 20 queries back to back? Does the compression difference factor in then?
>
> > One very important factor remains to be seen -- memory usage. I am
> > working on figuring out what the impact of the new code on memory
> > usage is. Numbers should be available soon.
> >
> > On the Xesam front, I will be updating the code tomorrow/day-after to
> > reflect the latest changes to the spec.
>
> I know the Google SoC is over, and it's completely OK if you're too
> busy to complete these tests, but it would be awesome if you could
> provide a patch to the list so we can not only see exactly what you
> were doing, but so that someone else might finish up your work and/or
> get it merged in and ready for 0.3.0.
>
> > --
> > Arun Raghavan
>
> --
> Cheers,
> Kevin Kubasik
> http://kubasik.net/blog

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoC Weekly Report
On 8/19/07, Arun Raghavan <[EMAIL PROTECTED]> wrote:
> Hello All,
> This week I've been working on the new TextCache implementation that
> I'd mentioned the last time (replacing the bunch of files with an
> Sqlite db).
>
> Making an Sqlite db with just the uri and raw text caused an almost 3x
> increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
> my test case). This despite the fact that the size of the raw text was
> only 7.9 MB. I need to figure out why this happens. In the meantime,
> I also implemented another version of this which stores (uri, gzipped
> text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
> this actually seems to work very well (the db for the test case
> mentioned shrunk down to 2.6 MB, which is just a little more than the
> actual size of the compressed data itself).

My first impression is that Sqlite is probably building an index for the
raw text data, whereas the compressed data is simply treated as a binary
'blob'. I'm not 100% sure of the table definitions that you're using, or
exactly how much (in terms of indexes) sqlite does automatically, but
that seems like the most likely culprit. As we already have our own
system for searching text ;) if you could find a way to force sqlite to
not index the table's raw text column, you could probably get more sane
numbers regarding the database size. However, it's possible it's just
how sqlite handles text content, and the gzipped text is the best way to
go. The other thing to test is how this is handled in far larger
situations. Is it possible that the first 1000 rows are very expensive,
but when we scale to 5 rows, we see only a minute increase in size?

> Performance numbers on a search which returns 1205 results are below.
> I basically ran the measurements twice -- once after flushing the
> inode, dentry and page cache, and another time taking advantage of the
> disk caches.
>
> Current TextCache:
> no-disk-cache: ~1m
> with-disk-cache: ~9s
>
> New TextCache (raw and gzipped versions had similar numbers):
> no-disk-cache: ~42s
> with-disk-cache: ~10s

Very cool / interesting. One of the important cases to test here is
multiple successive queries. Think of something like deskbar's
as-you-type completion: how does such a system fare when it gets 15 or
20 queries back to back? Does the compression difference factor in then?

> One very important factor remains to be seen -- memory usage. I am
> working on figuring out what the impact of the new code on memory
> usage is. Numbers should be available soon.
>
> On the Xesam front, I will be updating the code tomorrow/day-after to
> reflect the latest changes to the spec.

I know the Google SoC is over, and it's completely OK if you're too busy
to complete these tests, but it would be awesome if you could provide a
patch to the list so we can not only see exactly what you were doing,
but so that someone else might finish up your work and/or get it merged
in and ready for 0.3.0.

> --
> Arun Raghavan

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
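[Side note on the indexing question above: as far as I know, sqlite never
builds an index on a plain column by itself; automatic indexes only back
PRIMARY KEY and UNIQUE constraints. A quick way to check, sketched in
Python rather than the project's C# (the table layout here is made up):]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'uri' is the primary key; 'content' is a plain, unconstrained TEXT column
conn.execute("CREATE TABLE textcache (uri TEXT PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO textcache VALUES ('file:///a', 'hello world')")

# sqlite_master lists every index that exists in the database; only the
# automatic index backing the non-integer PRIMARY KEY should show up,
# nothing for the 'content' column
index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(index_names)
```

So the raw-text size blowup is more likely sqlite's per-row and page
overhead than a hidden full-text index.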
Re: GSoC Weekly Report
> Making an Sqlite db with just the uri and raw text caused an almost 3x
> increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
> my test case). This despite the fact that the size of the raw text was
> only 7.9 MB. I need to figure out why this happens. In the meantime,
> I also implemented another version of this which stores (uri, gzipped
> text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
> this actually seems to work very well (the db for the test case
> mentioned shrunk down to 2.6 MB, which is just a little more than the
> actual size of the compressed data itself).
>
> Current TextCache:
> no-disk-cache: ~1m
> with-disk-cache: ~9s
>
> New TextCache (raw and gzipped versions had similar numbers):
> no-disk-cache: ~42s
> with-disk-cache: ~10s

The numbers look pretty good. Size on disk is the main focus here. The
disk cache will come into heavy play on a machine constantly serving
queries, so even if that suffers a little bit (but only a little bit), I
think it's still OK if we gain in other places. The speedup with
no-disk-cache is an added bonus.

Does the performance degrade when looking up small result sets? In the
current implementation, that will involve fewer disk seeks, whereas for
the sqlite-based approach the I/O overhead will probably be similar.

- dBera

--
- Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoC Weekly Report
Hello All,
This week I've been working on the new TextCache implementation that I'd
mentioned the last time (replacing the bunch of files with an Sqlite db).

Making an Sqlite db with just the uri and raw text caused an almost 3x
increase in the text cache size (3.6 MB on-disk vs. almost 15 MB in my
test case), despite the fact that the size of the raw text was only
7.9 MB. I need to figure out why this happens. In the meantime, I also
implemented another version of this which stores (uri, gzipped text)
pairs in the Sqlite db instead of (uri, raw text). Surprisingly, this
actually seems to work very well (the db for the test case mentioned
shrunk down to 2.6 MB, which is just a little more than the actual size
of the compressed data itself).

Performance numbers on a search which returns 1205 results are below. I
basically ran the measurements twice -- once after flushing the inode,
dentry and page caches, and another time taking advantage of the disk
caches.

Current TextCache:
no-disk-cache: ~1m
with-disk-cache: ~9s

New TextCache (raw and gzipped versions had similar numbers):
no-disk-cache: ~42s
with-disk-cache: ~10s

One very important factor remains to be seen -- memory usage. I am
working on figuring out what the impact of the new code on memory usage
is. Numbers should be available soon.

On the Xesam front, I will be updating the code tomorrow/day-after to
reflect the latest changes to the spec.

--
Arun Raghavan (http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
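[The two variants above can be mimicked in a few lines of Python for
anyone who wants to poke at the size behavior. This is a toy illustration,
not the actual Mono implementation; the sample text is deliberately
repetitive, so it exaggerates the compression win compared with real
extracted document text.]

```python
import gzip
import os
import sqlite3
import tempfile

# highly repetitive sample "document text" -- compresses extremely well
text = "some extracted document text " * 200

workdir = tempfile.mkdtemp()
raw_db = os.path.join(workdir, "raw.db")
gz_db = os.path.join(workdir, "gz.db")

# variant 1: (uri, raw text) rows
conn = sqlite3.connect(raw_db)
conn.execute("CREATE TABLE textcache (uri TEXT PRIMARY KEY, content TEXT)")
conn.executemany("INSERT INTO textcache VALUES (?, ?)",
                 (("file:///doc%d" % i, text) for i in range(100)))
conn.commit()
conn.close()

# variant 2: (uri, gzipped text) rows, stored as BLOBs
blob = gzip.compress(text.encode("utf-8"))
conn = sqlite3.connect(gz_db)
conn.execute("CREATE TABLE textcache (uri TEXT PRIMARY KEY, content BLOB)")
conn.executemany("INSERT INTO textcache VALUES (?, ?)",
                 (("file:///doc%d" % i, blob) for i in range(100)))
conn.commit()
conn.close()

raw_size = os.path.getsize(raw_db)
gz_size = os.path.getsize(gz_db)
print(raw_size, gz_size)  # the gzipped db is far smaller for this input
```

Comparing the two file sizes against the total size of the raw text is a
quick way to reproduce the kind of overhead measurement described above.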
GSoC Weekly Report
Hello All,
Sorry about the super-late weekly report ... it's been a crazy and
hectic weekend.

On the Xesam front, once the proposals that are up for review are
finalized, I can go about implementing them.

As I'd mentioned last time, I was looking at improving the disk usage of
the TextCache. Currently the TextCache maintains an Sqlite DB with the
uri of the file and a pointer to the gzipped text from the file. I've
implemented an alternative TextCache which stores the uri and the text
itself. I was hoping to have numbers from a comparison of the two, but
unfortunately wasn't able to complete this. I have to write a small tool
to migrate the old TextCache to the new one so they're both the same,
and then try doing a large number of fetches to compare performance.

That's all for now, folks.

--
Arun Raghavan (http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc weekly report (Browser Extension Rewrite)
Hi,

It's time for the weekly report again. Last week, for the firefox
extension, I:
* improved firefox's bookmark index
* modified the preference dialog (simplified it a little)
* found and fixed a few other bugs (redirect problem, non-html file
problem)

And the firefox extension is almost finished. For the epiphany
extension, I basically worked on config-file parsing/generation and the
index-link feature.

What to do next:
* for the firefox extension: code clean-up (remove debug information,
check the spelling and wording), more testing and documentation
* for the epiphany extension: add i18n support (using gettext)

I think I can finish them next week.

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoc weekly report (Browser Extension Rewrite)
Tao,
I was testing the extension when I noticed this (browser.dump enabled):

[beagle] [beaglPref.get beagle.bookmark.active]
[Exception... "Component returned failure code: 0x8000
(NS_ERROR_UNEXPECTED) [nsIPrefBranch.getBoolPref]" nsresult: "0x8000
(NS_ERROR_UNEXPECTED)" location: "JS frame ::
chrome://newbeagle/content/utils.js :: anonymous :: line 53" data: no]

This was getting thrown on the terminal multiple times. I'm not quite
sure what was triggering it (I didn't set any option explicitly in the
preferences).

Also, in beagleoverlay.js:writeMetadata, uniddexed should be unindexed
(typo).

Could you store the URLs as text and not keyword - people should be able
to query part of the url too :) ?

--
- Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoc weekly report (Browser Extension Rewrite)
2007/8/7, Joe Shaw <[EMAIL PROTECTED]>:
> Hi,
>
> On 8/6/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> > 2007/8/7, Joe Shaw <[EMAIL PROTECTED]>:
> > > I've been playing around with the new extension, and I'm seeing a
> > > little inconsistent behavior with it. I wonder if it's related to me
> > > having the old Beagle extension installed as well (although I
> > > disabled that one).
> >
> > Yes. That's the problem.
> > I used the same preference name "beagle.enabled" as the old extension.
> > Fixed now. It uses "beagle.autoindex.active" now; the tooltip is also
> > updated. "beagle.enabled" was the wrong name, as it doesn't affect
> > on-demand indexing.
>
> Cool, I'll give it a test later today.
>
> We should keep in mind a migration path for the old extension.
> Ideally the new one will just be a drop-in replacement, and if we
> could migrate the basic settings (ie, enabled/disabled and a
> whitelist/blacklist) that would be ideal.

Oh, you can import the preferences from the old extension (just open the
preferences window and you can see the button). Maybe I should import
them silently when the new extension is installed.

> We may also want to use the same UUID so that upgrades are done
> cleanly, if there's no method for obsoleting other extensions.

The same UUID? I guess we only need to modify install.rdf to change the
UUID.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoc weekly report (Browser Extension Rewrite)
Hi,

On 8/6/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> 2007/8/7, Joe Shaw <[EMAIL PROTECTED]>:
> > I've been playing around with the new extension, and I'm seeing a
> > little inconsistent behavior with it. I wonder if it's related to me
> > having the old Beagle extension installed as well (although I
> > disabled that one).
>
> Yes. That's the problem.
> I used the same preference name "beagle.enabled" as the old extension.
> Fixed now. It uses "beagle.autoindex.active" now; the tooltip is also
> updated. "beagle.enabled" was the wrong name, as it doesn't affect
> on-demand indexing.

Cool, I'll give it a test later today.

We should keep in mind a migration path for the old extension. Ideally
the new one will just be a drop-in replacement, and if we could migrate
the basic settings (i.e., enabled/disabled and a whitelist/blacklist)
that would be ideal. We may also want to use the same UUID so that
upgrades are done cleanly, if there's no method for obsoleting other
extensions.

Joe
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoc weekly report (Browser Extension Rewrite)
2007/8/7, Joe Shaw <[EMAIL PROTECTED]>:
> Hey,
>
> I've been playing around with the new extension, and I'm seeing a
> little inconsistent behavior with it. I wonder if it's related to me
> having the old Beagle extension installed as well (although I disabled
> that one).

Yes. That's the problem.
I used the same preference name "beagle.enabled" as the old extension.
Fixed now. It uses "beagle.autoindex.active" now; the tooltip is also
updated. "beagle.enabled" was the wrong name anyway, as it doesn't
affect on-demand indexing.

> Whenever I open a site, I get the little dog icon with an X over it,
> indicating that it's not indexing that page. The page is not from
> HTTPS, and when I open the preferences dialog, the "Default Action"
> has "Index" selected. If I click on the icon to toggle it, it gets
> indexed fine. But I'm not sure why it's not by default. After I
> toggle the icon, any subsequent page opens are indexed.
>
> Joe

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoc weekly report (Browser Extension Rewrite)
Hey, I've been playing around with the new extension, and I'm seeing a little inconsistent behavior with it. I wonder if it's related to me having the old Beagle extension installed as well (although I disabled that one). Whenever I open a site, I get the little dog icon with an X over it, indicating that it's not indexing that page. The page is not from HTTPS, and when I open the preferences dialog, the "Default Action" has "Index" selected. If I click on the icon to toggle it, it gets indexed fine. But I'm not sure why it's not by default. After I toggle the icon, any subsequent page opens are indexed. Joe ___ Dashboard-hackers mailing list Dashboard-hackers@gnome.org http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc weekly report (Browser Extension Rewrite)
Hi,

It's time for the weekly report again. Last week I worked on both the
firefox and epiphany extensions.

For firefox (thanks to Debajyoti for the suggestions):
* added the referrer url to the meta file
* index bookmarks (on-demand index, or index on close)

For Epiphany (now in python):
* config file support
* basic menu items (index this page, toggle auto-index)
* status-bar label

And some documentation.

What to do next:
* improve firefox's bookmark index
* epiphany extension improvements (any suggestions?)
* solve some usability problems
* documentation

You can follow the instructions at
http://beagle-project.org/Browser_Extension to install and test the
extension, and give me some feedback.

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc weekly report (Browser Extension Rewrite)
Hi, All

This has been a relatively slow week. I've basically been looking at the
epiphany extension:
* read more documentation, the epiphany source code and some example
extension code
* wrote a small script in python (I take it as an "experiment"):
http://browser-extension-for-beagle.googlecode.com/svn/trunk/py-epiphany-extension/

The main target of next week: rewrite the epiphany extension (in C)
* save files to ~/.beagle/ToIndex instead of calling beagle-index-url
* add basic config (index HTTPS or not, exclude/include rules)
* UI (it may have some problems, as I am not familiar with GTK)

Thanks.
--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc Weekly Report (Browser Extension Rewrite)
Hi, all.

Sorry for the slight delay. Last week I kept on improving the firefox
extension:
* ported the existing code for search (this page, link, selected text)
to the new extension
* fixed some bugs in index-link (it can now handle non-html files
properly)
* modified the preferences-related code (gave it a proper interface)
* added a function to import the old extension's preferences (the
security filters)

And finally I managed to pack it up and do more testing. You may want to
try it out. Just download it from
http://browser-extension-for-beagle.googlecode.com/files/[EMAIL PROTECTED]
It may still have some problems, so if you find any bug please let me
know.

For the epiphany extension, I read some documents and the existing code
last week. I will begin coding on it this week.

What to do next week:
* more testing/documentation for the firefox extension
* epiphany extension development (just a list; I've no idea how many of
these can be done next week):
  save files to .beagle/ToIndex/ instead of calling beagle-index-url
  add config-related code
  GUI?

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
RE: GSoc Weekly Report (Browser Extension Rewrite)
Hey, a quick note on the subject: I made a haphazard attempt at this
rewrite some time ago and faced the same issue you have now. I think the
deciding factor would be your personal experience with the languages. If
you have never really worked with C but have used python, I would think
that a well designed and well written python plugin is much better than
a haphazard 'My First C' program.

The second concern/thought is that a lot of users will leave their
browsers open for hours (if not days) at a time. I'm not 100% sure if
this applies in the plugin context, but a GC system probably offers some
safety net for memory use.

Just a quick $0.02,
Kevin Kubasik

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joe Shaw
Sent: Friday, July 20, 2007 10:49 AM
To: Tao Fei
Cc: dashboard-hackers@gnome.org
Subject: Re: GSoc Weekly Report (Browser Extension Rewrite)

Hi,

On 7/14/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> I've noticed that Epiphany can be written in C or in Python. The old
> extension is written in C. I'm wondering whether it is acceptable if I
> write the extension in python ?

It's a possibility, although I'm not crazy about adding a Python
dependency to Beagle (not libbeagle, which already has an optional
Python dep for the bindings). It's probably not unreasonable to assume
that anyone with Epiphany installed will also have Python, however.

Joe
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoc Weekly Report (Browser Extension Rewrite)
Hi, On 7/14/07, Tao Fei <[EMAIL PROTECTED]> wrote: > I've noticed that Epiphany can be written in C or in Python. The old > extension is written in C. I'm wondering whether it is acceptable if I > write the extension in python ? It's a possibility, although I'm not crazy about adding a Python dependency to Beagle (not libbeagle, which already has an optional Python dep for the bindings). It's probably not unreasonable to assume that anyone with Epiphany installed will also have Python, however. Joe ___ Dashboard-hackers mailing list Dashboard-hackers@gnome.org http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc Weekly Report (Browser Extension Rewrite)
Hi, all.

What I have done in the last week:
* UI improvements: context menu (for the status icon and content area) /
toolbar icon
* quick-add exclude/include rule
* index link (the target can be a non-html file)
* read some documentation about Epiphany extension development

What to do next:
* more UI improvements for the firefox extension
* code cleanup, documenting
* set up the epiphany extension development environment and try
something

I've noticed that Epiphany extensions can be written in C or in Python.
The old extension is written in C. I'm wondering whether it is
acceptable if I write the extension in python?

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoc Weekly Report (Browser Extension Rewrite)
Sorry for being late. I have just got back home. I had some network
problems and failed to get access to the network until today.

> There was a recent one opened against the old extension about
> internationalization. I think that's a pretty important task that this
> one should address. There is even a patch attached to that bug,
> although I haven't looked at it closely.

Yes, I have noticed that (Debajyoti has cc-ed this bug to me). I'd like
to say that the new extension will be "translatable". I have put all the
UI strings in a .dtd file and all the javascript strings in a
.properties file (except some debug information), and I will keep doing
that.

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: GSoc Weekly Report (Browser Extension Rewrite)
Hi,

On 7/6/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> I did a little search in http://bugzilla.gnome.org/ ; there are some
> bug reports for the extension.
> eg, Bug 317605: http://bugzilla.gnome.org/show_bug.cgi?id=317605
> In fact, I use the "status bar label" to indicate whether the page is
> indexed, and use the beagle icon to indicate whether beagle is
> enabled or disabled or in an error state.
> The icon is "global". I think this partly fixes the bug.

Yeah, I think this is a good idea. I didn't like before that the icon
was overloaded for two questions: is this page indexed? and is the
extension enabled for this page? Separating those concepts is a good
idea.

> What to do next:
> * fix the bugs in bugzilla (or avoid producing them)

There was a recent one opened against the old extension about
internationalization. I think that's a pretty important task that this
one should address. There is even a patch attached to that bug, although
I haven't looked at it closely.

Thanks,
Joe
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc Weekly Report (Browser Extension Rewrite)
Hi, all.

In the last week (or last two weeks), I have improved the extension a
little:
* popup menu
* "index this page": index the current page, ignoring the
exclude/include rules
* changed the status bar label to "beagle is indexing [URL]" when a page
is indexed

I did a little search in http://bugzilla.gnome.org/ ; there are some bug
reports for the extension. E.g., Bug 317605:
http://bugzilla.gnome.org/show_bug.cgi?id=317605
In fact, I use the "status bar label" to indicate whether the page is
indexed, and use the beagle icon to indicate whether beagle is enabled
or disabled or in an error state. The icon is "global". I think this
partly fixes the bug.

What to do next:
* fix the bugs in bugzilla (or avoid producing them)
* make it more convenient to add filters (something like "always index
this domain/page" / "never index this domain/page" menu items)

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc Weekly Report (Browser Extension Rewrite)
Hi,

Last week, I:
* fixed bugs
* tested indexing (exclude/include filter, status switch, ...)
* finished the first runnable version

Goals for next week:
* remove the jslib-related code (the lib is too big)
* add a context menu
* (if there is time) add an "index it now" button and a "filter quick
add" button

Thanks
---
TaoFei (陶飞)
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc weekly report (Browser Extension Rewrite)
Hi all.

Last week, I:
* moved all the literal strings in the code to beagle.properties, and
added some helper functions for i18n
* indexed html pages (not tested enough)

I will have some examinations in the next two weeks, so goals for next
week:
* more testing
* remove the jslib-related code (the lib is too big)
* add a context menu

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
GSoc weekly report (Browser Extension Rewrite)
Hi all.

Last week, I:
* modified the preference code (removed global vars, added documentation
comments)
* wrote some utility functions (string operations, wildcard to RE, etc.)
* wrote the filter (checks whether a url should be indexed based on the
exclude/include rules)
* uploaded the code to code.google.com
(http://code.google.com/p/browser-extension-for-beagle/source)

Next week:
* do the indexing
* some test/log/debug code
* i18n (move all the literal strings in the code to beagle.properties)

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
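[The wildcard filter mentioned above could look roughly like this. This
is a Python illustration of the idea only -- the extension itself is
javascript -- and the rule that exclude beats include is my assumption,
not necessarily what the extension does.]

```python
import fnmatch
import re


def compile_rules(patterns):
    """Turn shell-style wildcards (e.g. '*mail.example.com*') into regexes."""
    return [re.compile(fnmatch.translate(p)) for p in patterns]


def should_index(url, include, exclude):
    """Decide whether a url should be indexed.

    Exclude rules win over include rules; an empty include list means
    everything not excluded is allowed.
    """
    for rx in compile_rules(exclude):
        if rx.match(url):
            return False
    if not include:
        return True
    return any(rx.match(url) for rx in compile_rules(include))


# hypothetical rules, illustration only
allowed = should_index("http://mail.example.com/inbox",
                       include=[], exclude=["*mail.example.com*"])
print(allowed)  # the webmail url is filtered out by the exclude rule
```

`fnmatch.translate` anchors the pattern to the whole string, which is why
the wildcards above are wrapped in `*...*`.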
GSoC Weekly Report (Thundebird backend rewrite)
Hey! Time for my first weekly summer of code report. Here's a summary of
what I've been working on so far:
* Reviewed my old Thunderbird code
* Cleared up some question marks by studying some of the Thunderbird
source code
* Begun the work on the core framework that I will base the backend upon
(based on the review of my old code and the Thunderbird source code
study)
* Made some regression tests
* Written documentation

My ambitions for next week:
* Work on the framework and continue documenting it
* Write more regression tests

By "core framework" above, I mean a draft of all the classes that I will
need to implement the backend, the entire structure of these (methods,
properties, etc.) and the behavior of each class. The core part (which
will later live in the utility part of beagle) is almost complete; I'm
currently writing documentation for it. I have found the documentation
part extremely useful: it lets me fix all the small design flaws that I
originally did not see. It will (hopefully) also make it easier for you
guys to see and understand what I'm working on.

Other things that might be interesting to know: I've decided to use
NUnit as my framework for writing regression tests. It contains what I
need, is simple to use and integrates well with monodevelop, which is my
primary IDE. I'm using monodocer to generate XML documentation and
monodoc to edit it, for those interested (I'm _not_ using inline XML, as
a lot of people do not like this), and I will link to an HTML version of
the documentation on my blog once I'm done.

I hope to finish the core framework by next week so that I can publish
it and hopefully get some input on it before I go on and write any code.
More on this in next week's report and on my blog.

Thanks!
Pierre Östlund
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers