Migrate to Mono.Data.Sqlite (Was: Re: GSoC Weekly Report)

2007-10-18 Thread Debajyoti Bera
> Ignore my previous email ... I was looking at the wrong place :(
> This is the right place for the new M.D.Sqlite
> http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite_2.0/SQLiteDataReader.cs

Migration from Mono.Data.SqliteClient to Mono.Data.Sqlite completed (rev 
4061).

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-18 Thread Debajyoti Bera
Ignore my previous email ... I was looking at the wrong place :(
This is the right place for the new M.D.Sqlite
http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite_2.0/SQLiteDataReader.cs

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-18 Thread Debajyoti Bera
> > A followup question: I did not find any API documentation of
> > Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
> > there.
>
> My understanding is that both M.D.SqliteClient and M.D.Sqlite follow
> the general ADO.Net API patterns and that the latter is more or less a
> drop-in replacement for the former.  A few things may need to be
> tweaked, but in general just changing the "using" statements at the
> top of each source file should be all that's needed.

I was looking more for some method of row-by-row retrieval, on demand. Real 
on-demand, where the implementation does not retrieve all the rows at once 
but returns them one by one.

> You've always been able to get rows on demand via ADO.Net, it's just a
> matter of the implementation underneath.  The old one (not modified by
> us) would load all of them into memory.  I'm not sure how the new one
> performs memory-wise.  If the Mono guys don't have any idea, the right

I checked the source out of curiosity:
http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite/
And the code for the DataReader looks exactly the same (didn't do a diff, just 
compared visually) as the one in Mono.Data.SqliteClient. So even if we migrate 
(the migration would be easy), we still have to ship with a modified in-house 
M.D.Sqlite and keep syncing it with upstream. *sigh*

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-18 Thread Joe Shaw
Hi,

On 10/16/07, D Bera <[EMAIL PROTECTED]> wrote:
> A followup question: I did not find any API documentation of
> Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
> there.

My understanding is that both M.D.SqliteClient and M.D.Sqlite follow
the general ADO.Net API patterns and that the latter is more or less a
drop-in replacement for the former.  A few things may need to be
tweaked, but in general just changing the "using" statements at the
top of each source file should be all that's needed.
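
To make that concrete, here is a minimal hypothetical sketch (made-up file and class names) of the swap; the connection-string syntax is one of the "few things" that may need tweaking, so the System.Data.SQLite-style "Data Source=" form is assumed for the new bindings:

// Hypothetical snippet, not Beagle code. In most source files the only
// change is the namespace import:
//
//     - using Mono.Data.SqliteClient;
//     + using Mono.Data.Sqlite;
//
using System.Data;
using Mono.Data.Sqlite;

class MigrationSketch {
	static IDbConnection Open (string path)
	{
		// One detail that may need tweaking: the old bindings were
		// typically opened with "URI=file:..." connection strings, while
		// the new System.Data.SQLite-derived bindings take "Data Source=...".
		return new SqliteConnection ("Data Source=" + path);
	}

	static void Main ()
	{
		using (IDbConnection conn = Open ("test.db")) {
			conn.Open ();
			System.Console.WriteLine (conn.State); // prints "Open"
		}
	}
}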

> If M.D.Sqlite does not have a way to return rows on demand, I
> am against the migration. In the worst case, we can ship with a
> modified copy of M.D.Sqlite, but I am not sure what that will buy us.

You've always been able to get rows on demand via ADO.Net, it's just a
matter of the implementation underneath.  The old one (not modified by
us) would load all of them into memory.  I'm not sure how the new one
performs memory-wise.  If the Mono guys don't have any idea, the right
thing to do here would be to create a large test database (or use an
existing TextCache or FAStore db) and do a "SELECT *" using the 3
implementations and walk the results, using heap-buddy and/or
heap-shot to analyze their memory usage.
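
For concreteness, a sketch of that walk — the table name and db path are made up; the interesting part is watching the heap profile (via heap-buddy or heap-shot) while the loop runs against each binding:

using System;
using System.Data;
using Mono.Data.Sqlite; // swap in Mono.Data.SqliteClient to test the other bindings

class ReaderWalk {
	static void Main (string[] args)
	{
		// args [0]: a large test db, e.g. a copy of a TextCache db;
		// "textcache" is a stand-in table name.
		using (IDbConnection conn = new SqliteConnection ("Data Source=" + args [0])) {
			conn.Open ();
			IDbCommand cmd = conn.CreateCommand ();
			cmd.CommandText = "SELECT * FROM textcache";
			long rows = 0;
			using (IDataReader reader = cmd.ExecuteReader ()) {
				// If rows really are fetched on demand, memory stays flat
				// during this loop; if the binding slurps the whole result
				// set, the heap spikes at ExecuteReader instead.
				while (reader.Read ())
					rows++;
			}
			Console.WriteLine ("walked {0} rows", rows);
		}
	}
}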

> In the same breath, what is the benefit of M.D.Sqlite over
> M.D.SqliteClient for beagle ? I figured out there are some ADO.Net
> advantages but other than that ... ?

It's maintained for one, which our modified one essentially isn't.  It
has the backing of the Mono team.  The code is much cleaner and easier
to understand, largely because it doesn't have two separate codepaths
(one for v2 and one for v3).  I am sure the Mono guys have other good
reasons too. :)

Joe


Re: GSoC Weekly Report

2007-10-16 Thread D Bera
> Indeed you're right, but those changes did get merged upstream.  So
> the memory usage I believe is the only outstanding reason.

Sweet.
A followup question: I did not find any API documentation of
Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
there. If M.D.Sqlite does not have a way to return rows on demand, I
am against the migration. In the worst case, we can ship with a
modified copy of M.D.Sqlite, but I am not sure what that will buy us.
In the same breath, what is the benefit of M.D.Sqlite over
M.D.SqliteClient for beagle ? I figured out there are some ADO.Net
advantages but other than that ... ?

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-16 Thread Joe Shaw
Hi,

On 10/16/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> > > What to do with our local changes to Mono.Data.SqliteClient ? I always
> > > get confused with them. Don't even know what those changes are and why
> > > they're there :-/ (it has something to do with threading and locking) ?
> >
> > The work done locally was mainly for memory usage reasons.  IIRC, the
> > upstream bindings pull all of the results into memory at once, whereas
> > our locally modified ones do so only as needed.  I don't think
> > threading/locking was ever an issue -- you might be confusing it with
> > the fact that we couldn't use early sqlite 3.x versions because of a
> > broken threading policy in the library.
>
> You are probably right. I still had to verify ...
> beagle:/source=mind?query=sqlite+beagle+lock
> returned nothing :-D
> but Google returned
> http://lists.ximian.com/pipermail/mono-devel-list/2005-November/015977.html
> which mentions "Lock" ... yay! My faith in my memory is restored ;-)

Indeed you're right, but those changes did get merged upstream.  So
memory usage, I believe, is the only outstanding reason.

Joe


Re: GSoC Weekly Report

2007-10-16 Thread Debajyoti Bera
> > What to do with our local changes to Mono.Data.SqliteClient ? I always
> > get confused with them. Don't even know what those changes are and why
> > they're there :-/ (it has something to do with threading and locking) ?
>
> The work done locally was mainly for memory usage reasons.  IIRC, the
> upstream bindings pull all of the results into memory at once, whereas
> our locally modified ones do so only as needed.  I don't think
> threading/locking was ever an issue -- you might be confusing it with
> the fact that we couldn't use early sqlite 3.x versions because of a
> broken threading policy in the library.

You are probably right. I still had to verify ...
beagle:/source=mind?query=sqlite+beagle+lock
returned nothing :-D
but Google returned
http://lists.ximian.com/pipermail/mono-devel-list/2005-November/015977.html
which mentions "Lock" ... yay! My faith in my memory is restored ;-)

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-15 Thread Joe Shaw
Hi,

On 10/13/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> What to do with our local changes to Mono.Data.SqliteClient ? I always get
> confused with them. Don't even know what those changes are and why they're
> there :-/ (it has something to do with threading and locking) ?

The work done locally was mainly for memory usage reasons.  IIRC, the
upstream bindings pull all of the results into memory at once, whereas
our locally modified ones do so only as needed.  I don't think
threading/locking was ever an issue -- you might be confusing it with
the fact that we couldn't use early sqlite 3.x versions because of a
broken threading policy in the library.

I'm not sure what the memory side effects of the newer upstream bindings are.

Joe


Re: GSoC Weekly Report

2007-10-13 Thread Debajyoti Bera
> Sorry, I was unclear.  By "removing sqlite2" I meant simply removing
> it as an option from configure.in and requiring only sqlite3, not
> removing the codepaths from the cut-and-pasted code.  Then, at some
> point in the future, porting over to Mono's own Mono.Data.Sqlite.

What to do with our local changes to Mono.Data.SqliteClient ? I always get 
confused with them. Don't even know what those changes are and why they're 
there :-/ (it has something to do with threading and locking) ?

- dBera

PS: Mannn... I love these Liberation fonts... can't stop reading the same mail 
ten times :P

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-10 Thread Joe Shaw
Hi,

On 10/9/07, D Bera <[EMAIL PROTECTED]> wrote:
> > At this point, I'm in favor of dropping support for sqlite2 entirely
> > anyway.  That will make a migration to the new Mono sqlite bindings
> > smoother, and drop a nasty chunk of cut-and-paste-and-patch code in
> > the tree.
>
> Me too, me too ...
> But I see no point in the double effort of first removing sqlite-2
> support and then changing the code to use Mono.Data.Sqlite. Any
> volunteers for the cleanup ?

Sorry, I was unclear.  By "removing sqlite2" I meant simply removing
it as an option from configure.in and requiring only sqlite3, not
removing the codepaths from the cut-and-pasted code.  Then, at some
point in the future, porting over to Mono's own Mono.Data.Sqlite.

Joe


Re: GSoC Weekly Report

2007-10-09 Thread D Bera
> > One thing I forgot to test was support for sqlite-2. Could anyone with
> > sqlite-2 sync svn trunk and see if things work as expected ? .beagle/ might
> > need to be deleted and files/emails re-indexed.
>
> At this point, I'm in favor of dropping support for sqlite2 entirely
> anyway.  That will make a migration to the new Mono sqlite bindings
> smoother, and drop a nasty chunk of cut-and-paste-and-patch code in
> the tree.

Me too, me too ...
But I see no point in the double effort of first removing sqlite-2
support and then changing the code to use Mono.Data.Sqlite. Any
volunteers for the cleanup ?

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-09 Thread Joe Shaw
Hi,

On 10/8/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> One thing I forgot to test was support for sqlite-2. Could anyone with
> sqlite-2 sync svn trunk and see if things work as expected ? .beagle/ might
> need to be deleted and files/emails re-indexed.

At this point, I'm in favor of dropping support for sqlite2 entirely
anyway.  That will make a migration to the new Mono sqlite bindings
smoother, and drop a nasty chunk of cut-and-paste-and-patch code in
the tree.

Joe


Re: GSoC Weekly Report

2007-10-08 Thread Debajyoti Bera
Hi,

First, the context of this discussion: better storage of cached data (aka the 
textcache).

> Very cool, and good to hear. If Arun could share a patch for his
> implementation, that would be awesome in terms of preventing wheel
> reinvention ;) If Arun is unable, or doesn't have the time to look
> into a hybrid solution, I wouldn't mind doing some investigative work.
> I think the biggest decision comes when it's time to determine what
> our cutoff is (size-wise). While there is a little extra complication
> introduced by a hybrid system, I don't see it being a major issue to
> implement. My thought would just be to have a table in the
> TextCache.db which denotes whether a uri is stored in the db or on disk. The
> major concern is the cost of 2 sqlite queries per cache item.
>
> Just my thoughts on the subject. DBera: are you saying that you want
> to just work/look into the language stemming, or both the language
> stemming and the text cache? Depending on what you want to work on, I
> can help out with this, if it's something we really want to see in
> 0.3.0. Lemme know.
> > > completely sure that such a loose typing system will greatly benefit
> > > us when working with TEXT/STRING types; however, the gzipped blobs
> > > might benefit from less disk usage thanks to being stored in a single
> > > file. In addition, I know that incremental I/O is a possibility with
> > > blobs in sqlite 3.4, which could potentially be utilized to optimize
> > > work like this.
> > >
> > > Anyways, please send a patch to the list if that's not too much to ask,
> > > or just give us an update as to how things are going.
> >
> > Arun and I had some discussion about this and we were trying to balance
> > the performance and size issues. He already has the sqlite idea
> > implemented; however, I would also like to see how a hybrid idea works,
> > i.e. store the huge number of extremely small files in sqlite and store
> > the really large ones on disk. Implementing this is tricky.

I just checked in some changes implementing the above hybrid idea. Currently, 
any file less than 4K gzipped is "an extremely small file" (stored in the db) 
and anything more is "a really large one" (stored on disk). The cutoff is 
hardcoded in TextCache.cs (BLOB_SIZE_LIMIT). The number of files and the disk 
size of .beagle/TextCache are reduced significantly. Performance and memory 
should not suffer noticeably unless I did something stupid.
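
For readers following along, a minimal sketch of the store-side decision; apart from the BLOB_SIZE_LIMIT cutoff named above, all names here are hypothetical, not the actual TextCache.cs code:

using System;
using System.IO;

// Sketch of the hybrid store decision only; everything except the
// BLOB_SIZE_LIMIT cutoff is made up for illustration.
class HybridTextCacheSketch {
	const int BLOB_SIZE_LIMIT = 4 * 1024; // 4K gzipped, per the checkin above

	static void Store (string uri, byte[] gzipped_text)
	{
		if (gzipped_text.Length <= BLOB_SIZE_LIMIT)
			InsertBlobIntoDb (uri, gzipped_text);  // "extremely small": a row in TextCache.db
		else
			File.WriteAllBytes (PathForUri (uri), gzipped_text); // "really large": a file on disk
	}

	static void InsertBlobIntoDb (string uri, byte[] data)
	{
		/* an sqlite INSERT with a BLOB parameter would go here */
	}

	static string PathForUri (string uri)
	{
		return Path.Combine (".beagle/TextCache", Uri.EscapeDataString (uri));
	}

	static void Main ()
	{
		Store ("file:///tmp/example.txt", new byte [16]); // tiny, goes to the db
	}
}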

One thing I forgot to test was support for sqlite-2. Could anyone with 
sqlite-2 sync svn trunk and see if things work as expected ? .beagle/ might 
need to be deleted and files/emails re-indexed.

In the past, I emailed about how this feature relates to language 
determination. It still does, but that would require some more work (hint: 
somehow merge TextCacheWriteStream and PullingReader) and a significant bit 
of testing. I have no plans to work on it now.

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-02 Thread Daniel Naber
On Tuesday 02 October 2007 19:13, you wrote:

> Thinking quickly, one way to do this would be to add an option to
> query to specify the language.

That's a nice option, but I think the default should be to search all 
languages. People are used to just typing a word without setting any options.

Regards
 Daniel

-- 
http://www.danielnaber.de


Re: GSoC Weekly Report

2007-10-02 Thread D Bera
> > (*) One of my recent efforts has been to add language detection support
> > (based on a patch in bugzilla).
>
> Could you describe how this is going to work? I see that language detection
> is quite simple if you have enough text, but basically impossible for
> short texts like queries. So will the queries be sent through all
> analyzers and then OR'ed, for example?

Bummer! Didn't think about that :(
The bugzilla contributor and I were more focused on how to
detect the language.

Thinking quickly, one way to do this would be to add an option to the
query to specify the language; then that language's analyzer will be used. The
people who requested this feature mostly wanted some way to query only, say,
German documents. So they know a priori what language of documents they want
to query. They can simply specify their choice (somehow, using the
Query API) and we search only documents of that language (as well
as use the right analyzer).
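
A toy sketch of what the daemon side of such an option could look like; the language hint and the analyzer table below are entirely hypothetical, since no such API exists yet:

using System;
using System.Collections;

// Toy sketch only: Beagle's Query API has no language option today, and the
// analyzer names below are stand-ins for real per-language Lucene analyzers.
class LanguageHintSketch {
	static Hashtable analyzers = new Hashtable ();

	static string AnalyzerFor (string lang)
	{
		object a = (lang == null) ? null : analyzers [lang];
		return (a != null) ? (string) a : "DefaultAnalyzer";
	}

	static void Main ()
	{
		analyzers ["de"] = "GermanAnalyzer";
		analyzers ["en"] = "EnglishAnalyzer";

		string query_language = "de"; // the user's a-priori choice, sent with the query
		// The daemon would both pick the matching analyzer for the query
		// terms and restrict hits to documents indexed as that language.
		Console.WriteLine ("stem with {0}, filter on lang={1}",
				   AnalyzerFor (query_language), query_language);
	}
}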

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-02 Thread Daniel Naber
On Tuesday 02 October 2007 06:24, Debajyoti Bera wrote:

> (*) One of my recent efforts has been to add language detection support
> (based on a patch in bugzilla).

Could you describe how this is going to work? I see that language detection 
is quite simple if you have enough text, but basically impossible for 
short texts like queries. So will the queries be sent through all 
analyzers and then OR'ed, for example?

Regards
 Daniel

-- 
http://www.danielnaber.de


Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
Updated patch attached -- some of the older code was not building.

Cheers,
Arun

On 02/10/2007, Arun Raghavan <[EMAIL PROTECTED]> wrote:
> On 02/10/2007, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> > Very cool, and good to hear. If Arun could share a patch for his
> > implementation, that would be awesome in terms of preventing wheel
> > reinvention ;) If Arun is unable, or doesn't have the time to look
> > into a hybrid solution, I wouldn't mind doing some investigative work.
>
> I've been completely swamped with work here in the first half of the
> semester, and I spent a little time getting the xesam-adaptor updated
> to the latest spec. Do let me know if you're taking this up, so
> there's no duplication of effort. The patch against r4013 is attached.
>
> > I think the biggest decision comes when it's time to determine what
> > our cutoff is (size-wise). While there is a little extra complication
> > introduced by a hybrid system, I don't see it being a major issue to
> > implement. My thought would just be to have a table in the
> > TextCache.db which denotes whether a uri is stored in the db or on disk. The
> > major concern is the cost of 2 sqlite queries per cache item.
>
> Might it not be easier to have a boolean field denoting whether the
> field is an on-disk URI or the blob itself? Or better, if this is
> possible, to just examine the first few bytes to see if they are some
> ASCII text (or !(the Zip magic bytes))
>
> Best,
> --
> Arun Raghavan
> (http://nemesis.accosted.net)
> v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
> e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===================================================================
--- beagled/FileSystemQueryable/FileSystemQueryable.cs  (revision 4016)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs  (working copy)
@@ -1810,17 +1810,12 @@
 			// is stored in a property.
 			Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);
 
-			string path = TextCache.UserCache.LookupPathRaw (uri);
+			Stream text = TextCache.UserCache.LookupText (uri, hit.Uri.LocalPath);
 
-			if (path == null)
+			if (text == null)
 				return null;
 
-			// If this is self-cached, use the remapped Uri
-			if (path == TextCache.SELF_CACHE_TAG)
-				return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-			path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-			return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+			return SnippetFu.GetSnippet (query_terms, new StreamReader (text), full_text);
 		}
 
 		override public void Start ()
Index: beagled/TextCache.cs
===================================================================
--- beagled/TextCache.cs	(revision 4016)
+++ beagled/TextCache.cs	(working copy)
@@ -37,6 +37,53 @@
 
 namespace Beagle.Daemon {
 
+	// We only have this class because GZipOutputStream doesn't let us
+	// retrieve the baseStream
+	public class TextCacheStream : GZipOutputStream {
+		private Stream stream;
+
+		public Stream BaseStream {
+			get { return stream; }
+		}
+
+		public TextCacheStream () : this (new MemoryStream ())
+		{
+		}
+
+		public TextCacheStream (Stream stream) : base (stream)
+		{
+			this.stream = stream;
+			this.IsStreamOwner = false;
+		}
+	}
+
+	public class TextCacheWriter : StreamWriter {
+		private Uri uri;
+		private TextCache parent_cache;
+		private TextCacheStream tcStream;
+
+		public TextCacheWriter (TextCache cache, Uri uri, TextCacheStream tcStream) : base (tcStream)
+		{
+			parent_cache = cache;
+			this.uri = uri;
+			this.tcStream = tcStream;
+		}
+
+		override public void Close ()
+		{
+			base.Close ();
+
+			Stream stream = tcStream.BaseStream;
+
+			byte[] text = new byte [stream.Length];
+			stream.Seek (0, SeekOrigin.Begin);
+			stream.Read (text, 0, (int) stream.Length);
+
+			parent_cache.Insert (uri, text);
+			tcStream.BaseStream.Close ();
+		}
+	}
+
 	// FIXME: This class isn't multithread safe!  This class does not
 	// ensure that different threads don't utilize a transaction started
 	// in a certain thread at the same time.  However, since all the

Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
On 02/10/2007, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> A quick followup, some reading here:
>
> http://www.sqlite.org/datatype3.html
>
> provides some insight into how exactly sqlite3 stores values, I'm not
> completely sure that such a loose typing system will greatly benefit
> us when working with TEXT/STRING types, however, the gzipped blobs
> might benefit from less disk usage thanks to being stored in a single
> file, in addition, I know that incremental i/o is a possibility with
> blobs in sqlite 3.4, which could potentially be utilized to optimize
> work like this.

If the bindings wrap a Stream around this, this would be ideal. There
doesn't seem to be much documentation on the new bindings. From what I
can see in the mono-1.2.5.1 code, the new bindings (like the old
bindings) just return the entire contents of the field. Maybe we
should make a feature request?
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com


Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
On 02/10/2007, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> Very cool, and good to hear. If Arun could share a patch for his
> implementation, that would be awesome in terms of preventing wheel
> reinvention ;) If Arun is unable, or doesn't have the time to look
> into a hybrid solution, I wouldn't mind doing some investigative work.

I've been completely swamped with work here in the first half of the
semester, and I spent a little time getting the xesam-adaptor updated
to the latest spec. Do let me know if you're taking this up, so
there's no duplication of effort. The patch against r4013 is attached.

> I think the biggest decision comes when it's time to determine what
> our cutoff is (size-wise). While there is a little extra complication
> introduced by a hybrid system, I don't see it being a major issue to
> implement. My thought would just be to have a table in the
> TextCache.db which denotes whether a uri is stored in the db or on disk. The
> major concern is the cost of 2 sqlite queries per cache item.

Might it not be easier to have a boolean field denoting whether the
field is an on-disk URI or the blob itself? Or better, if this is
possible, to just examine the first few bytes to see if they are some
ASCII text (or !(the Zip magic bytes))
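
For what it's worth, a sketch of that second idea: gzip output always starts with the magic bytes 0x1f 0x8b, and an ASCII path never does, so a single column could hold either and be discriminated on read. Hypothetical illustration only, not a patch:

using System;
using System.Text;

// Sketch of the "look at the first bytes" idea: a gzip stream always starts
// with the magic bytes 0x1f 0x8b, while an ASCII path never does.
class BlobOrPath {
	static bool IsGzipBlob (byte[] field)
	{
		return field.Length >= 2 && field [0] == 0x1f && field [1] == 0x8b;
	}

	static void Main ()
	{
		byte[] stored = Encoding.ASCII.GetBytes ("12345678.gz"); // a made-up on-disk name
		Console.WriteLine (IsGzipBlob (stored) ? "gzipped text blob" : "on-disk path");
	}
}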

Best,
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===================================================================
--- beagled/FileSystemQueryable/FileSystemQueryable.cs  (revision 4013)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs  (working copy)
@@ -1810,17 +1810,12 @@
 			// is stored in a property.
 			Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);
 
-			string path = TextCache.UserCache.LookupPathRaw (uri);
+			Stream text = TextCache.UserCache.LookupText (uri, hit.Uri.LocalPath);
 
-			if (path == null)
+			if (text == null)
 				return null;
 
-			// If this is self-cached, use the remapped Uri
-			if (path == TextCache.SELF_CACHE_TAG)
-				return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-			path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-			return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+			return SnippetFu.GetSnippet (query_terms, new StreamReader (text), full_text);
 		}
 
 		override public void Start ()
Index: beagled/TextCache.cs
===================================================================
--- beagled/TextCache.cs	(revision 4013)
+++ beagled/TextCache.cs	(working copy)
@@ -37,6 +37,53 @@
 
 namespace Beagle.Daemon {
 
+	// We only have this class because GZipOutputStream doesn't let us
+	// retrieve the baseStream
+	public class TextCacheStream : GZipOutputStream {
+		private Stream stream;
+
+		public Stream BaseStream {
+			get { return stream; }
+		}
+
+		public TextCacheStream () : this (new MemoryStream ())
+		{
+		}
+
+		public TextCacheStream (Stream stream) : base (stream)
+		{
+			this.stream = stream;
+			this.IsStreamOwner = false;
+		}
+	}
+
+	public class TextCacheWriter : StreamWriter {
+		private Uri uri;
+		private TextCache parent_cache;
+		private TextCacheStream tcStream;
+
+		public TextCacheWriter (TextCache cache, Uri uri, TextCacheStream tcStream) : base (tcStream)
+		{
+			parent_cache = cache;
+			this.uri = uri;
+			this.tcStream = tcStream;
+		}
+
+		override public void Close ()
+		{
+			base.Close ();
+
+			Stream stream = tcStream.BaseStream;
+
+			byte[] text = new byte [stream.Length];
+			stream.Seek (0, SeekOrigin.Begin);
+			stream.Read (text, 0, (int) stream.Length);
+
+			parent_cache.Insert (uri, text);
+			tcStream.BaseStream.Close ();
+		}
+	}
+
 	// FIXME: This class isn't multithread safe!  This class does not
 	// ensure that different threads don't utilize a transaction started
 	// in a certain thread at the same time.  However, since all the
@@ -50,7 +97,7 @@
 
 	static public bool Debug = false;
 
-	public const string SELF_CACHE_TAG = "*self*";
+	private const string SELF_CACHE_TAG = "*self*";

Re: GSoC Weekly Report

2007-10-02 Thread D Bera
> Just my thoughts on the subject. DBera: are you saying that you want
> to just work/look into the language stemming, or both the language
> stemming and the text cache? Depending on what you want to work on, I
> can help out with this, if it's something we really want to see in
> 0.3.0. Lemme know.

1. I definitely don't have the time, else it would have been done by now :)
2. I will locate Arun's patch and send it out; it's a good
implementation and can act as a reference.
3. The problem is less about the number of queries. It is more about
sending the data to the textcache (which can either store it gzipped in
sqlite or gzipped on disk), and to the language determination class,
and to lucene, without (repeat: without) storing all the data in a huge
store/string in memory. I thought a cutoff size of disk_block_size
would be a good starting point; it will reduce external fragmentation
to a good degree, since most textcache files are less than 1 block. So
the decision to store on disk or in sqlite can only come after we have
read, say, 4KB of data. The language determination, I think, requires
1K of text. In our filter/lucene interface, lucene asks for data and
then the filters go and extract a little more data from the file and send it
back; this loops until there is no more data to extract. There
is no storing of data in memory! So to do the whole thing
correctly, as lucene asks for more data the filters return the data
and transparently someone in the middle decides whether to store the
data in sqlite or on disk (and does so); furthermore, even before lucene
asks for data, about 1K of data is extracted from the file, the language
detected, the appropriate stemmer hooked up, and the data kept around
till lucene asks for it. The obvious approach is to extract all the
data in advance, store it in memory, decide where to store the
textcache, decide the language, and then comfortably feed lucene
from the stored data. That's not desired.
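
To make point 3 concrete, here is a minimal sketch of that "someone in the middle", with all names hypothetical (the real pieces would be built out of TextCacheWriteStream and PullingReader): a reader that hands lucene whatever the filters produce, tees every chunk into the cache, and pre-buffers ~1K for language detection:

using System;
using System.IO;
using System.Text;

// All names hypothetical. A reader that hands lucene whatever the filter
// produces, copies every chunk into the text cache as a side effect, and
// buffers the first ~1K so language detection can run before lucene asks.
class TeeingReader : TextReader {
	TextReader filter_output;   // what a PullingReader-style source provides
	TextWriter cache;           // a TextCacheWriteStream-style sink
	StringBuilder head = new StringBuilder ();
	const int HEAD_SIZE = 1024; // roughly what language detection needs

	public TeeingReader (TextReader filter_output, TextWriter cache)
	{
		this.filter_output = filter_output;
		this.cache = cache;
	}

	// Pull ~1K up front for the language detector without consuming it;
	// the buffered head is replayed to lucene on the first Read calls.
	public string PeekHead ()
	{
		char[] buf = new char [HEAD_SIZE];
		int n = filter_output.Read (buf, 0, buf.Length);
		if (n > 0) {
			head.Append (buf, 0, n);
			cache.Write (buf, 0, n);
		}
		return head.ToString ();
	}

	public override int Read (char[] buf, int index, int count)
	{
		if (head.Length > 0) {
			int n = Math.Min (count, head.Length);
			head.CopyTo (0, buf, index, n);
			head.Remove (0, n);
			return n;
		}
		int read = filter_output.Read (buf, index, count);
		if (read > 0)
			cache.Write (buf, index, read);
		return read;
	}

	static void Main ()
	{
		StringWriter cached = new StringWriter ();
		TeeingReader r = new TeeingReader (new StringReader ("Ein kleines Beispiel."), cached);
		Console.WriteLine ("head: {0}", r.PeekHead ());
		Console.WriteLine ("lucene sees: {0}", r.ReadToEnd ());
		Console.WriteLine ("cached: {0}", cached.ToString ());
	}
}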

I hope you also see where the connection between language
determination and the text cache comes in. Go for them if you or anyone
wants to. Just let others know so there is no duplication of effort.

N. Let's not target a release and cram features in :) Instead, if you
want to work on something, work on it. If it is done and release-ready
by 0.3, it will be included. Otherwise there is always another
release. There is little sense in including lots of half-complete,
poorly implemented features just to make the release notes look yummy
:-) Of course I am restating the obvious. (*)

- dBera

(*) When I sent out a to-come feature list in one of my earlier
emails, I was stressing more the fact that testing is becoming very
important and difficult with all these different features, and less
the fact that "Wow! Now we can do XXX too". Now I think I was misread.

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-02 Thread Kevin Kubasik
Very cool, and good to hear. If Arun could share a patch for his
implementation, that would be awesome in terms of preventing wheel
reinvention ;) If Arun is unable, or doesn't have the time to look
into a hybrid solution, I wouldn't mind doing some investigative work.
I think the biggest decision comes when it's time to determine what
our cutoff is (size-wise). While there is a little extra complication
introduced by a hybrid system, I don't see it being a major issue to
implement. My thought would just be to have a table in the
TextCache.db which denotes whether a uri is stored in the db or on disk. The
major concern is the cost of 2 sqlite queries per cache item.

Just my thoughts on the subject. DBera: are you saying that you want
to just work/look into the language stemming, or both the language
stemming and the text cache? Depending on what you want to work on, I
can help out with this, if it's something we really want to see in
0.3.0. Lemme know.

Cheers,
Kevin Kubasik

On 10/2/07, Debajyoti Bera <[EMAIL PROTECTED]> wrote:
> > completely sure that such a loose typing system will greatly benefit
> > us when working with TEXT/STRING types; however, the gzipped blobs
> > might benefit from less disk usage thanks to being stored in a single
> > file. In addition, I know that incremental I/O is a possibility with
> > blobs in sqlite 3.4, which could potentially be utilized to optimize
> > work like this.
> >
> > Anyways, please send a patch to the list if that's not too much to ask,
> > or just give us an update as to how things are going.
>
> Arun and I had some discussion about this and we were trying to balance the
> performance and size issues. He already has the sqlite idea implemented;
> however, I would also like to see how a hybrid idea works, i.e. store the huge
> number of extremely small files in sqlite and store the really large ones on
> disk. Implementing this is tricky (*).
>
> - dBera
>
> (*) One of my recent efforts has been to add language detection support (based
> on a patch in bugzilla). This will enable us to use the right stemmers and
> analyzers depending on the language. The hard part is stealing some initial
> text for language detection and doing it in a transparent way. Incidentally,
> one implementation of the hybrid approach mentioned above and the language
> detection cross paths. I am waiting for some free time to get going after
> them.
>
> --
> -
> Debajyoti Bera @ http://dtecht.blogspot.com
> beagle / KDE fan
> Mandriva / Inspiron-1100 user
>


-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-10-01 Thread Debajyoti Bera
> completely sure that such a loose typing system will greatly benefit
> us when working with TEXT/STRING types; however, the gzipped blobs
> might benefit from less disk usage thanks to being stored in a single
> file. In addition, I know that incremental I/O is a possibility with
> blobs in sqlite 3.4, which could potentially be utilized to optimize
> work like this.
>
> Anyways, please send a patch to the list if that's not too much to ask,
> or just give us an update as to how things are going.

Arun and I had some discussion about this and we were trying to balance the 
performance and size issues. He already has the sqlite idea implemented; 
however, I would also like to see how a hybrid idea works, i.e. store the huge 
number of extremely small files in sqlite and store the really large ones on 
disk. Implementing this is tricky (*).

- dBera

(*) One of my recent efforts has been to add language detection support (based 
on a patch in bugzilla). This will enable us to use the right stemmers and 
analyzers depending on the language. The hard part is stealing some initial 
text for language detection and doing it in a transparent way. Incidentally, 
one implementation of the hybrid approach mentioned above and the language 
detection cross paths. I am waiting for some free time to get going after 
them.

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-01 Thread Kevin Kubasik
A quick followup, some reading here:

http://www.sqlite.org/datatype3.html

provides some insight into how exactly sqlite3 stores values. I'm not
completely sure that such a loose typing system will greatly benefit
us when working with TEXT/STRING types; however, the gzipped blobs
might benefit from less disk usage thanks to being stored in a single
file. In addition, I know that incremental I/O is a possibility with
blobs in sqlite 3.4, which could potentially be utilized to optimize
work like this.
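
To make the blob route concrete, a hedged sketch of inserting (uri, gzipped text) rows through plain ADO.NET; the table name is invented, and SharpZipLib's GZipOutputStream (which beagled already uses) does the compression. Binding a byte[] parameter stores it as a BLOB, so sqlite's loose text affinity never applies:

using System;
using System.Data;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.GZip;
using Mono.Data.Sqlite;

// Sketch only: the "textcache" table is invented for illustration.
class GzipBlobStore {
	public static void Insert (IDbConnection conn, string uri, string text)
	{
		// gzip the text in memory first
		MemoryStream ms = new MemoryStream ();
		using (GZipOutputStream gz = new GZipOutputStream (ms)) {
			byte[] raw = Encoding.UTF8.GetBytes (text);
			gz.Write (raw, 0, raw.Length);
		}

		IDbCommand cmd = conn.CreateCommand ();
		cmd.CommandText = "INSERT INTO textcache (uri, data) VALUES (@uri, @data)";

		IDbDataParameter p_uri = cmd.CreateParameter ();
		p_uri.ParameterName = "@uri";
		p_uri.Value = uri;
		cmd.Parameters.Add (p_uri);

		IDbDataParameter p_data = cmd.CreateParameter ();
		p_data.ParameterName = "@data";
		p_data.Value = ms.ToArray (); // a byte[] binds as a BLOB, not TEXT
		cmd.Parameters.Add (p_data);

		cmd.ExecuteNonQuery ();
	}
}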

Anyways, please send a patch to the list if thats not too much to ask,
or just give us an update as to how things are going.

Cheers,
Kevin Kubasik

On 10/1/07, Kevin Kubasik <[EMAIL PROTECTED]> wrote:
> On 8/19/07, Arun Raghavan <[EMAIL PROTECTED]> wrote:
> > Hello All,
> > This week I've been working on the new TextCache implementation that
> > I'd mentioned the last time (replacing the bunch of files with an
> > Sqlite db).
> >
> > Making an Sqlite db with just the uri and raw text caused an almost 3x
> > increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
> > my test case). This despite the fact that the size of the raw text was
> > only 7.9 MB. I need to figure out why this happens. In the mean time,
> > I also implemented another version of this which stores (uri, gzipped
> > text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
> > this actually seems to work very well (the db for the test case
> > mentioned shrunk down to 2.6 MB, which is just a little more than the
> > actual size of the compressed data itself).
> My first impression on this is that Sqlite is probably building an
> index for the raw text data, whereas the compressed data is simply
> treated as a binary 'blob'. I'm not 100% sure of the table definitions
> that you're using, or exactly how much (in terms of indexes) sqlite does
> automatically, but that seems like the most likely culprit. As we
> already have our own system for searching text ;) if you could find a
> way to force sqlite to not index the table's raw text column, you
> could probably get more sane numbers regarding the database size.
> However, it's possible it's just how sqlite handles text content, and
> the gzipped text is the best way to go. The other thing to test is how
> this is handled in far larger situations. Is it possible that the
> first 1000 rows are very expensive, but when we scale to 5 rows,
> we see only a minute increase in size?
>
> >
> > Performance numbers on a search which returns 1205 results are below.
> > I basically ran the measurements twice -- once after flushing the
> > inode, dentry and page cache, and another time taking advantage of the
> > disk caches.
> >
> > Current TextCache:
> > no-disk-cache: ~1m
> > with-disk-cache: ~9s
> >
> > New TextCache (raw and gzipped versions had similar numbers):
> > no-disk-cache: ~42s
> > with-disk-cache: ~10s
> >
>
> Very cool/ interesting. One of the important cases to test here is
> multiple successive queries. Think of something like deskbar doing as-you-type
> completion: how does such a system fare when it gets 15 or 20 queries
> back to back. Does the compression difference factor in then?
>
> > One very important factor remains to be seen -- memory usage. I am
> > working on figuring out what the impact of the new code on memory
> > usage is. Numbers should be available soon.
> >
> > On the Xesam front, I will be updating the code tomorrow,day-after to
> > reflect the latest changes to the spec.
>
> I know the Google SoC is over, and it's completely OK if you're too busy
> to complete these tests, but it would be awesome if you could provide
> a patch to the list so we can not only see exactly what you were
> doing, but so that someone else might finish up your work and/or get
> it merged in and ready for 0.3.0.
>
>
> > --
> > Arun Raghavan
>
>
> --
> Cheers,
> Kevin Kubasik
> http://kubasik.net/blog
>


-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-10-01 Thread Kevin Kubasik
On 8/19/07, Arun Raghavan <[EMAIL PROTECTED]> wrote:
> Hello All,
> This week I've been working on the new TextCache implementation that
> I'd mentioned the last time (replacing the bunch of files with an
> Sqlite db).
>
> Making an Sqlite db with just the uri and raw text caused an almost 3x
> increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
> my test case). This despite the fact that the size of the raw text was
> only 7.9 MB. I need to figure out why this happens. In the mean time,
> I also implemented another version of this which stores (uri, gzipped
> text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
> this actually seems to work very well (the db for the test case
> mentioned shrunk down to 2.6 MB, which is just a little more than the
> actual size of the compressed data itself).
My first impression on this is that Sqlite is probably building an
index for the raw text data, whereas the compressed data is simply
treated as a binary 'blob'. I'm not 100% sure of the table definitions
that you're using, or exactly how much (in terms of indexes) sqlite does
automatically, but that seems like the most likely culprit. As we
already have our own system for searching text ;) if you could find a
way to force sqlite to not index the table's raw text column, you
could probably get more sane numbers regarding the database size.
However, it's possible it's just how sqlite handles text content, and
the gzipped text is the best way to go. The other thing to test is how
this is handled in far larger situations. Is it possible that the
first 1000 rows are very expensive, but when we scale to 5 rows,
we see only a minute increase in size?

>
> Performance numbers on a search which returns 1205 results are below.
> I basically ran the measurements twice -- once after flushing the
> inode, dentry and page cache, and another time taking advantage of the
> disk caches.
>
> Current TextCache:
> no-disk-cache: ~1m
> with-disk-cache: ~9s
>
> New TextCache (raw and gzipped versions had similar numbers):
> no-disk-cache: ~42s
> with-disk-cache: ~10s
>

Very cool/ interesting. One of the important cases to test here is
multiple successive queries. Think of something like deskbar doing as-you-type
completion: how does such a system fare when it gets 15 or 20 queries
back to back. Does the compression difference factor in then?

> One very important factor remains to be seen -- memory usage. I am
> working on figuring out what the impact of the new code on memory
> usage is. Numbers should be available soon.
>
> On the Xesam front, I will be updating the code tomorrow,day-after to
> reflect the latest changes to the spec.

I know the Google SoC is over, and it's completely OK if you're too busy
to complete these tests, but it would be awesome if you could provide
a patch to the list so we can not only see exactly what you were
doing, but so that someone else might finish up your work and/or get
it merged in and ready for 0.3.0.


> --
> Arun Raghavan


-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-08-19 Thread Debajyoti Bera
> Making an Sqlite db with just the uri and raw text caused an almost 3x
> increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
> my test case). This despite the fact that the size of the raw text was
> only 7.9 MB. I need to figure out why this happens. In the mean time,
> I also implemented another version of this which stores (uri, gzipped
> text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
> this actually seems to work very well (the db for the test case
> mentioned shrunk down to 2.6 MB, which is just a little more than the
> actual size of the compressed data itself).

> Current TextCache:
> no-disk-cache: ~1m
> with-disk-cache: ~9s
>
> New TextCache (raw and gzipped versions had similar numbers):
> no-disk-cache: ~42s
> with-disk-cache: ~10s

The numbers look pretty good. Size on disk is the main focus here. The disk 
cache will come into heavy play on a machine constantly serving queries. So 
even if that suffers a little bit (but only a little bit), I think it's still 
ok if we gain in other places. The speedup with no-disk-cache is an added 
bonus.

Does the performance degrade when looking up small result sets ? In the 
current implementation, that will involve fewer disk seeks, whereas for the 
sqlite-based approach the I/O overhead will probably be similar.

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


GSoC Weekly Report

2007-08-19 Thread Arun Raghavan
Hello All,
This week I've been working on the new TextCache implementation that
I'd mentioned the last time (replacing the bunch of files with an
Sqlite db).

Making an Sqlite db with just the uri and raw text caused an almost 3x
increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
my test case). This despite the fact that the size of the raw text was
only 7.9 MB. I need to figure out why this happens. In the mean time,
I also implemented another version of this which stores (uri, gzipped
text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
this actually seems to work very well (the db for the test case
mentioned shrunk down to 2.6 MB, which is just a little more than the
actual size of the compressed data itself).

Performance numbers on a search which returns 1205 results are below.
I basically ran the measurements twice -- once after flushing the
inode, dentry and page cache, and another time taking advantage of the
disk caches.

Current TextCache:
no-disk-cache: ~1m
with-disk-cache: ~9s

New TextCache (raw and gzipped versions had similar numbers):
no-disk-cache: ~42s
with-disk-cache: ~10s

One very important factor remains to be seen -- memory usage. I am
working on figuring out what the impact of the new code on memory
usage is. Numbers should be available soon.

On the Xesam front, I will be updating the code tomorrow,day-after to
reflect the latest changes to the spec.
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com


GSoC Weekly Report

2007-08-13 Thread Arun Raghavan
Hello All,
Sorry about the super-late weekly report ... it's been a crazy and
hectic weekend.

On the Xesam front, once the proposals up that are for review are
finalized I can go about implementing them.

As I'd mentioned last time, I was looking at improving the disk usage
of the TextCache. Currently the TextCache maintains an Sqlite DB with
the uri of the file and a pointer to the gzipped text from the file.
I've implemented an alternative TextCache which stores the uri and the
text itself. I was hoping to have numbers from a comparison of the
two, but unfortunately wasn't able to complete this. I have to write a
small tool to migrate the old TextCache to the new one so they're both
the same, and then try doing a large number of fetches to compare
performance.

That's all for now, folks.
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com


GSoc weekly report (Browser Extension Rewrite)

2007-08-12 Thread Tao Fei
Hi,
It's time for the weekly report again.
Last week, for the firefox extension, I:
* improved firefox's bookmark index
* modified the preference dialog (simplified it a little)
* found and fixed a few other bugs (the redirect problem, the non-html file problem)
And the firefox extension is almost finished.
For the epiphany extension,
I basically worked on config-file parsing/generation
and the index-link feature.
What to do next:
for the firefox extension: code clean-up (remove debug information, check
the spelling and wording), more testing and documentation.
for the epiphany extension:
add i18n support (using gettext)

I think I can finish them next week.
Thanks.
-- 
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-08 Thread Debajyoti Bera
Tao,
  I was testing the extension, when I noticed this (browser.dump enabled):

[beagle] [beaglPref.get beagle.bookmark.active] [Exception... "Component returned failure code: 0x8000 (NS_ERROR_UNEXPECTED) [nsIPrefBranch.getBoolPref]"  nsresult: "0x8000 (NS_ERROR_UNEXPECTED)"  location: "JS frame :: chrome://newbeagle/content/utils.js :: anonymous :: line 53"  data: no]

This was getting thrown on the terminal multiple times. Not quite sure what 
was triggering this.

(I didn't set any options explicitly in the preferences.)

Also, in beagleoverlay.js:writeMetadata, "uniddexed" should be "unindexed" (typo).
Could you store the URLs as text and not as a keyword? People should be able 
to query part of the url too :)

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-07 Thread Tao Fei
2007/8/7, Joe Shaw <[EMAIL PROTECTED]>:
> Hi,
>
> On 8/6/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> > 2007/8/7, Joe Shaw <[EMAIL PROTECTED]>:
> > > I've been playing around with the new extension, and I'm seeing a
> > > little inconsistent behavior with it.  I wonder if it's related to me
> > > having the old Beagle extension installed as well (although I disabled
> > > that one).
> >
> > Yes. That's the problem.
> > I used the same preference name, "beagle.enabled", as the old extension.
> > Fixed now: it uses "beagle.autoindex.active" now, and the tooltip is also updated.
> > "beagle.enabled" was the wrong name, as it doesn't affect on-demand indexing.
>
> Cool, I'll give it a test later today.
>
> We should keep in mind a migration path for the old extension.
> Ideally the new one will just be a drop-in replacement, and if we
> could migrate the basic settings (ie, enabled/disabled and a
> whitelist/blacklist) that would be ideal.
Oh, you can import the preferences from the old extension (just open the
preferences window, and you can see the button).
Maybe I should import them silently when the new extension is installed.
> We may also want to use
> the same UUID so that upgrades are done cleanly, if there's no method
> for obsoleting other extensions.
The same UUID? I guess we only need to modify install.rdf to
change the UUID.

-- 
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-07 Thread Joe Shaw
Hi,

On 8/6/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> 2007/8/7, Joe Shaw <[EMAIL PROTECTED]>:
> > I've been playing around with the new extension, and I'm seeing a
> > little inconsistent behavior with it.  I wonder if it's related to me
> > having the old Beagle extension installed as well (although I disabled
> > that one).
>
> Yes. That's the problem.
> I used the same preference name, "beagle.enabled", as the old extension.
> Fixed now: it uses "beagle.autoindex.active" now, and the tooltip is also updated.
> "beagle.enabled" was the wrong name, as it doesn't affect on-demand indexing.

Cool, I'll give it a test later today.

We should keep in mind a migration path for the old extension.
Ideally the new one will just be a drop-in replacement, and if we
could migrate the basic settings (ie, enabled/disabled and a
whitelist/blacklist) that would be ideal.  We may also want to use
the same UUID so that upgrades are done cleanly, if there's no method
for obsoleting other extensions.

Joe


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-06 Thread Tao Fei
2007/8/7, Joe Shaw <[EMAIL PROTECTED]>:
> Hey,
>
> I've been playing around with the new extension, and I'm seeing a
> little inconsistent behavior with it.  I wonder if it's related to me
> having the old Beagle extension installed as well (although I disabled
> that one).
Yes. That's the problem.
I used the same preference name, "beagle.enabled", as the old extension.
Fixed now: it uses "beagle.autoindex.active" now, and the tooltip is also updated.
"beagle.enabled" was the wrong name, as it doesn't affect on-demand indexing.

> Whenever I open a site, I get the little dog icon with an X over it,
> indicating that it's not indexing that page.  The page is not from
> HTTPS, and when I open the preferences dialog, the "Default Action"
> has "Index" selected.  If I click on the icon to toggle it, it gets
> indexed fine.  But I'm not sure why it's not indexed by default.  After I
> toggle the icon, any subsequent page opens are indexed.
>
> Joe
>


-- 
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-06 Thread Joe Shaw
Hey,

I've been playing around with the new extension, and I'm seeing a
little inconsistent behavior with it.  I wonder if it's related to me
having the old Beagle extension installed as well (although I disabled
that one).

Whenever I open a site, I get the little dog icon with an X over it,
indicating that it's not indexing that page.  The page is not from
HTTPS, and when I open the preferences dialog, the "Default Action"
has "Index" selected.  If I click on the icon to toggle it, it gets
indexed fine.  But I'm not sure why it's not indexed by default.  After I
toggle the icon, any subsequent page opens are indexed.

Joe


GSoc weekly report (Browser Extension Rewrite)

2007-08-04 Thread Tao Fei
Hi,
It's time for the weekly report again.
Last week, I worked on both the firefox and epiphany extensions.
For firefox (thanks for Debajyoti's suggestions):
* added the referrer url to the meta file
* indexed bookmarks (on-demand index or index on close)
For Epiphany (now in python):
* config file support
* basic menu items (index this page, toggle auto-index)
* status-bar label
And some documentation.
What to do next:
* improve firefox's bookmark index
* epiphany extension improvements (any suggestions?)
* solve some usability problems
* documentation
You can follow the instructions at http://beagle-project.org/Browser_Extension
to install and test the extension, and give me some feedback.

Thanks.
-- 
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


GSoc weekly report (Browser Extension Rewrite)

2007-07-29 Thread Tao Fei
Hi, All
This has been a relatively slow week. I've basically been looking at
the epiphany extension:
* read more documentation, the epiphany source code and some example
extension code
* wrote a small script in python (I take it as an "experiment"):
http://browser-extension-for-beagle.googlecode.com/svn/trunk/py-epiphany-extension/

The main target of next week is to rewrite the epiphany extension (in C):
* save files to ~/.beagle/ToIndex instead of calling beagle-index-url
* add basic config (index HTTPS or not, exclude/include rules)
* UI (it may have some problems as I am not familiar with GTK)

Thanks.

-- 
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


GSoc Weekly Report (Browser Extension Rewrite)

2007-07-22 Thread Tao Fei
Hi, all.
   Sorry for the slight delay. Last week, I kept on improving the
firefox extension:
   * ported the existing code for search (this page, link, selected text)
to the new extension
   * fixed some bugs with index-link (it can now handle non-html files
properly)
   * modified the preferences-related code (gave it a proper interface)
   * added a function to import the old extension's preferences (the
security filters)
   And finally I managed to pack it up and do more testing.
   You may want to try it out. Just download it from
http://browser-extension-for-beagle.googlecode.com/files/[EMAIL PROTECTED]
   It may still have some problems, so if you find any bugs please let me know.

   For the epiphany extension, I read some documents and the existing code
last week. I will begin coding on it this week.

   What to do next week:
   * more testing/documentation for the firefox extension
   * epiphany extension development (just a list; I've no idea how
many of these can be done next week):
   save files to .beagle/ToIndex/ instead of calling beagle-index-url
   add config-related code
   GUI?


Thanks.

-- 
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


RE: GSoc Weekly Report (Browser Extension Rewrite)

2007-07-20 Thread Kevin Kubasik
Hey, a quick note on the subject: I made a haphazard attempt at this rewrite 
some time ago and faced the same issue you have now. I think the deciding 
factor would be your personal experience with the languages. If you have never 
really worked with C, but have used python, I would think that a well-designed 
and well-written python plugin is much better than a haphazard 'My First C' 
program. The second concern/thought is that a lot of users will leave their 
browsers open for hours (if not days) at a time. I'm not 100% sure if this 
applies in the plugin context, but a GC'd system probably offers some safety 
net for memory use.

Just a quick $0.02,
Kevin Kubasik


-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Joe Shaw
Sent: Friday, July 20, 2007 10:49 AM
To: Tao Fei
Cc: dashboard-hackers@gnome.org
Subject: Re: GSoC Weekly Report (Browser Extension Rewrite)

Hi,

On 7/14/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> I've noticed that Epiphany extensions can be written in C or in Python.
> The old extension is written in C. I'm wondering whether it is
> acceptable if I write the extension in Python?

It's a possibility, although I'm not crazy about adding a Python
dependency to Beagle (not libbeagle, which already has an optional
Python dep for the bindings).  It's probably not unreasonable to
assume that anyone with Epiphany installed will also have Python,
however.

Joe
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


Re: GSoC Weekly Report (Browser Extension Rewrite)

2007-07-20 Thread Joe Shaw
Hi,

On 7/14/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> I've noticed that Epiphany extensions can be written in C or in Python.
> The old extension is written in C. I'm wondering whether it is
> acceptable if I write the extension in Python?

It's a possibility, although I'm not crazy about adding a Python
dependency to Beagle (not libbeagle, which already has an optional
Python dep for the bindings).  It's probably not unreasonable to
assume that anyone with Epiphany installed will also have Python,
however.

Joe
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


GSoC Weekly Report (Browser Extension Rewrite)

2007-07-14 Thread Tao Fei

Hi, all.
   What I have done in the last week:
* UI improvements: context menu (for the status icon and content area) / toolbar icon
* quick-add exclude/include rules
* index link (the target can be a non-HTML file)
* read some documentation about Epiphany extension development


What to do next:
* more UI improvements for the Firefox extension
* code cleanup, documentation
* set up the Epiphany extension development environment and try something

I've noticed that Epiphany extensions can be written in C or in Python. The old
extension is written in C. I'm wondering whether it is acceptable if I
write the extension in Python?

Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


Re: GSoC Weekly Report (Browser Extension Rewrite)

2007-07-11 Thread Tao Fei

Sorry for being late.
I have just got back home.
I had some network problems, and failed to get access to the network
until today.


> There was a recent one opened against the old extension about
> internationalization.  I think that's a pretty important task that
> this one should address.  There is even a patch attached to that bug,
> although I haven't looked at it closely.

Yes, I have noticed that. (Debajyoti has CC-ed this bug to me.)
I'd like to say that the new extension will be "translatable". I
have put all the UI strings in a .dtd file and all the JavaScript strings
in a .properties file (except some debug information).
And I will keep doing that.
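
(For readers unfamiliar with the Mozilla convention: XUL markup strings live in
.dtd entity files, while strings used from JavaScript live in key=value
.properties bundles looked up at runtime. Below is a minimal Python sketch of
the lookup idea only; the file name and key are hypothetical, and the real
extension would use Mozilla's string-bundle API from JavaScript rather than
code like this.)

import codecs

def load_properties(path):
    # Parse a simple key=value .properties file,
    # skipping blank lines and comments.
    strings = {}
    with codecs.open(path, "r", "utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if sep:
                strings[key.strip()] = value.strip()
    return strings

# Hypothetical usage:
# strings = load_properties("beagle.properties")
# label = strings["beagle.statusbar.indexing"]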


Thanks.
--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


Re: GSoC Weekly Report (Browser Extension Rewrite)

2007-07-10 Thread Joe Shaw
Hi,

On 7/6/07, Tao Fei <[EMAIL PROTECTED]> wrote:
> I did a little search in http://bugzilla.gnome.org/ ; there are some
> bug reports for the extension,
> e.g. Bug 317605:  http://bugzilla.gnome.org/show_bug.cgi?id=317605
> In fact, I use the "status bar label" to indicate whether the page is
> indexed, and use the beagle icon to indicate whether beagle is
> enabled, disabled, or in an error state.
> The icon is "global". I think it partly fixes the bug.

Yeah, I think this is a good idea.  Before, I didn't like that the icon
was overloaded to answer two questions: is this page indexed? and is the
extension enabled for this page?  Separating those concepts is a good
idea.

> What to do next:
> * fix the bugs in Bugzilla (or avoid producing them)

There was a recent one opened against the old extension about
internationalization.  I think that's a pretty important task that
this one should address.  There is even a patch attached to that bug,
although I haven't looked at it closely.

Thanks,
Joe
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


GSoC Weekly Report (Browser Extension Rewrite)

2007-07-06 Thread Tao Fei

Hi, all.
In the last week (or last two weeks), I have improved the
extension a little:
* popup menu
* "index this page": indexes the current page, ignoring the exclude/include rules
* changed the status bar label to "beagle is indexing [URL]" when a
page is indexed

I did a little search in http://bugzilla.gnome.org/ ; there are some
bug reports for the extension,
e.g. Bug 317605:  http://bugzilla.gnome.org/show_bug.cgi?id=317605
In fact, I use the "status bar label" to indicate whether the page is
indexed, and use the beagle icon to indicate whether beagle is
enabled, disabled, or in an error state.
The icon is "global". I think it partly fixes the bug.

What to do next:
* fix the bugs in Bugzilla (or avoid producing them)
* make it more convenient to add filters (something like "always
index this domain/page" / "never index this domain/page" menu items)

Thanks.
--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


GSoC Weekly Report (Browser Extension Rewrite)

2007-06-22 Thread Tao Fei

Hi,

Last week, I
* fixed bugs
* tested indexing (exclude/include filters, status switch, ...)
* finished the first runnable version


Goals for next week:
* remove the jslib-related code (the lib is too big)
* add a context menu
* (if there is time) add an "index it now" button and a "filter quick add" button

Thanks

---
Tao Fei (陶飞)
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


GSoC weekly report (Browser Extension Rewrite)

2007-06-16 Thread Tao Fei

Hi all.
Last week, I
* moved all the literal strings in the code to beagle.properties, and wrote
some helper functions for i18n
* indexed HTML pages (not enough testing)

I will have some examinations in the next two weeks, so
goals for next week:
* more testing
* remove the jslib-related code (the lib is too big)
* add a context menu


Thanks.

--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


GSoC weekly report (Browser Extension Rewrite)

2007-06-08 Thread Tao Fei

Hi all.
Last week, I
* modified the preference code (removed global vars, added documentation comments)
* wrote some utility functions (string operations, wildcard-to-RE, etc.)
* wrote the filter (checks whether a URL should be indexed based on the
exclude/include rules; see the sketch below)
* uploaded the code to code.google.com
 (http://code.google.com/p/browser-extension-for-beagle/source)
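
(Since the extension itself is JavaScript, what follows is only a minimal
Python sketch of the wildcard-to-RE conversion and the filter check, assuming
shell-style wildcard rules and exclude-wins-over-include semantics; the actual
rule semantics in the extension may differ.)

import fnmatch
import re

def compile_rules(patterns):
    # fnmatch.translate turns a shell-style wildcard such as
    # "*://mail.google.com/*" into an equivalent anchored regex.
    return [re.compile(fnmatch.translate(p)) for p in patterns]

def should_index(url, include_rules, exclude_rules):
    # Assumed semantics: an exclude match always wins; if any
    # include rules exist, the URL must match one of them.
    if any(r.match(url) for r in exclude_rules):
        return False
    if include_rules:
        return any(r.match(url) for r in include_rules)
    return True

# Example:
# exclude = compile_rules(["*://mail.google.com/*"])
# should_index("http://example.com/page", [], exclude)  # -> True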


Next week:
* do indexing
* some test/log/debug code
* i18n (move all the literal strings in the code to beagle.properties)

Thanks.
--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers


GSoC Weekly Report (Thunderbird backend rewrite)

2007-04-27 Thread Pierre Östlund

Hey!

Time for my first weekly Summer of Code report. Here's a summary of what
I've been working on so far:
* Reviewed my old Thunderbird code
* Cleared up some question marks by studying some of the Thunderbird source
code
* Begun work on the core framework that I will base the backend upon
(based on the review of my old code and the Thunderbird source code study)
* Made some regression tests
* Written documentation

My ambitions for next week:
* Work on the framework and continue documenting it
* Write more regression tests

By "core framework" above, I mean a draft of all classes that I will need to
implement the backend, the entire structure of these (methods, properties
etc.) and the behavior of the each class. The core part (which will later
live in the utility part of beagle) is almost complete, I'm currently
writing documentation for it. I have found the documentation part extremely
useful, it lets me fix all small design flaws that I originally did not see.
It will (hopefully) also make it easier for you guys to see and understand
what I'm working on.

Other things that might be interesting to know is that I've decided to use
NUnit as my framework for writing regression tests. It contains what i need,
is simple to use and integrates well with monodevelop, which is my primary
IDE. I'm using monodocer to generate XML-documentation and monodoc to edit
it, for the ones interested (I'm _not_ using inline XML, as a lot of people
does not like this), and I will link to an HTML version of the documentation
on my blog once I'm done.

I hope to finish the core framework by next week so that I can publish it
and hopefully get some input on it before I go on and write any code. More
on this in next weeks report and on my blog.

Thanks!

Pierre Östlund
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers