Re: GSoC Weekly Report

2007-10-18 Thread Joe Shaw
Hi,

On 10/16/07, D Bera [EMAIL PROTECTED] wrote:
 A followup question: I did not find any API documentation of
 Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
 there.

My understanding is that both M.D.SqliteClient and M.D.Sqlite follow
the general ADO.Net API patterns and that the latter is more or less a
drop-in replacement for the former.  A few things may need to be
tweaked, but in general just changing the "using" statements at the
top of each source file should be all that's needed.
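
In most files that should literally be (modulo the few tweaks mentioned
above):

	// before
	using Mono.Data.SqliteClient;
	// after
	using Mono.Data.Sqlite;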

 If M.D.Sqlite does not have a way to return rows on demand, I
 am against the migration. In the worst case, we can ship with a
 modified copy of M.D.Sqlite, but I am not sure what that will buy us.

You've always been able to get rows on demand via ADO.Net; it's just a
matter of the implementation underneath.  The old one (not modified by
us) would load all of them into memory.  I'm not sure how the new one
performs memory-wise.  If the Mono guys don't have any idea, the right
thing to do here would be to create a large test database (or use an
existing TextCache or FAStore db), run a "SELECT *" with each of the 3
implementations and walk the results, using heap-buddy and/or
heap-shot to analyze their memory usage.
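
The walk itself is just the standard ADO.Net reader loop, something like
this (a rough sketch; the connection string and table name are
placeholders):

	using System;
	using System.Data;
	using Mono.Data.Sqlite; // or Mono.Data.SqliteClient, per implementation

	class MemWalk {
		static void Main ()
		{
			IDbConnection conn = new SqliteConnection ("Data Source=TextCache.db");
			conn.Open ();
			IDbCommand cmd = conn.CreateCommand ();
			cmd.CommandText = "SELECT * FROM textcache_data";
			IDataReader reader = cmd.ExecuteReader ();
			int rows = 0;
			// rows should arrive one at a time; whether memory stays
			// flat is exactly what heap-buddy/heap-shot would show
			while (reader.Read ())
				rows++;
			reader.Close ();
			conn.Close ();
			Console.WriteLine ("walked {0} rows", rows);
		}
	}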

 In the same breath, what is the benefit of M.D.Sqlite over
 M.D.SqliteClient for beagle? I figured out there are some ADO.Net
 advantages, but other than that ... ?

It's maintained, for one, which our modified copy essentially isn't.  It
has the backing of the Mono team.  The code is much cleaner and easier
to understand, largely because it doesn't have two separate codepaths
(one for v2 and one for v3).  I am sure the Mono guys have other good
reasons too. :)

Joe


Re: GSoC Weekly Report

2007-10-18 Thread Debajyoti Bera
  A followup question: I did not find any API documentation of
  Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
  there.

 My understanding is that both M.D.SqliteClient and M.D.Sqlite follow
 the general ADO.Net API patterns and that the latter is more or less a
 drop-in replacement for the former.  A few things may need to be
 tweaked, but in general just changing the "using" statements at the
 top of each source file should be all that's needed.

I was more looking for some method for row-by-row retrieval, on demand. Real 
on-demand, where the implementation does not retrieve all the rows at once 
but returns them one by one.

 You've always been able to get rows on demand via ADO.Net, it's just a
 matter of the implementation underneath.  The old one (not modified by
 us) would load all of them into memory.  I'm not sure how the new one
 performs memory-wise.  If the Mono guys don't have any idea, the right

I checked the source out of curiosity:
http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite/
And the code for DataReader looks exactly the same (didn't do a diff, just 
visually) as the one in Mono.Data.SqliteClient. So even if we migrate (the 
migration would be easy), we still have to ship with a modified in-house 
M.D.Sqlite and keep syncing it with upstream. *sigh*

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-18 Thread Debajyoti Bera
Ignore my previous email ... I was looking at the wrong place :(
This is the right place for the new M.D.Sqlite:
http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite_2.0/SQLiteDataReader.cs

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Migrate to Mono.Data.Sqlite (Was: Re: GSoC Weekly Report)

2007-10-18 Thread Debajyoti Bera
 Ignore my previous email ... I was looking at the wrong place :(
 This is the right place for the new M.D.Sqlite:
 http://anonsvn.mono-project.com/viewcvs/trunk/mcs/class/Mono.Data.Sqlite/Mono.Data.Sqlite_2.0/SQLiteDataReader.cs

Migration from Mono.Data.SqliteClient to Mono.Data.Sqlite completed (rev 
4061).

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-16 Thread Debajyoti Bera
  What to do with our local changes to Mono.Data.SqliteClient? I always
  get confused with them. Don't even know what those changes are and why
  they are there :-/ (it has something to do with threading and locking)?

 The work done locally was mainly for memory usage reasons.  IIRC, the
 upstream bindings pull all of the results into memory at once, whereas
 our locally modified ones do so only as needed.  I don't think
 threading/locking was ever an issue -- you might be confusing it with
 the fact that we couldn't use early sqlite 3.x versions because of
 broken locking policy in the library.

You are probably right. I still had to verify ...
beagle:/source=mind?query=sqlite+beagle+lock
returned nothing :-D
but Google returned
http://lists.ximian.com/pipermail/mono-devel-list/2005-November/015977.html 
which mentions "Lock" ... yay! My faith in my memory is restored ;-)

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-16 Thread Joe Shaw
Hi,

On 10/16/07, Debajyoti Bera [EMAIL PROTECTED] wrote:
   What to do with our local changes to Mono.Data.SqliteClient? I always
   get confused with them. Don't even know what those changes are and why
   they are there :-/ (it has something to do with threading and locking)?
 
  The work done locally was mainly for memory usage reasons.  IIRC, the
  upstream bindings pull all of the results into memory at once, whereas
  our locally modified ones do so only as needed.  I don't think
  threading/locking was ever an issue -- you might be confusing it with
  the fact that we couldn't use early sqlite 3.x versions because of
  broken locking policy in the library.

 You are probably right. I still had to verify ...
 beagle:/source=mind?query=sqlite+beagle+lock
 returned nothing :-D
 but Google returned
 http://lists.ximian.com/pipermail/mono-devel-list/2005-November/015977.html
 which mentions "Lock" ... yay! My faith in my memory is restored ;-)

Indeed you're right, but those changes did get merged upstream.  So
memory usage, I believe, is the only outstanding reason.

Joe


Re: GSoC Weekly Report

2007-10-16 Thread D Bera
 Indeed you're right, but those changes did get merged upstream.  So
 memory usage, I believe, is the only outstanding reason.

Sweet.
A followup question: I did not find any API documentation of
Mono.Data.Sqlite :( #mono was also sleeping when I asked the question
there. If M.D.Sqlite does not have a way to return rows on demand, I
am against the migration. In the worst case, we can ship with a
modified copy of M.D.Sqlite, but I am not sure what that will buy us.
In the same breath, what is the benefit of M.D.Sqlite over
M.D.SqliteClient for beagle? I figured out there are some ADO.Net
advantages, but other than that ... ?

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-15 Thread Joe Shaw
Hi,

On 10/13/07, Debajyoti Bera [EMAIL PROTECTED] wrote:
 What to do with our local changes to Mono.Data.SqliteClient? I always get 
 confused with them. Don't even know what those changes are and why they 
 are there :-/ (it has something to do with threading and locking)?

The work done locally was mainly for memory usage reasons.  IIRC, the
upstream bindings pull all of the results into memory at once, whereas
our locally modified ones do so only as needed.  I don't think
threading/locking was ever an issue -- you might be confusing it with
the fact that we couldn't use early sqlite 3.x versions because of
broken locking policy in the library.

I'm not sure what the memory side effects of the newer upstream bindings are.

Joe


Re: GSoC Weekly Report

2007-10-13 Thread Debajyoti Bera
 Sorry, I was unclear.  By "removing sqlite2" I meant simply removing
 it as an option from configure.in and requiring only sqlite3, not
 removing the codepaths from the cut-and-pasted code.  Then, at some
 point in the future, porting over to Mono's own Mono.Data.Sqlite.

What to do with our local changes to Mono.Data.SqliteClient? I always get 
confused with them. Don't even know what those changes are and why they 
are there :-/ (it has something to do with threading and locking)?

- dBera

PS: Mannn... I love these Liberation fonts... can't stop reading the same mail 
ten times :P

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-10 Thread Joe Shaw
Hi,

On 10/9/07, D Bera [EMAIL PROTECTED] wrote:
  At this point, I'm in favor of dropping support for sqlite2 entirely
  anyway.  That will make a migration to the new Mono sqlite bindings
  smoother, and drop a nasty chunk of cut-and-paste-and-patch code in
  the tree.

 Me too, me too ...
 But I see no point in the double effort of first removing sqlite-2
 support and then changing the code to use Mono.Data.Sqlite. Any
 volunteers for the cleanup?

Sorry, I was unclear.  By "removing sqlite2" I meant simply removing
it as an option from configure.in and requiring only sqlite3, not
removing the codepaths from the cut-and-pasted code.  Then, at some
point in the future, porting over to Mono's own Mono.Data.Sqlite.

Joe


Re: GSoC Weekly Report

2007-10-09 Thread Joe Shaw
Hi,

On 10/8/07, Debajyoti Bera [EMAIL PROTECTED] wrote:
 One thing I forgot to test was support for sqlite-2. Could anyone with
 sqlite-2 sync svn trunk and see if things work as expected? .beagle/ might
 need to be deleted and files/emails re-indexed.

At this point, I'm in favor of dropping support for sqlite2 entirely
anyway.  That will make a migration to the new Mono sqlite bindings
smoother, and drop a nasty chunk of cut-and-paste-and-patch code in
the tree.

Joe


Re: GSoC Weekly Report

2007-10-08 Thread Debajyoti Bera
Hi,

First the context of this discussion: better storing of cached data (aka 
textcache).

 Very cool, and good to hear. If Arun could share a patch for his
 implementation, that would be awesome in terms of preventing wheel
 reinvention ;) If Arun is unable, or doesn't have the time to look
 into a hybrid solution, I wouldn't mind doing some investigative work.
 I think the biggest decision comes when it's time to determine what
 our cutoff is (size-wise). While there is a little extra complication
 introduced by a hybrid system, I don't see it being a major issue to
 implement. My thought would just be to have a table in the
 TextCache.db which denotes whether a uri is stored in the db or on disk.
 The major concern is the cost of 2 sqlite queries per cache item.

 Just my thoughts on the subject. DBera: are you saying that you want
 to just work/look into the language stemming, or both the language
 stemming and the text cache? Depending on what you want to work on, I
 can help out with this, if it's something we really want to see in
 0.3.0. Lemme know.
   completely sure that such a loose typing system will greatly benefit
   us when working with TEXT/STRING types, however, the gzipped blobs
   might benefit from less disk usage thanks to being stored in a single
   file, in addition, I know that incremental i/o is a possibility with
   blobs in sqlite 3.4, which could potentially be utilized to optimize
   work like this.
  
   Anyways, please send a patch to the list if that's not too much to ask,
   or just give us an update as to how things are going.
 
  Arun and I had some discussion about this and we were trying to balance
  the performance and size issues. He already has the sqlite idea
  implemented; however, I would also like to see how a hybrid idea works,
  i.e. store the huge number of extremely small files in sqlite and store
  the really large ones on disk. Implementing this is tricky.

I just checked in some changes implementing the above hybrid idea. Currently, 
any file less than 4K gzipped is an "extremely small" file (stored in the db) 
and anything more is a "really large" one (stored on disk). The cutoff is 
hardcoded in TextCache.cs (BLOB_SIZE_LIMIT). The number of files and the disk 
size of .beagle/TextCache reduce significantly. Performance and memory 
should not suffer noticeably unless I did something stupid.
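
The gist of the decision is roughly this (a simplified sketch, not the
actual code in TextCache.cs; everything except BLOB_SIZE_LIMIT is an
illustrative name):

	// sketch: decide where a gzipped cache entry lives
	// (InsertBlob, WriteCacheFile, InsertPath are hypothetical helpers)
	const int BLOB_SIZE_LIMIT = 4 * 1024;

	void Store (Uri uri, byte[] gzipped)
	{
		if (gzipped.Length <= BLOB_SIZE_LIMIT)
			InsertBlob (uri, gzipped); // "extremely small": in TextCache.db
		else
			InsertPath (uri, WriteCacheFile (uri, gzipped)); // "really large": on disk
	}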

One thing I forgot to test was support for sqlite-2. Could anyone with 
sqlite-2 sync svn trunk and see if things work as expected? .beagle/ might 
need to be deleted and files/emails re-indexed.

In the past, I emailed about how this feature relates to language 
determination. It still does, but that would require some more work (hint: 
somehow merge TextCacheWriteStream and PullingReader) and a significant bit 
of testing. I have no plans to work on it now.

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-02 Thread Kevin Kubasik
Very cool, and good to hear. If Arun could share a patch for his
implementation, that would be awesome in terms of preventing wheel
reinvention ;) If Arun is unable, or doesn't have the time to look
into a hybrid solution, I wouldn't mind doing some investigative work.
I think the biggest decision comes when it's time to determine what
our cutoff is (size-wise). While there is a little extra complication
introduced by a hybrid system, I don't see it being a major issue to
implement. My thought would just be to have a table in the
TextCache.db which denotes whether a uri is stored in the db or on disk.
The major concern is the cost of 2 sqlite queries per cache item.
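
Concretely, something like this hypothetical schema (two tables, hence
the two queries; not actual schema from the tree):

	-- lookup 1: where does this uri live?
	CREATE TABLE uri_location (uri TEXT PRIMARY KEY, on_disk INTEGER);
	-- lookup 2: the gzipped blob itself, only for in-db entries
	CREATE TABLE uri_data     (uri TEXT PRIMARY KEY, data BLOB);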

Just my thoughts on the subject. DBera: are you saying that you want
to just work/look into the language stemming, or both the language
stemming and the text cache? Depending on what you want to work on, I
can help out with this, if it's something we really want to see in
0.3.0. Lemme know.

Cheers,
Kevin Kubasik

On 10/2/07, Debajyoti Bera [EMAIL PROTECTED] wrote:
  completely sure that such a loose typing system will greatly benefit
  us when working with TEXT/STRING types, however, the gzipped blobs
  might benefit from less disk usage thanks to being stored in a single
  file, in addition, I know that incremental i/o is a possibility with
  blobs in sqlite 3.4, which could potentially be utilized to optimize
  work like this.
 
  Anyways, please send a patch to the list if that's not too much to ask,
  or just give us an update as to how things are going.

 Arun and I had some discussion about this and we were trying to balance the
 performance and size issues. He already has the sqlite idea implemented;
 however, I would also like to see how a hybrid idea works, i.e. store the huge
 number of extremely small files in sqlite and store the really large ones on
 disk. Implementing this is tricky (*).

 - dBera

 (*) One of my recent efforts has been to add language detection support (based
 on a patch in bugzilla). This will enable us to use the right stemmers and
 analyzers depending on the language. The hard part is stealing some initial
 text for language detection and doing it in a transparent way. Incidentally,
 one implementation of the hybrid approach mentioned above and the language
 detection cross paths. I am waiting for some free time to get going after
 them.

 --
 -
 Debajyoti Bera @ http://dtecht.blogspot.com
 beagle / KDE fan
 Mandriva / Inspiron-1100 user



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-10-02 Thread D Bera
 Just my thoughts on the subject. DBera: are you saying that you want
 to just work/look into the language stemming, or both the language
 stemming and the text cache? Depending on what you want to work on, I
 can help out with this, if it's something we really want to see in
 0.3.0. Lemme know.

1. I definitely don't have the time, else it would have been done by now :)
2. I will locate Arun's patch and send it out; it's a good
implementation and can act as a reference.
3. The problem is less about the number of queries. It is more about
sending the data to the textcache (which can store it gzipped either in
sqlite or on disk), to the language determination class and to lucene
without (repeat: without) storing all the data in one huge
store/string in memory. I thought a cutoff size of disk_block_size
would be a good starting point; it will reduce external fragmentation
to a good degree, since most textcache files are less than 1 block. So
the decision to store on disk or in sqlite can only come after we have
read, say, 4KB of data. The language determination, I think, requires
1K of text. In our filter/lucene interface, lucene asks for data and
then the filters go and extract a little more data from the file and
send it back; this loops until there is no more data to extract. There
is no storing of data in memory! So to do the whole thing
correctly, as lucene asks for more data the filters return the data
and, transparently, someone in the middle decides whether to store the
data in sqlite or on disk (and does so); furthermore, even before lucene
asks for data, about 1K of data is extracted from the file, the language
detected, the appropriate stemmer hooked up, and that data kept around
till lucene asks for it. The obvious approach is to extract all the
data in advance, store it in memory, decide where to store the
textcache, decide the language and then comfortably feed lucene
from the stored data. That's not desired.
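
That "someone in the middle" could be as simple as a tee around the
filter's pull-based reader (a sketch only, under the assumption that the
cache side is exposed as a TextWriter; none of these names exist in the
tree):

	using System.IO;

	// Tees whatever lucene pulls from the filter into the text cache,
	// so the full text is never held in memory at once.
	public class TeeTextReader : TextReader {
		private TextReader source; // the filter's pull-based reader
		private TextWriter cache;  // writes gzipped to sqlite or disk

		public TeeTextReader (TextReader source, TextWriter cache)
		{
			this.source = source;
			this.cache = cache;
		}

		public override int Read (char[] buffer, int index, int count)
		{
			int n = source.Read (buffer, index, count);
			if (n > 0)
				cache.Write (buffer, index, n); // side effect: fill the cache
			return n;
		}

		public override void Close ()
		{
			source.Close ();
			cache.Close (); // the cache decides sqlite-vs-disk on close
		}
	}

The first ~1K for language detection could then be read through the same
reader and buffered (just that much) before lucene starts pulling.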

I hope you also see where the connection between language
determination and text-cache comes in. Go for them if you or anyone else
wants to. Just let others know so there is no duplication of effort.

N. Let's not target a release and cram features in :) Instead, if you
want to work on something, work on it. If it is done and release-ready
by 0.3, it will be included. Otherwise there is always another
release. There is little sense in including lots of half-complete,
poorly implemented features just to make the release notes look yummy
:-) Of course I am restating the obvious. (*)

- dBera

(*) When I sent out a to-come feature list in one of my earlier
emails, I was more stressing the fact that testing is becoming very
important and difficult with all these different features, and less
the fact that "Wow! Now we can do XXX too." Now I think I was misread.

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote:
 Very cool, and good to hear. If Arun could share a patch for his
 implementation, that would be awesome in terms of preventing wheel
 reinvention ;) If Arun is unable, or doesn't have the time to look
 into a hybrid solution, I wouldn't mind doing some investigative work,

I've been completely swamped with work here in the first half of the
semester, and I spent a little time getting the xesam-adaptor updated
to the latest spec. Do let me know if you're taking this up, so
there's no duplication of effort. The patch against r4013 is attached.

  I think the biggest decision comes when it's time to determine what
 our cutoff is (size-wise). While there is a little extra complication
 introduced by a hybrid system, I don't see it being a major issue to
 implement. My thought would just be to have a table in the
 TextCache.db which denotes whether a uri is stored in the db or on disk.
 The major concern is the cost of 2 sqlite queries per cache item.

Might it not be easier to have a boolean field denoting whether the
value is an on-disk URI or the blob itself? Or better, if this is
possible, to just examine the first few bytes to see if they are
ASCII text (or !(the gzip magic bytes))?
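
That check would be cheap -- gzip data always starts with the magic
bytes 0x1f 0x8b, so (a sketch):

	// sketch: distinguish a stored path (plain text) from a gzipped blob
	static bool IsGzipped (byte[] data)
	{
		return data.Length >= 2 && data [0] == 0x1f && data [1] == 0x8b;
	}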

Best,
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===
--- beagled/FileSystemQueryable/FileSystemQueryable.cs  (revision 4013)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs  (working copy)
@@ -1810,17 +1810,12 @@
 		// is stored in a property.
 		Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);
 
-		string path = TextCache.UserCache.LookupPathRaw (uri);
+		Stream text = TextCache.UserCache.LookupText(uri, hit.Uri.LocalPath);
 
-		if (path == null)
+		if (text == null)
 			return null;
 
-		// If this is self-cached, use the remapped Uri
-		if (path == TextCache.SELF_CACHE_TAG)
-			return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-		path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-		return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+		return SnippetFu.GetSnippet(query_terms, new StreamReader(text), full_text);
 	}
 
override public void Start ()
Index: beagled/TextCache.cs
===
--- beagled/TextCache.cs(revision 4013)
+++ beagled/TextCache.cs(working copy)
@@ -37,6 +37,53 @@
 
 namespace Beagle.Daemon {
 
+   // We only have this class because GZipOutputStream doesn't let us
+   // retrieve the baseStream
+   public class TextCacheStream : GZipOutputStream {
+   private Stream stream;
+
+   public Stream BaseStream {
+   get { return stream; }
+   }
+
+   public TextCacheStream() : this(new MemoryStream())
+   {
+   }
+
+   public TextCacheStream(Stream stream) : base(stream)
+   {
+   this.stream = stream;
+   this.IsStreamOwner = false;
+   }
+   }
+
+   public class TextCacheWriter : StreamWriter {
+   private Uri uri;
+   private TextCache parent_cache;
+   private TextCacheStream tcStream;
+
+		public TextCacheWriter(TextCache cache, Uri uri, TextCacheStream tcStream) : base(tcStream)
+   {
+   parent_cache = cache;
+   this.uri = uri;
+   this.tcStream = tcStream;
+   }
+
+   override public void Close()
+   {
+   base.Close();
+
+   Stream stream = tcStream.BaseStream;
+
+   byte[] text = new byte[stream.Length];
+   stream.Seek(0, SeekOrigin.Begin);
+   stream.Read(text, 0, (int)stream.Length);
+
+   parent_cache.Insert(uri, text);
+   tcStream.BaseStream.Close();
+   }
+   }
+
// FIXME: This class isn't multithread safe!  This class does not
// ensure that different threads don't utilize a transaction started
// in a certain thread at the same time.  However, since all the
@@ -50,7 +97,7 @@
 
static public bool Debug = false;
 
-		public const string SELF_CACHE_TAG = "*self*";
+   private const string 

Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote:
 A quick followup, some reading here:

 http://www.sqlite.org/datatype3.html

 provides some insight into how exactly sqlite3 stores values. I'm not
 completely sure that such a loose typing system will greatly benefit
 us when working with TEXT/STRING types; however, the gzipped blobs
 might benefit from less disk usage thanks to being stored in a single
 file. In addition, I know that incremental I/O is a possibility with
 blobs in sqlite 3.4, which could potentially be utilized to optimize
 work like this.

If the bindings wrapped a Stream around this, that would be ideal. There
doesn't seem to be much documentation on the new bindings. From what I
can see in the mono-1.2.5.1 code, the new bindings (like the old
bindings) just return the entire contents of the field. Maybe we
should make a feature request?
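
Until then, the only way at sqlite's incremental blob I/O would be the C
API directly (a P/Invoke sketch; the sqlite3_blob_* functions are real
sqlite >= 3.4 API, but the database, table, column and rowid here are
made up):

	using System;
	using System.Runtime.InteropServices;

	class BlobReadSketch {
		[DllImport ("sqlite3")]
		static extern int sqlite3_open (string filename, out IntPtr db);
		[DllImport ("sqlite3")]
		static extern int sqlite3_close (IntPtr db);
		[DllImport ("sqlite3")]
		static extern int sqlite3_blob_open (IntPtr db, string zDb, string zTable,
						     string zColumn, long iRow, int flags, out IntPtr blob);
		[DllImport ("sqlite3")]
		static extern int sqlite3_blob_bytes (IntPtr blob);
		[DllImport ("sqlite3")]
		static extern int sqlite3_blob_read (IntPtr blob, byte[] z, int n, int iOffset);
		[DllImport ("sqlite3")]
		static extern int sqlite3_blob_close (IntPtr blob);

		static void Main ()
		{
			IntPtr db, blob;
			sqlite3_open ("TextCache.db", out db);
			// table/column/rowid are hypothetical; flags=0 means read-only
			if (sqlite3_blob_open (db, "main", "textcache_data", "data", 1, 0, out blob) == 0) {
				byte[] chunk = new byte [4096];
				int size = sqlite3_blob_bytes (blob);
				for (int off = 0; off < size; off += chunk.Length) {
					int n = Math.Min (chunk.Length, size - off);
					sqlite3_blob_read (blob, chunk, n, off);
					// hand n bytes to the decompressor here,
					// instead of materializing the whole blob
				}
				sqlite3_blob_close (blob);
			}
			sqlite3_close (db);
		}
	}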
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com


Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
Updated patch attached -- some of the older code was not building.

Cheers,
Arun

On 02/10/2007, Arun Raghavan [EMAIL PROTECTED] wrote:
 On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote:
  Very cool, and good to hear. If Arun could share a patch for his
  implementation, that would be awesome in terms of preventing wheel
  reinvention ;) If Arun is unable, or doesn't have the time to look
  into a hybrid solution, I wouldn't mind doing some investigative work,

 I've been completely swamped with work here in the first half of the
 semester, and I spent a little time getting the xesam-adaptor updated
 to the latest spec. Do let me know if you're taking this up, so
 there's no duplication of effort. The patch against r4013 is attached.

   I think the biggest decision comes when it's time to determine what
  our cutoff is (size-wise). While there is a little extra complication
  introduced by a hybrid system, I don't see it being a major issue to
  implement. My thought would just be to have a table in the
  TextCache.db which denotes whether a uri is stored in the db or on disk.
  The major concern is the cost of 2 sqlite queries per cache item.

 Might it not be easier to have a boolean field denoting whether the
 value is an on-disk URI or the blob itself? Or better, if this is
 possible, to just examine the first few bytes to see if they are
 ASCII text (or !(the gzip magic bytes))?

 Best,
 --
 Arun Raghavan
 (http://nemesis.accosted.net)
 v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
 e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===
--- beagled/FileSystemQueryable/FileSystemQueryable.cs  (revision 4016)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs  (working copy)
@@ -1810,17 +1810,12 @@
 		// is stored in a property.
 		Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);
 
-		string path = TextCache.UserCache.LookupPathRaw (uri);
+		Stream text = TextCache.UserCache.LookupText(uri, hit.Uri.LocalPath);
 
-		if (path == null)
+		if (text == null)
 			return null;
 
-		// If this is self-cached, use the remapped Uri
-		if (path == TextCache.SELF_CACHE_TAG)
-			return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-		path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-		return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+		return SnippetFu.GetSnippet(query_terms, new StreamReader(text), full_text);
 	}
 
override public void Start ()
Index: beagled/TextCache.cs
===
--- beagled/TextCache.cs(revision 4016)
+++ beagled/TextCache.cs(working copy)
@@ -37,6 +37,53 @@
 
 namespace Beagle.Daemon {
 
+   // We only have this class because GZipOutputStream doesn't let us
+   // retrieve the baseStream
+   public class TextCacheStream : GZipOutputStream {
+   private Stream stream;
+
+   public Stream BaseStream {
+   get { return stream; }
+   }
+
+   public TextCacheStream() : this(new MemoryStream())
+   {
+   }
+
+   public TextCacheStream(Stream stream) : base(stream)
+   {
+   this.stream = stream;
+   this.IsStreamOwner = false;
+   }
+   }
+
+   public class TextCacheWriter : StreamWriter {
+   private Uri uri;
+   private TextCache parent_cache;
+   private TextCacheStream tcStream;
+
+		public TextCacheWriter(TextCache cache, Uri uri, TextCacheStream tcStream) : base(tcStream)
+   {
+   parent_cache = cache;
+   this.uri = uri;
+   this.tcStream = tcStream;
+   }
+
+   override public void Close()
+   {
+   base.Close();
+
+   Stream stream = tcStream.BaseStream;
+
+   byte[] text = new byte[stream.Length];
+   stream.Seek(0, SeekOrigin.Begin);
+   stream.Read(text, 0, (int)stream.Length);
+
+   parent_cache.Insert(uri, text);
+   tcStream.BaseStream.Close();
+   }
+   }
+
// FIXME: This class isn't multithread safe!  This class does not
// ensure that different threads don't utilize a transaction started
// in a certain thread at the same time.  However, since all the
@@ -50,7 +97,7 

Re: GSoC Weekly Report

2007-10-02 Thread Daniel Naber
On Tuesday 02 October 2007 19:13, you wrote:

 Thinking quickly, one way to do this would be to add an option to
 query to specify the language.

That's a nice option, but the default should be to search all languages, I 
think. People are used to just typing a word without setting another option.

Regards
 Daniel

-- 
http://www.danielnaber.de


Re: GSoC Weekly Report

2007-10-01 Thread Kevin Kubasik
On 8/19/07, Arun Raghavan [EMAIL PROTECTED] wrote:
 Hello All,
 This week I've been working on the new TextCache implementation that
 I'd mentioned the last time (replacing the bunch of files with an
 Sqlite db).

 Making an Sqlite db with just the uri and raw text caused an almost 3x
 increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
 my test case). This despite the fact that the size of the raw text was
 only 7.9 MB. I need to figure out why this happens. In the mean time,
 I also implemented another version of this which stores (uri, gzipped
 text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
 this actually seems to work very well (the db for the test case
 mentioned shrunk down to 2.6 MB, which is just a little more than the
 actual size of the compressed data itself).
My first impression on this is that Sqlite is probably building an
index for the raw text data, whereas the compressed data is simply
treated as a binary 'blob'. I'm not 100% sure of the table definitions
you're using, or exactly how much (in terms of indexes) sqlite does
automatically, but that seems like the most likely culprit. As we
already have our own system for searching text ;) if you could find a
way to force sqlite not to index the table's raw text column, you
could probably get more sane numbers regarding the database size.
However, it's possible it's just how sqlite handles text content, and
the gzipped text is the best way to go. The other thing to test is how
this is handled in far larger situations. Is it possible that the
first 1000 rows are very expensive, but when we scale to many more rows,
we see only a minute increase in size?


 Performance numbers on a search which returns 1205 results are below.
 I basically ran the measurements twice -- once after flushing the
 inode, dentry and page cache, and another time taking advantage of the
 disk caches.

 Current TextCache:
 no-disk-cache: ~1m
 with-disk-cache: ~9s

 New TextCache (raw and gzipped versions had similar numbers):
 no-disk-cache: ~42s
 with-disk-cache: ~10s


Very cool / interesting. One of the important cases to test here is
multiple successive queries. Think of deskbar's as-a-user-types
completion: how does such a system fare when it gets 15 or 20 queries
back to back? Does the compression difference factor in then?

 One very important factor remains to be seen -- memory usage. I am
 working on figuring out what the impact of the new code on memory
 usage is. Numbers should be available soon.

 On the Xesam front, I will be updating the code tomorrow,day-after to
 reflect the latest changes to the spec.

I know the Google SoC is over, and it's completely OK if you're too busy
to complete these tests, but it would be awesome if you could provide
a patch to the list so we can not only see exactly what you were
doing, but so that someone else might finish up your work and/or get
it merged in and ready for 0.3.0.


 --
 Arun Raghavan


-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-10-01 Thread Kevin Kubasik
A quick followup, some reading here:

http://www.sqlite.org/datatype3.html

provides some insight into how exactly sqlite3 stores values. I'm not
completely sure that such a loose typing system will greatly benefit
us when working with TEXT/STRING types; however, the gzipped blobs
might benefit from less disk usage thanks to being stored in a single
file. In addition, I know that incremental I/O is a possibility with
blobs in sqlite 3.4, which could potentially be utilized to optimize
work like this.

Anyways, please send a patch to the list if that's not too much to ask,
or just give us an update as to how things are going.

Cheers,
Kevin Kubasik

On 10/1/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
 On 8/19/07, Arun Raghavan [EMAIL PROTECTED] wrote:
  Hello All,
  This week I've been working on the new TextCache implementation that
  I'd mentioned the last time (replacing the bunch of files with an
  Sqlite db).
 
  Making an Sqlite db with just the uri and raw text caused an almost 3x
  increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
  my test case). This despite the fact that the size of the raw text was
  only 7.9 MB. I need to figure out why this happens. In the mean time,
  I also implemented another version of this which stores (uri, gzipped
  text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
  this actually seems to work very well (the db for the test case
  mentioned shrunk down to 2.6 MB, which is just a little more than the
  actual size of the compressed data itself).
 My first impression on this is that Sqlite is probably building an
 index for the raw text data, whereas the compressed data is simply
 treated as a binary 'blob'. I'm not 100% sure of the table definitions
 you're using, or exactly how much (in terms of indexes) sqlite does
 automatically, but that seems like the most likely culprit. As we
 already have our own system for searching text ;) if you could find a
 way to force sqlite not to index the table's raw text column, you
 could probably get more sane numbers regarding the database size.
 However, it's possible it's just how sqlite handles text content, and
 the gzipped text is the best way to go. The other thing to test is how
 this is handled in far larger situations. Is it possible that the
 first 1000 rows are very expensive, but when we scale to many more rows,
 we see only a minute increase in size?

 
  Performance numbers on a search which returns 1205 results are below.
  I basically ran the measurements twice -- once after flushing the
  inode, dentry and page cache, and another time taking advantage of the
  disk caches.
 
  Current TextCache:
  no-disk-cache: ~1m
  with-disk-cache: ~9s
 
  New TextCache (raw and gzipped versions had similar numbers):
  no-disk-cache: ~42s
  with-disk-cache: ~10s
 

 Very cool / interesting. One of the important cases to test here is
 multiple successive queries. Think of deskbar's as-a-user-types
 completion: how does such a system fare when it gets 15 or 20 queries
 back to back? Does the compression difference factor in then?

  One very important factor remains to be seen -- memory usage. I am
  working on figuring out what the impact of the new code on memory
  usage is. Numbers should be available soon.
 
  On the Xesam front, I will be updating the code tomorrow,day-after to
  reflect the latest changes to the spec.

 I know the Google SoC is over, and it's completely OK if you're too busy
 to complete these tests, but it would be awesome if you could provide
 a patch to the list so we can not only see exactly what you were
 doing, but so that someone else might finish up your work and/or get
 it merged in and ready for 0.3.0.


  --
  Arun Raghavan


 --
 Cheers,
 Kevin Kubasik
 http://kubasik.net/blog



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-08-19 Thread Debajyoti Bera
 Making an Sqlite db with just the uri and raw text caused an almost 3x
 increase in the text cache size (3.6 MB (on-disk) vs. almost 15MB in
 my test case). This despite the fact that the size of the raw text was
 only 7.9 MB. I need to figure out why this happens. In the mean time,
 I also implemented another version of this which stores (uri, gzipped
 text) pairs in the Sqlite db instead of (uri, raw text). Surprisingly,
 this actually seems to work very well (the db for the test case
 mentioned shrunk down to 2.6 MB, which is just a little more than the
 actual size of the compressed data itself).

 Current TextCache:
 no-disk-cache: ~1m
 with-disk-cache: ~9s

 New TextCache (raw and gzipped versions had similar numbers):
 no-disk-cache: ~42s
 with-disk-cache: ~10s

The numbers look pretty good. Size on disk is the main focus here. The disk 
cache will come into heavy play on a machine constantly serving queries. So 
even if that suffers a little bit (but only a little bit), I think it's still 
OK if we gain in other places. The speedup with no-disk-cache is an added 
bonus.

Does the performance degrade when looking up small result sets? In the current 
implementation, that involves fewer disk seeks, whereas for the sqlite-based 
approach the I/O overhead will probably be similar.

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-08 Thread Debajyoti Bera
Tao,
	I was testing the extension when I noticed this (browser.dump enabled):

[beagle] [beaglPref.get beagle.bookmark.active] [Exception... "Component 
returned failure code: 0x8000 (NS_ERROR_UNEXPECTED) 
[nsIPrefBranch.getBoolPref]"  nsresult: "0x8000 (NS_ERROR_UNEXPECTED)"  
location: "JS frame :: chrome://newbeagle/content/utils.js :: anonymous :: 
line 53"  data: no]

This was getting thrown on the terminal multiple times. Not quite sure what 
was triggering this.

(I didn't set any option explicitly in the preferences)

Also, in beagleoverlay.js:writeMetadata, "uniddexed" should be "unindexed" 
(typo). Could you store the URLs as "text" and not "keyword"? People should 
be able to query part of the url too :)
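
In old-Lucene terms that is roughly the difference between (a sketch; the 
field name is made up, and beagle's actual indexing code is more involved):

	// "keyword": stored un-tokenized, so only exact-URL queries match
	doc.Add (Field.Keyword ("Url", url));
	// "text": tokenized, so parts of the url are searchable
	doc.Add (Field.Text ("Url", url));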

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-07 Thread Tao Fei
2007/8/7, Joe Shaw [EMAIL PROTECTED]:
 Hi,

 On 8/6/07, Tao Fei [EMAIL PROTECTED] wrote:
  2007/8/7, Joe Shaw [EMAIL PROTECTED]:
   I've been playing around with the new extension, and I'm seeing a
   little inconsistent behavior with it.  I wonder if it's related to me
   having the old Beagle extension installed as well (although I disabled
   that one).
 
  Yes. That's the problem.
  I used the same preference name ("beagle.enabled") as the old extension.
  Fixed now: it uses "beagle.autoindex.active", and the tooltip is also updated.
  "beagle.enabled" was the wrong name, as it doesn't affect on-demand indexing.

 Cool, I'll give it a test later today.

 We should keep in mind a migration path for the old extension.
 Ideally the new one will just be a drop-in replacement, and if we
 could migrate the basic settings (i.e., enabled/disabled and a
 whitelist/blacklist) that would be ideal.
Oh, you can import the preferences from the old extension (just open the
preferences window, and you can see the button).
Maybe I should import them silently when the new extension is installed.

 We'll also want to use the same UUID so that upgrades are done cleanly,
 if there's no method for obsoleting other extensions.

The same UUID? I guess we only need to modify the install.rdf to
change the UUID.

-- 
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-06 Thread Joe Shaw
Hey,

I've been playing around with the new extension, and I'm seeing a
little inconsistent behavior with it.  I wonder if it's related to me
having the old Beagle extension installed as well (although I disabled
that one).

Whenever I open a site, I get the little dog icon with an X over it,
indicating that it's not indexing that page.  The page is not from
HTTPS, and when I open the preferences dialog, the "Default Action"
has "Index" selected.  If I click on the icon to toggle it, it gets
indexed fine.  But I'm not sure why it isn't by default.  After I
toggle the icon, any subsequent page opens are indexed.

Joe


Re: GSoc weekly report (Browser Extension Rewrite)

2007-08-06 Thread Tao Fei
2007/8/7, Joe Shaw [EMAIL PROTECTED]:
 Hey,

 I've been playing around with the new extension, and I'm seeing a
 little inconsistent behavior with it.  I wonder if it's related to me
 having the old Beagle extension installed as well (although I disabled
 that one).
Yes. That's the problem.
I used the same preference name ("beagle.enabled") as the old extension.
Fixed now: it uses "beagle.autoindex.active", and the tooltip is also updated.
"beagle.enabled" was the wrong name, as it doesn't affect on-demand indexing.

 Whenever I open a site, I get the little dog icon with an X over it,
 indicating that it's not indexing that page.  The page is not from
 HTTPS, and when I open the preferences dialog, the "Default Action"
 has "Index" selected.  If I click on the icon to toggle it, it gets
 indexed fine.  But I'm not sure why it isn't by default.  After I
 toggle the icon, any subsequent page opens are indexed.

 Joe



-- 
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


Re: GSoc Weekly Report (Browser Extension Rewrite)

2007-07-20 Thread Joe Shaw
Hi,

On 7/14/07, Tao Fei [EMAIL PROTECTED] wrote:
 I've noticed that Epiphany can be written in C or in Python. The old
 extension is written in C. I'm wondering whether it is acceptable if I
 write the extension in python ?

It's a possibility, although I'm not crazy about adding a Python
dependency to Beagle (not libbeagle, which already has an optional
Python dep for the bindings).  It's probably not unreasonable to
assume that anyone with Epiphany installed will also have Python,
however.

Joe


RE: GSoc Weekly Report (Browser Extension Rewrite)

2007-07-20 Thread Kevin Kubasik
Hey, a quick note on the subject: I made a haphazard attempt at this rewrite 
some time ago and faced the same issue you have now. I think the deciding 
factor would be your personal experience with the languages. If you have never 
really worked with C but have used Python, I would think that a well-designed 
and well-written Python plugin is much better than a haphazard 'My First C' 
program. The second concern/thought is that a lot of users will leave their 
browsers open for hours (if not days) at a time; I'm not 100% sure whether this 
applies in the plugin context, but a GC'd system probably offers some safety net 
for memory use.

Just a quick $0.02,
Kevin Kubasik


-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Joe Shaw
Sent: Friday, July 20, 2007 10:49 AM
To: Tao Fei
Cc: dashboard-hackers@gnome.org
Subject: Re: GSoc Weekly Report (Browser Extension Rewrite)

Hi,

On 7/14/07, Tao Fei [EMAIL PROTECTED] wrote:
 I've noticed that Epiphany can be written in C or in Python. The old
 extension is written in C. I'm wondering whether it is acceptable if I
 write the extension in python ?

It's a possibility, although I'm not crazy about adding a Python
dependency to Beagle (not libbeagle, which already has an optional
Python dep for the bindings).  It's probably not unreasonable to
assume that anyone with Epiphany installed will also have Python,
however.

Joe


Re: GSoc Weekly Report (Browser Extension Rewrite)

2007-07-11 Thread Tao Fei

Sorry for being late.
I have just got back home. I had some network problems and failed to get
access to the network until today.


 There was a recent one opened against the old extension about
 internationalization.  I think that's a pretty important task that
 this one should address.  There is even a patch attached to that bug,
 although I haven't looked at it closely.

Yes, I have noticed that (Debajyoti has cc-ed this bug to me).
I'd like to say that the new extension will be translatable: I have
put all the UI strings in a .dtd file and all the javascript strings
in a .properties file (except some debug information).
And I will keep doing that.


Thanks.
--
Tao Fei (陶飞)
My Blog: blog.filia.cn
My Summer Of Code Blog: filiasoc.blogspot.com


Re: GSoc Weekly Report (Browser Extension Rewrite)

2007-07-10 Thread Joe Shaw
Hi,

On 7/6/07, Tao Fei [EMAIL PROTECTED] wrote:
 I did a little search in http://bugzilla.gnome.org/ ; there are some
 bug reports for the extension,
 e.g. Bug 317605: http://bugzilla.gnome.org/show_bug.cgi?id=317605
 In fact, I use the status bar label to indicate whether the page is
 indexed, and the beagle icon to indicate whether Beagle is
 enabled, disabled or in an error state.
 The icon is global. I think it partly fixes the bug.

Yeah, I think this is a good idea.  I didn't like before that the icon
was overloaded for two questions: "is this page indexed?" and "is the
extension enabled for this page?"  Separating those concepts is a good
idea.

 What to do next:
 * to fix the bugs in bugzilla (or avoid producing them)

There was a recent one opened against the old extension about
internationalization.  I think that's a pretty important task that
this one should address.  There is even a patch attached to that bug,
although I haven't looked at it closely.

Thanks,
Joe