Re: GSoC Weekly Report

2007-10-02 Thread Kevin Kubasik
Very cool, and good to hear. If Arun could share a patch for his
implementation, that would be awesome in terms of preventing wheel
reinvention ;) If Arun is unable, or doesn't have the time to look
into a hybrid solution, I wouldn't mind doing some investigative work.
I think the biggest decision comes when it's time to determine what
our cutoff is (size-wise). While there is a little extra complication
introduced by a hybrid system, I don't see it being a major issue to
implement. My thought would just be to have a table in the
TextCache.db which denotes whether a URI is stored in the DB or on disk.
The major concern is the cost of two sqlite queries per cache item.
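
To sketch what I mean (the schema and names here are hypothetical, not
the actual TextCache.db layout): a single table whose 'location' flag
says where the content lives lets one SELECT return both the flag and
the payload, so the second query goes away.

using System;
using System.Data;
using Mono.Data.SqliteClient;

class TextCacheSketch {
	const int DbStored = 0;   // hypothetical flag values
	const int DiskStored = 1;

	static void Main ()
	{
		IDbConnection conn = new SqliteConnection ("version=3,URI=file:TextCache.db");
		conn.Open ();

		IDbCommand cmd = conn.CreateCommand ();
		cmd.CommandText = "CREATE TABLE IF NOT EXISTS textcache " +
			"(uri TEXT PRIMARY KEY, location INTEGER, content BLOB)";
		cmd.ExecuteNonQuery ();

		// One query fetches the flag and the payload together.
		cmd.CommandText = "SELECT location, content FROM textcache " +
			"WHERE uri = 'file:///tmp/a.txt'";
		using (IDataReader reader = cmd.ExecuteReader ()) {
			if (reader.Read ()) {
				if (reader.GetInt32 (0) == DbStored)
					Console.WriteLine ("gzipped text is in the blob column");
				else
					Console.WriteLine ("blob column holds an on-disk path");
			}
		}
		conn.Close ();
	}
}

Whether sqlite handles blob-heavy rows like this efficiently is exactly
the kind of thing the investigation would need to confirm.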

Just my thoughts on the subject. DBera: are you saying that you want
to just work/look into the language stemming, or both the language
stemming and the text cache? Depending on what you want to work on, I
can help out with this, if it's something we really want to see in
0.3.0. Lemme know.

Cheers,
Kevin Kubasik

On 10/2/07, Debajyoti Bera [EMAIL PROTECTED] wrote:
  completely sure that such a loose typing system will greatly benefit
  us when working with TEXT/STRING types; however, the gzipped blobs
  might benefit from less disk usage thanks to being stored in a single
  file. In addition, I know that incremental I/O is a possibility with
  blobs in sqlite 3.4, which could potentially be utilized to optimize
  work like this.
 
  Anyways, please send a patch to the list if that's not too much to ask,
  or just give us an update as to how things are going.

 Arun and I had some discussion about this, and we were trying to balance the
 performance and size issues. He already has the sqlite idea implemented;
 however, I would also like to see how a hybrid idea works, i.e. store the huge
 number of extremely small files in sqlite and the really large ones on
 disk. Implementing this is tricky (*).

 - dBera

 (*) One of my recent efforts has been to add language detection support (based
 on a patch in bugzilla). This will enable us to use the right stemmers and
 analyzers depending on the language. The hard part is stealing some initial
 text for language detection and doing it in a transparent way. Incidentally,
 one implementation of the hybrid approach mentioned above and the language
 detection cross paths. I am waiting for some free time to get going after
 them.

 --
 -
 Debajyoti Bera @ http://dtecht.blogspot.com
 beagle / KDE fan
 Mandriva / Inspiron-1100 user



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-10-02 Thread D Bera
 Just my thoughts on the subject. DBera: are you saying that you want
 to just work/look into the language stemming, or both the language
 stemming and the text cache? Depending on what you want to work on, I
 can help out with this, if it's something we really want to see in
 0.3.0. Lemme know.

1. I definitely don't have the time, else it would have been done by now :)
2. I will locate Arun's patch and send it out; it's a good
implementation and can act as a reference.
3. The problem is less about the number of queries. It is more about
sending the data to the textcache (which can either store it gzipped in
sqlite or gzipped on disk), to the language determination class,
and to lucene without (repeat: without) storing all the data in a huge
store/string in memory. I thought a cutoff size of disk_block_size
would be a good starting point; it will reduce external fragmentation
to a good degree, since most textcache files are less than 1 block. So
the decision to store on disk or in sqlite can only come after we have
read, say, 4KB of data. The language determination, I think, requires
1K of text. In our filter/lucene interface, lucene asks for data and
then the filters go and extract a little more data from the file and
send it back; this loops until there is no more data to extract. There
is no storing of data in memory! So to do the whole thing
correctly, as lucene asks for more data the filters return the data,
and transparently someone in the middle decides whether to store the
data in sqlite or on disk (and does so); furthermore, even before lucene
asks for data, about 1K of data is extracted from the file, the language
is detected, the appropriate stemmer is hooked up, and the data is kept
around till lucene asks for it. The obvious approach is to extract all
the data in advance, store it in memory, decide where to store the
textcache, decide the language, and then comfortably feed lucene
from the stored data. That's not desirable.
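
To make that concrete, here is a rough sketch of such a middleman
(TeeingReader and the cache sink are made-up names, not actual Beagle
code; assume the sink buffers its first 4KB before deciding between
sqlite and disk):

using System;
using System.IO;
using System.Text;

public class TeeingReader : TextReader {
	const int PreReadSize = 1024;      // ~1K for language detection
	TextReader source;                 // the filter's extracted text
	TextWriter cache_sink;             // spools to sqlite or disk behind the scenes
	StringBuilder preread = new StringBuilder ();
	int preread_pos = 0;

	public TeeingReader (TextReader source, TextWriter cache_sink)
	{
		this.source = source;
		this.cache_sink = cache_sink;
	}

	// Steal the initial text once, before indexing starts; the caller
	// runs language detection on it and hooks up the right stemmer.
	public string PreRead ()
	{
		char[] buf = new char [PreReadSize];
		int n = source.Read (buf, 0, buf.Length);
		if (n > 0)
			preread.Append (buf, 0, n);
		return preread.ToString ();
	}

	// Lucene pulls through this: the stolen text is replayed first,
	// then we keep draining the filter, teeing every chunk into the
	// cache sink; the full document is never held in memory.
	public override int Read (char[] buffer, int index, int count)
	{
		int n;
		if (preread_pos < preread.Length) {
			n = Math.Min (count, preread.Length - preread_pos);
			preread.CopyTo (preread_pos, buffer, index, n);
			preread_pos += n;
		} else {
			n = source.Read (buffer, index, count);
		}
		if (n > 0)
			cache_sink.Write (buffer, index, n);
		return n;
	}
}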

I hope you also see where the connection between language
determination and the text cache comes in. Go for them if you or anyone
wants to. Just let others know so there is no duplication of effort.

N.B. Let's not target a release and cram features in :) Instead, if you
want to work on something, work on it. If it is done and release-ready
by 0.3, it will be included. Otherwise there is always another
release. There is little sense in including lots of half-complete,
poorly implemented features just to make the release notes look yummy
:-) Of course I am restating the obvious. (*)

- dBera

(*) When I sent out a to-come feature list in one of my earlier
emails, I was stressing more the fact that testing is becoming very
important and difficult with all these different features, and less
the fact that "Wow! Now we can do XXX too." Now I think I was misread.

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Reviving Semantic Relationships in Beagle

2007-10-02 Thread Kevin Kubasik
Some of you may remember Max and his 2006 GSoC project to implement a
separate metadata store for Beagle. Hours of hard work later, it
became apparent that it wasn't so much the storage of metadata that
was important (lucene stores Properties, or 'Fields', just fine) but
the relationships between data.

The Beagle++ project has attempted to utilize some of these
relationships by building an RDF store and querying that. While such a
system does have its benefits, and may even have been a choice to
consider when Beagle was first written, at this point Beagle is
locked into its Lucene-based backend, and we don't want to give up our
lightning-fast searches. However, an RDF graph, or map/hierarchy of
indexed entities, has some appeal when we look at situations like
archives, mail attachments, downloads etc., which all utilize the idea
of 'parent' or 'child' sources. Lucene is not particularly well
suited to representing such relationships, and while Beagle has built
a system for handling such cases, it is far from perfect, or universal.

What I am proposing is a universal (as in backend-independent) RDF
graph of URIs. Graphing and storing all of Beagle's metadata in
RDF would not only make querying more difficult, but would result in
data duplication and in reworking a system which (for all intents and
purposes) is fine in its current state.

The new RDF map would be useless without the API elements to access
it, so I propose the following means of 'hooking up' an RDF store to
Beagle.

- New Query_Part which allows a raw RDF-type query against the store.
- Wire into LuceneQueryDriver and LuceneIndexDriver to store new
relationships in the RDF store and query them upon creation of a Hit.
- Add a more accessible API to Filter for adding parents/children to
indexables. (I'm thinking addParent(Uri) and addChild(Uri) methods;
see the sketch after this list. But it's a first thought: the issue is
that most of the time these relationships are only visible on a higher
level, not as each item is filtered for indexing. Noticing that a
document in my home directory is the same as an attachment in my inbox,
and linking the two, is a difficult use case to work with.)
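
A first-pass sketch of what those Filter additions might look like
(C# casing of the addParent/addChild idea above; these names are
hypothetical, not an existing Beagle API):

using System;
using System.Collections;

public abstract class RelationAwareFilter /* : Filter */ {
	private ArrayList parent_uris = new ArrayList ();
	private ArrayList child_uris = new ArrayList ();

	// Called by a filter, or by higher-level code that notices the
	// relationship, to record graph edges for the RDF store.
	public void AddParent (Uri parent)
	{
		parent_uris.Add (parent);
	}

	public void AddChild (Uri child)
	{
		child_uris.Add (child);
	}

	// The indexer would drain these after filtering and hand the
	// edges to the backend-independent RDF store.
	public ICollection ParentUris {
		get { return parent_uris; }
	}

	public ICollection ChildUris {
		get { return child_uris; }
	}
}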

It is also important to note that the RDF store is _only storing unique
URIs in a relationship graph_, like the following sketch. (uri1 is an
e-mail, uri2 is a contact, and uri3 and uri4 are oo.org documents)

uri1
 |- uri3
 `- uri2
     `- uri4

Both the contact that sent the e-mail and the attachment are
children of uri1, and the contact has sent one other document to us,
hence uri2 has that document (uri4) as a child.

While this seems like we are replicating much of the data in the
Lucene Fields, this is actually something completely different: we are
referencing an exact entity, not just a name or subject. As a result
of this tree, not only can we adjust our scoring to account for
related items, but we can provide right-click options like 'See all
files by this author' etc. in a more intelligent manner.

I'm interested to see what people think, and what (if any) experience
people have had with similar work.
-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: beagle r4011 - in trunk/beagle: beagled beagled/IndexingServiceQueryable search/Tiles

2007-10-02 Thread Kevin Kubasik
On 10/2/07, D Bera [EMAIL PROTECTED] wrote:
  Add the following context menu options:
  -Find By Author
  -Find Messages From Sender
  -Find Pages From Domain
 
  to the relevant tiles in beagle-search

 Some quick observations:
 - The property-to-keyword mapping is specific to IndexingServiceQueryable,
 so it should be added to IndexingServiceQueryable.cs itself (see
 FileSystemQueryable/FileSystemQueryable.cs, just after the namespace
 declaration, to see how backends can define their own mapping)
Done, checked in.
 - I am not sure starting a _new_ beagle-search is a good idea. It
 should search in that same one. I don't use that UI much (actually,
 never) so you might want to get some feedback about this.

I'm inclined to keep this behaviour, as I don't like beagle-search losing
my old search, although it's pretty trivial to do one or the other;
whatever the consensus on the list is will be my course of action.
 --
 -
 Debajyoti Bera @ http://dtecht.blogspot.com
 beagle / KDE fan
 Mandriva / Inspiron-1100 user



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote:
 Very cool, and good to hear. If Arun could share a patch for his
 implementation, that would be awesome in terms of preventing wheel
 reinvention ;) If Arun is unable, or doesn't have the time to look
 into a hybrid solution, I wouldn't mind doing some investigative work.

I've been completely swamped with work here in the first half of the
semester, and I spent a little time getting the xesam-adaptor updated
to the latest spec. Do let me know if you're taking this up, so
there's no duplication of effort. The patch against r4013 is attached.

 I think the biggest decision comes when it's time to determine what
 our cutoff is (size-wise). While there is a little extra complication
 introduced by a hybrid system, I don't see it being a major issue to
 implement. My thought would just be to have a table in the
 TextCache.db which denotes whether a URI is stored in the DB or on disk.
 The major concern is the cost of two sqlite queries per cache item.

Might it not be easier to have a boolean field denoting whether the
field is an on-disk URI or the blob itself? Or better, if this is
possible, to just examine the first few bytes to see if they are some
ASCII text (i.e. not the gzip magic bytes)?
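
For illustration, a minimal sketch of that sniffing idea, assuming the
cached blobs really are raw gzip streams (which start with the magic
bytes 0x1f 0x8b):

using System;

static class CacheEntrySniffer {
	// Gzip streams begin with the two magic bytes 0x1f 0x8b; anything
	// else can be treated as an on-disk path stored as plain text.
	public static bool LooksLikeGzipBlob (byte[] entry)
	{
		return entry != null && entry.Length >= 2 &&
			entry [0] == 0x1f && entry [1] == 0x8b;
	}

	static void Main ()
	{
		byte[] gzipped = { 0x1f, 0x8b, 0x08, 0x00 };
		byte[] path = System.Text.Encoding.ASCII.GetBytes ("/home/user/.beagle/TextCache/sample");
		Console.WriteLine (LooksLikeGzipBlob (gzipped)); // True
		Console.WriteLine (LooksLikeGzipBlob (path));    // False
	}
}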

Best,
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===================================================================
--- beagled/FileSystemQueryable/FileSystemQueryable.cs	(revision 4013)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs	(working copy)
@@ -1810,17 +1810,12 @@
 		// is stored in a property.
 		Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);
 
-		string path = TextCache.UserCache.LookupPathRaw (uri);
+		Stream text = TextCache.UserCache.LookupText (uri, hit.Uri.LocalPath);
 
-		if (path == null)
+		if (text == null)
 			return null;
 
-		// If this is self-cached, use the remapped Uri
-		if (path == TextCache.SELF_CACHE_TAG)
-			return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-		path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-		return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+		return SnippetFu.GetSnippet (query_terms, new StreamReader (text), full_text);
 	}
 
 	override public void Start ()
Index: beagled/TextCache.cs
===================================================================
--- beagled/TextCache.cs	(revision 4013)
+++ beagled/TextCache.cs	(working copy)
@@ -37,6 +37,53 @@
 
 namespace Beagle.Daemon {
 
+	// We only have this class because GZipOutputStream doesn't let us
+	// retrieve the baseStream
+	public class TextCacheStream : GZipOutputStream {
+		private Stream stream;
+
+		public Stream BaseStream {
+			get { return stream; }
+		}
+
+		public TextCacheStream () : this (new MemoryStream ())
+		{
+		}
+
+		public TextCacheStream (Stream stream) : base (stream)
+		{
+			this.stream = stream;
+			this.IsStreamOwner = false;
+		}
+	}
+
+	public class TextCacheWriter : StreamWriter {
+		private Uri uri;
+		private TextCache parent_cache;
+		private TextCacheStream tcStream;
+
+		public TextCacheWriter (TextCache cache, Uri uri, TextCacheStream tcStream) : base (tcStream)
+		{
+			parent_cache = cache;
+			this.uri = uri;
+			this.tcStream = tcStream;
+		}
+
+		override public void Close ()
+		{
+			base.Close ();
+
+			Stream stream = tcStream.BaseStream;
+
+			byte[] text = new byte [stream.Length];
+			stream.Seek (0, SeekOrigin.Begin);
+			stream.Read (text, 0, (int) stream.Length);
+
+			parent_cache.Insert (uri, text);
+			tcStream.BaseStream.Close ();
+		}
+	}
+
 	// FIXME: This class isn't multithread safe!  This class does not
 	// ensure that different threads don't utilize a transaction started
 	// in a certain thread at the same time.  However, since all the
@@ -50,7 +97,7 @@
 
 	static public bool Debug = false;
 
-	public const string SELF_CACHE_TAG = "*self*";
+	private const string 

Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote:
 A quick follow-up; some reading here:

 http://www.sqlite.org/datatype3.html

 provides some insight into how exactly sqlite3 stores values. I'm not
 completely sure that such a loose typing system will greatly benefit
 us when working with TEXT/STRING types; however, the gzipped blobs
 might benefit from less disk usage thanks to being stored in a single
 file. In addition, I know that incremental I/O is a possibility with
 blobs in sqlite 3.4, which could potentially be utilized to optimize
 work like this.

If the bindings wrap a Stream around this, that would be ideal. There
doesn't seem to be much documentation on the new bindings. From what I
can see in the mono-1.2.5.1 code, the new bindings (like the old
ones) just return the entire contents of the field. Maybe we
should make a feature request?
-- 
Arun Raghavan
(http://nemesis.accosted.net)
v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com


Re: GSoC Weekly Report

2007-10-02 Thread Arun Raghavan
Updated patch attached -- some of the older code was not building.

Cheers,
Arun

On 02/10/2007, Arun Raghavan [EMAIL PROTECTED] wrote:
 On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote:
  Very cool, and good to hear. If Arun could share a patch for his
  implementation, that would be awesome in terms of preventing wheel
  reinvention ;) If Arun is unable, or doesn't have the time to look
  into a hybrid solution, I wouldn't mind doing some investigative work.

 I've been completely swamped with work here in the first half of the
 semester, and I spent a little time getting the xesam-adaptor updated
 to the latest spec. Do let me know if you're taking this up, so
 there's no duplication of effort. The patch against r4013 is attached.

  I think the biggest decision comes when it's time to determine what
  our cutoff is (size-wise). While there is a little extra complication
  introduced by a hybrid system, I don't see it being a major issue to
  implement. My thought would just be to have a table in the
  TextCache.db which denotes whether a URI is stored in the DB or on disk.
  The major concern is the cost of two sqlite queries per cache item.

 Might it not be easier to have a boolean field denoting whether the
 field is an on-disk URI or the blob itself? Or better, if this is
 possible, to just examine the first few bytes to see if they are some
 ASCII text (i.e. not the gzip magic bytes)?

 Best,
 --
 Arun Raghavan
 (http://nemesis.accosted.net)
 v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056
 e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===================================================================
--- beagled/FileSystemQueryable/FileSystemQueryable.cs	(revision 4016)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs	(working copy)
@@ -1810,17 +1810,12 @@
 		// is stored in a property.
 		Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);
 
-		string path = TextCache.UserCache.LookupPathRaw (uri);
+		Stream text = TextCache.UserCache.LookupText (uri, hit.Uri.LocalPath);
 
-		if (path == null)
+		if (text == null)
 			return null;
 
-		// If this is self-cached, use the remapped Uri
-		if (path == TextCache.SELF_CACHE_TAG)
-			return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-		path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-		return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+		return SnippetFu.GetSnippet (query_terms, new StreamReader (text), full_text);
 	}
 
 	override public void Start ()
Index: beagled/TextCache.cs
===================================================================
--- beagled/TextCache.cs	(revision 4016)
+++ beagled/TextCache.cs	(working copy)
@@ -37,6 +37,53 @@
 
 namespace Beagle.Daemon {
 
+	// We only have this class because GZipOutputStream doesn't let us
+	// retrieve the baseStream
+	public class TextCacheStream : GZipOutputStream {
+		private Stream stream;
+
+		public Stream BaseStream {
+			get { return stream; }
+		}
+
+		public TextCacheStream () : this (new MemoryStream ())
+		{
+		}
+
+		public TextCacheStream (Stream stream) : base (stream)
+		{
+			this.stream = stream;
+			this.IsStreamOwner = false;
+		}
+	}
+
+	public class TextCacheWriter : StreamWriter {
+		private Uri uri;
+		private TextCache parent_cache;
+		private TextCacheStream tcStream;
+
+		public TextCacheWriter (TextCache cache, Uri uri, TextCacheStream tcStream) : base (tcStream)
+		{
+			parent_cache = cache;
+			this.uri = uri;
+			this.tcStream = tcStream;
+		}
+
+		override public void Close ()
+		{
+			base.Close ();
+
+			Stream stream = tcStream.BaseStream;
+
+			byte[] text = new byte [stream.Length];
+			stream.Seek (0, SeekOrigin.Begin);
+			stream.Read (text, 0, (int) stream.Length);
+
+			parent_cache.Insert (uri, text);
+			tcStream.BaseStream.Close ();
+		}
+	}
+
 	// FIXME: This class isn't multithread safe!  This class does not
 	// ensure that different threads don't utilize a transaction started
 	// in a certain thread at the same time.  However, since all the
@@ -50,7 +97,7 @@

Webinterface for beagle search (and more...)

2007-10-02 Thread Debajyoti Bera
Hi Searchers,
Some of you are aware that a web interface for beagle is being worked
on. I blogged about it on planetbeagle sometime back. After that, a fearless
Nirbheek Chauhan continued hacking on it and managed to change

http://bp0.blogger.com/_Dl_EHp-s13Q/RuSVUVFtUNI/AAM/SShdc8dzfTs/s1600-h/beagle-web-interface.jpg.png

to

http://cs-people.bu.edu/dbera/blogdata/beagle_webui-1.png

I believe it's now at a stage where some feedback would be good. 99.9% of it is
javascript+xslt, so I (we) am mostly looking for feedback on UI issues. It's
getting more makeup and functionality as I write this email. For the known
things_to_do, check
http://svn.gnome.org/viewcvs/beagle/trunk/beagle/beagled/webinterface/TODO?view=markup

To get it running, you need the svn trunk. Get it, build it. Then (*) cd to
beagle/beagled and start beagled as ./beagled. Yes, beagled needs to be
started from the beagle/beagled directory. Point your browser (oops... Firefox
browser) to http://127.0.0.1:4000/ (**) and enjoy your stay.

Please, some feedback is necessary. At least say you hate, e.g., the stupid way
the last-modified date is shown.

- dBera

(*) The need to start beagled from the beagled directory is a temporary one.
(**) For the security-minded people out there, the server actually opens port 4000
to all! So any machine would be able to search (this is again temporary and
will be configurable in the near future). If you don't want the open port
during testing, replace in beagled/Server.cs:
"http://*:{0}/"
with
"http://127.0.0.1:{0}/"
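
(For context, a minimal sketch of what that replacement amounts to,
assuming the prefix string is fed to an HttpListener via String.Format;
the variable names are made up:)

using System;
using System.Net;

class WebServerSketch {
	static void Main ()
	{
		int port = 4000;
		HttpListener listener = new HttpListener ();
		// Open to every machine (the current trunk behaviour):
		//   listener.Prefixes.Add (String.Format ("http://*:{0}/", port));
		// Local-only while testing:
		listener.Prefixes.Add (String.Format ("http://127.0.0.1:{0}/", port));
		listener.Start ();
		Console.WriteLine ("listening on port {0}", port);
	}
}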

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: Webinterface for beagle search (and more...)

2007-10-02 Thread Nirbheek Chauhan

Dammit, forgot to attach the patch, *rolls eyes*

--
~Nirbheek Chauhan, he who has realised that today is not his day.


enable_webbeagle_beagle.patch
Description: Binary data


Re: Webinterface for beagle search (and more...)

2007-10-02 Thread D Bera
 The backend must be enabled by making

Ahh... right. You need to have the network service enabled (turned off
by default):
$ beagle-config networking ServiceEnabled
(it toggles the flag, and prints the current behaviour after toggling)

 Then, when you call beagled from inside beagle/beagled, call it with
 `beagled --backend +NetworkServices`

Yes, this too. (By default, the NetworkServices backend is turned on,
so you won't need it normally.)

- dBera

-- 
-
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


Re: GSoC Weekly Report

2007-10-02 Thread Daniel Naber
On Tuesday 02 October 2007 19:13, you wrote:

 Thinking quickly, one way to do this would be to add an option to
 the query to specify the language.

That's a nice option, but the default should be to search all languages, I
think. People are used to just typing a word without setting another option.
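
Purely as illustration, one way the suggested option could surface
while keeping all languages as the default (the Language property here
is hypothetical, not part of the current query API):

public class LanguageAwareQuery /* : Beagle.Query */ {
	// null means the default: search across all indexed languages
	private string language = null;

	public string Language {
		get { return language; }
		set { language = value; }
	}
}

// usage: opt in to a single language, or omit for all languages
//   LanguageAwareQuery q = new LanguageAwareQuery ();
//   q.Language = "de";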

Regards
 Daniel

-- 
http://www.danielnaber.de


Re: ITagProvider

2007-10-02 Thread Kevin Kubasik
On 10/2/07, Joe Shaw [EMAIL PROTECTED] wrote:
 Hi,

 Sorry for the delay.

 On 9/28/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
  The downside is that there have been attempts at a universal
  tagging library; what I would really love (in an ideal world) is if we
  crafted events through the index service, meaning that while we could
  offer a way to work against the tags from the BeagleClient API, you
  wouldn't have to implement it. However, for tag querying/listing, we
  would probably require a native API.

 Is the idea here to write the integration in Nautilus which would call
 the Beagle APIs to add and set tags?

 If so, I don't think this is a good solution for a couple of reasons:

 (1) It's Beagle specific, and with Tracker out there and a fairly
 divided community, a single implementation will never get all the
 buy-in it needs at this point.

Agreed, Tracker could technically be the tagging backend/provider.
 (2) Tagging really has nothing to do with desktop search and indexing.
  Tags should be indexed by the indexer and made available through
 search, but fundamentally they're no more related than metadata in
 MP3s, JPEGs, emails, etc.

I agree that they _shouldn't_ be treated differently, however, the
inherent complexity
 (3) The amount of code you'd actually have to write to do this as a
 totally separate library in C isn't that much more work.  You'll be
 able to integrate with D-Bus, have the potential to get community
 buy-in for the library, and we can still use it in Beagle.

  The specifics of how much of this we're going to make beagle
  responsible for (is beagle almost like a 'tagging adapter'? do we want
  to provide complete bi-directional support? or do we encourage people
  to work through the provider that we are using as a backend?)

 These problems all go away by implementing it as a separate,
 standalone library.  Beagle's UI simply uses the library widgets,
 talks to the library APIs, and gets notification (and reindexes
 updated tags) via D-Bus.  The only extra benefit something like Beagle
 gives you is the ability to push file system events back into the tag
 library, and Beagle needn't do that -- any file system monitoring
 system could provide that.
Agreed.

  I would just prefer that instead of Beagle trying to
  provide a robust tagging api, we just integrate it into our search
  intelligently, and treat it as a property.

 I agree wholeheartedly with this, and it's the reason why I wrote the
 Nautilus metadata backend in the trunk.  A tagging library backend
 could fit pretty cleanly into this mold.
There were 2 issues I found with this model; they could be just a
matter of implementation.

1) We are still bound to a single backend system; intelligently
handling universal desktop tagging would be quite difficult.

2) Data replication, as well as sync/performance issues with users who
actually utilize tags (think thousands of tagged files).

Now, time and energy could optimize these scenarios (I think). I'm
working on writing some other metadata backends so I get a better feel
for the system.

  Well, the problem here is that I really don't want us to be tied to one
  index, or one backend. I would think (and I could be wrong here) that
  once we have merged the results from a backend, we could add any URIs
  that were tagged with one of the query words. The key point here is to
  have a universal tagging store that transcends our backend system.

 Take a look at the Nautilus metadata backend.  This is basically what
 it does.  It doesn't have an index backing it; it simply sets
 additional properties on documents in existing backends.

   Or, if you're feeling particularly adventurous, rewrite the FSQ.
   That's the biggest consumer of memory at this point and has problems
   like being unable to search by parent directories.  I've described the
   issues with it in more detail in a previous email, I believe. :)
 
  I seem to remember something about that. It's probably overkill right
  now, with the slew of new features, but I am curious about it from a
  design standpoint.

 Maybe, but I consider it the #1 problem with Beagle right now.  I
 couldn't find my previous email about it, so I'll type it up soon.

 Joe



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: beagle r4011 - in trunk/beagle: beagled beagled/IndexingServiceQueryable search/Tiles

2007-10-02 Thread Joe Shaw
Hi,

On 10/2/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
 On 10/2/07, D Bera [EMAIL PROTECTED] wrote:
  - I am not sure starting a _new_ beagle-search is a good idea. It
  should search in that same one. I don't use that UI much (actually,
  never) so you might want to get some feedback about this.

 I'm inclined to keep this behaviour, as I don't like beagle-search losing
 my old search, although it's pretty trivial to do one or the other;
 whatever the consensus on the list is will be my course of action.

It's worth investigating how hard it would be to have beagle-search
open multiple search windows per instance.  Right now there's no other
way to do it, but firing up another instance of beagle-search is
gross.

Joe


Re: beagle r4011 - in trunk/beagle: beagled beagled/IndexingServiceQueryable search/Tiles

2007-10-02 Thread Kevin Kubasik
Hmmm, I agree that completely new instances of beagle-search aren't
ideal, but they are quite light. I think that having multiple
instances of beagle-search really isn't bad; no doubt a unified system
would be better, but I really don't know where to start. If people are
really against the current system (launching new beagle-search
instances) I can disable it until we have a better solution; I just
found the 'find mail from' workflow so common for me that it was
almost imperative that it be available.

I await the general opinion. But I tend to find that even with big
queries, it's hard to get beagle-search over a meg of real memory.

On 10/2/07, Joe Shaw [EMAIL PROTECTED] wrote:
 Hi,

 On 10/2/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
  On 10/2/07, D Bera [EMAIL PROTECTED] wrote:
   - I am not sure starting a _new_ beagle-search is a good idea. It
   should search in that same one. I don't use that UI much (actually,
   never) so you might want to get some feedback about this.

  I'm inclined to keep this behaviour, as I don't like beagle-search losing
  my old search, although it's pretty trivial to do one or the other;
  whatever the consensus on the list is will be my course of action.

 It's worth investigating how hard it would be to have beagle-search
 open multiple search windows per instance.  Right now there's no other
 way to do it, but firing up another instance of beagle-search is
 gross.

 Joe



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: ITagProvider

2007-10-02 Thread Kevin Kubasik
On 10/2/07, Joe Shaw [EMAIL PROTECTED] wrote:
 Hi,

 On 10/2/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
  Agreed, tracker could technically be the tagging backend/provider.

 Sure.  I have pondered writing a Tracker backend for Beagle in the past.

   (2) Tagging really has nothing to do with desktop search and indexing.
Tags should be indexed by the indexer and made available through
   search, but fundamentally they're no more related than metadata in
   MP3s, JPEGs, emails, etc.
  
  I agree that they _shouldn't_ be treated differently, however, the
  inherent complexity

 What's the inherent complexity?

Sorry, that sentence totally didn't finish; I meant to say: the
inherent complexity of querying across multiple backends and merging
the results.

   I agree wholeheartedly with this, and it's the reason why I wrote the
   Nautilus metadata backend in the trunk.  A tagging library backend
   could fit pretty cleanly into this mold.
  There were 2 issues I found with this model, they could be just a
  matter of implementation.
 
  1) We are still bound to a single backend system, intelligently
  handling universal desktop tagging would be quite difficult.

 I'm not sure what the context of backend here is.  I think a desktop
 library would handle more than just files -- it'd be URI based like
 Beagle -- so it could handle emails, web pages, etc.

Agreed, that's not the issue; it's on our side: intelligently merging
multiple results from/for the same URI.
 If you mean things like pulling from del.icio.us, then you'd just
 create a separate Beagle backend.  One for the local library and one
 for del.icio.us.  With all the focus GNOME is giving to the Online
 Desktop metaphor, there's no reason why the local database and a
 remote database like del.icio.us couldn't be sync'd independent of
 Beagle.

Yeah... I guess the whole online paradigm works really well at making this 'ok'.
  2) Data replication, as well as sync/performance issues with users who
  actually utilize tags (think thousands of tagged files).

 What's the concern specifically here?  The database will have to be
 timestamped somehow to make offline change notification reasonably
 performant.

It's that most tagging databases are just databases; without said
timestamp, we end up processing thousands of changes. In a perfect world
every change to a tagging database would have a timestamp, but I think
that in most cases we won't be nearly that lucky.

Take the frontrunner (leaftag): it's a sqlite database without
timestamps. Short of copying the DB and comparing it every time it's
modified, I don't see a sane way to notice and update just our
changes. In this type of case, it seems like we would be better off
just querying leaftag directly, and then processing its results
internally.

Or I could be a fool and this has all been solved already; I'm not really sure.

-Kevin
 Joe



-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog


Re: ITagProvider

2007-10-02 Thread Andrew Ruthven
On Tue, 2007-10-02 at 17:30 -0400, Kevin Kubasik wrote:
 Take the frontrunner (leaftag): it's a sqlite database without
 timestamps. Short of copying the DB and comparing it every time it's
 modified, I don't see a sane way to notice and update just our
 changes. In this type of case, it seems like we would be better off
 just querying leaftag directly, and then processing its results
 internally.

Leaftag needs a bit more care and attention before it is really usable,
in my opinion. So I think we can probably fix that issue easily enough
by extending the DB schema.
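
For example, a sketch of such an extension (the schema is assumed for
illustration, not leaftag's actual one): give every tag row a
modification time kept current by triggers, so an indexer can ask for
everything changed since its last pass instead of diffing database
copies.

using System;
using System.Data;
using Mono.Data.SqliteClient;

class TagDbSketch {
	static void Run (IDbConnection conn, string sql)
	{
		IDbCommand cmd = conn.CreateCommand ();
		cmd.CommandText = sql;
		cmd.ExecuteNonQuery ();
	}

	static void Main ()
	{
		IDbConnection conn = new SqliteConnection ("version=3,URI=file:tags.db");
		conn.Open ();

		Run (conn, "CREATE TABLE IF NOT EXISTS tags (uri TEXT, tag TEXT, mtime INTEGER)");
		// Keep mtime current on every insert and (non-mtime) update.
		Run (conn, "CREATE TRIGGER IF NOT EXISTS tags_ins AFTER INSERT ON tags BEGIN " +
			"UPDATE tags SET mtime = strftime('%s','now') WHERE rowid = new.rowid; END");
		Run (conn, "CREATE TRIGGER IF NOT EXISTS tags_upd AFTER UPDATE OF uri, tag ON tags BEGIN " +
			"UPDATE tags SET mtime = strftime('%s','now') WHERE rowid = new.rowid; END");

		// An indexer then polls for rows changed since its last pass
		// (last_pass would be remembered between runs):
		long last_pass = 0;
		IDbCommand query = conn.CreateCommand ();
		query.CommandText = "SELECT uri, tag FROM tags WHERE mtime > " + last_pass;
		using (IDataReader reader = query.ExecuteReader ()) {
			while (reader.Read ())
				Console.WriteLine ("{0} tagged '{1}'", reader.GetString (0), reader.GetString (1));
		}
		conn.Close ();
	}
}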

Any tagging system that is used should be multi-user as well.

Cheers!
 
-- 
Andrew Ruthven, Wellington, New Zealand
At home: [EMAIL PROTECTED]  |  This space intentionally
                            |  left blank.

