Re: GSoC Weekly Report
Very cool, and good to hear. If Arun could share a patch for his implementation, that would be awesome in terms of preventing wheel reinvention ;) If Arun is unable, or doesn't have the time to look into a hybrid solution, I wouldn't mind doing some investigative work. I think the biggest decision comes when it's time to determine what our cutoff is (size-wise). While there is a little extra complication introduced by a hybrid system, I don't see it being a major issue to implement. My thought would just be to have a table in TextCache.db which denotes whether a URI is stored in the db or on disk. The major concern is the cost of two sqlite queries per cache item. Just my thoughts on the subject. DBera: are you saying that you want to just work on/look into the language stemming, or both the language stemming and the text cache? Depending on what you want to work on, I can help out with this, if it's something we really want to see in 0.3.0. Lemme know. Cheers, Kevin Kubasik On 10/2/07, Debajyoti Bera [EMAIL PROTECTED] wrote: I'm not completely sure that such a loose typing system will greatly benefit us when working with TEXT/STRING types; however, the gzipped blobs might benefit from less disk usage thanks to being stored in a single file. In addition, I know that incremental I/O is a possibility with blobs in sqlite 3.4, which could potentially be utilized to optimize work like this. Anyway, please send a patch to the list if that's not too much to ask, or just give us an update as to how things are going. Arun and I had some discussion about this and we were trying to balance the performance and size issues. He already has the sqlite idea implemented; however, I would also like to see how a hybrid idea works, i.e., store the huge number of extremely small files in sqlite and store the really large ones on disk. Implementing this is tricky (*). - dBera (*) One of my recent efforts has been to add language detection support (based on a patch in bugzilla). 
This will enable us to use the right stemmers and analyzers depending on the language. The hard part is stealing some initial text for language detection and doing it in a transparent way. Incidentally, one implementation of the hybrid approach mentioned above and the language detection cross paths. I am waiting for some free time to get going on them. -- - Debajyoti Bera @ http://dtecht.blogspot.com beagle / KDE fan Mandriva / Inspiron-1100 user -- Cheers, Kevin Kubasik http://kubasik.net/blog ___ Dashboard-hackers mailing list Dashboard-hackers@gnome.org http://mail.gnome.org/mailman/listinfo/dashboard-hackers
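The stemmer/analyzer selection described above hinges on guessing a language from a small initial sample of text. As a rough illustration only (this is not the bugzilla patch the email refers to, and Beagle itself is C#; Python is used here for brevity), a stopword-count heuristic over the first ~1 KB might look like:

```python
# Hypothetical sketch of language detection over the first ~1 KB of
# extracted text. Stopword lists and the fallback policy are invented
# for the example.

STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "ein"},
    "fr": {"le", "la", "et", "les", "des", "est", "une"},
}

def guess_language(text, sample_size=1024):
    """Guess the language by counting stopword hits per language
    in the first sample_size characters of text."""
    words = text[:sample_size].lower().split()
    scores = {
        lang: sum(1 for w in words if w in sw)
        for lang, sw in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to English when nothing matched at all.
    return best if scores[best] > 0 else "en"
```

Once a language is guessed, the corresponding stemmer/analyzer could be hooked into the indexing pipeline before Lucene asks for the rest of the data.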
Re: GSoC Weekly Report
Just my thoughts on the subject. DBera: are you saying that you want to just work on/look into the language stemming, or both the language stemming and the text cache? Depending on what you want to work on, I can help out with this, if it's something we really want to see in 0.3.0. Lemme know. 1. I definitely don't have the time, else it would have been done by now :) 2. I will locate Arun's patch and send it out; it's a good implementation and can act as a reference. 3. The problem is less about the number of queries. It is more about sending the data to the textcache (which can either store it gzipped in sqlite or gzipped on disk), to the language determination class, and to lucene, without (repeat: without) storing all the data in a huge store/string in memory. I thought a cutoff size of disk_block_size would be a good starting point; it will reduce external fragmentation to a good degree since most textcache files are less than 1 block. So the decision to store on disk or in sqlite can only come after we have read, say, 4KB of data. The language determination, I think, requires 1K of text. In our filter/lucene interface, lucene asks for data, and then the filters go and extract a little more data from the file and send it back; this goes in a loop till there is no more data to extract. There is no storing of data in memory! So to do the whole thing correctly, as lucene asks for more data the filters return the data and, transparently, someone in the middle decides whether to store the data in sqlite or on disk (and does so); furthermore, even before lucene asks for data, about 1K of data is extracted from the file, the language detected, the appropriate stemmer hooked up, and that data kept around till lucene asks for it. The obvious approach is extracting all the data in advance, storing it in memory, deciding where to store the textcache, deciding the language and then comfortably feeding lucene from the stored data. That's not desired. 
I hope you also see where the connection between language determination and text-cache comes in. Go for them if you or anyone wants to. Just let others know so there is no duplication of effort. N.B. Let's not target a release and cram features in :) Instead, if you want to work on something, work on it. If it is done and release-ready by 0.3, it will be included. Otherwise there is always another release. There is little sense in including lots of half-complete, poorly implemented features just to make the release notes look yummy :-) Of course I am restating the obvious. (*) - dBera (*) When I sent out a to-come feature list in one of my earlier emails, I was stressing more the fact that testing is becoming very important and difficult with all these different features, and less the fact that Wow! Now we can do XXX too. Now I think I was misread. -- - Debajyoti Bera @ http://dtecht.blogspot.com beagle / KDE fan Mandriva / Inspiron-1100 user
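The "decide after ~4 KB" idea above can be sketched as a writer that buffers incoming text in memory up to the cutoff: if the document ends below the cutoff it goes to sqlite, otherwise everything spills to a file on disk and the rest streams there directly, so nothing large is ever held in one huge in-memory string. This is an illustrative sketch in Python (Beagle is C#), and the class and return convention are invented, not Beagle's actual API:

```python
import io
import tempfile

CUTOFF = 4096  # hypothetical cutoff: one disk block, as suggested in the thread

class HybridCacheWriter:
    """Buffer text in memory until the cutoff is exceeded; small
    documents are destined for sqlite, large ones spill to disk."""

    def __init__(self, cutoff=CUTOFF):
        self.cutoff = cutoff
        self.buf = io.BytesIO()
        self.spill = None  # file object once we exceed the cutoff

    def write(self, data):
        if self.spill is None:
            self.buf.write(data)
            if self.buf.tell() > self.cutoff:
                # Too big for the db: move buffered bytes to disk and
                # stream everything further straight to the file.
                self.spill = tempfile.NamedTemporaryFile(delete=False)
                self.spill.write(self.buf.getvalue())
                self.buf = None
        else:
            self.spill.write(data)

    def close(self):
        """Return ('sqlite', bytes) or ('disk', path)."""
        if self.spill is None:
            return ("sqlite", self.buf.getvalue())
        self.spill.close()
        return ("disk", self.spill.name)

# Small document stays destined for sqlite; a large one spills to disk.
w = HybridCacheWriter(cutoff=10)
w.write(b"short")
small = w.close()

w2 = HybridCacheWriter(cutoff=10)
w2.write(b"x" * 20)
big = w2.close()
```

The same buffering trick covers the language-detection requirement: the first ~1K is always in the buffer before anything is handed to lucene.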
Reviving Semantic Relationships in Beagle
Some of you may remember Max and his 2006 GSoC project to implement a separate metadata store for Beagle. Hours of hard work later, it became apparent that it wasn't so much the storage of metadata that was important (lucene stores Properties, or 'Fields', just fine) but the relationships between data. The Beagle++ project has attempted to utilize some of these relationships by building an RDF store and querying that. While such a system does have its benefits, and may even have been a choice to consider when Beagle was first written, at this point Beagle is locked into its Lucene-based backend, and we don't want to give up our lightning-fast searches. However, an RDF graph, or map/hierarchy of indexed entities, has some appeal when we look at situations like archives, mail attachments, downloads etc., which all utilize the idea of 'parent' or 'child' sources. Lucene is not particularly well suited to representing such relationships, and while Beagle has built a system for handling such cases, it is far from perfect, or universal. What I am proposing is a universal (as in backend-independent) RDF graph of URIs. Graphing and storing all of Beagle's metadata in RDF would not only make querying more difficult, but would also duplicate data and rework a system which (for all intents and purposes) is fine in its current state. The new RDF map would be useless without the API elements to access it, so I propose the following means of 'hooking up' an RDF store to Beagle. - New Query_Part which allows an RDF-type query (raw) against the store. - Wire into LuceneQueryDriver and LuceneIndexDriver to store new relationships in the RDF store and query them upon creation of a Hit. - Add a more accessible API to Filter for adding parents/children to indexables. 
(I'm thinking of adding addParent(Uri) and addChild(Uri) methods, but it's a first thought. The issue is that most of the time these relationships are only visible at a higher level, not as each item is filtered for indexing: noticing that a document in my home directory is the same as an attachment in my inbox, and linking the two, is a difficult use case to work with.) It is also important to note that the RDF store is _only storing unique URIs in a relationship graph_, like the following sketch (uri1 is an e-mail, uri2 is a contact and uri3 and uri4 are oo.org documents):

uri1
 |- uri3
 |- uri2
     |- uri4

Both the contact that sent the e-mail and the attachment are children of the e-mail, and the contact has sent 1 other document to us, hence uri2 has that document as a child. While this seems like we are replicating much of the data in the Lucene Fields, this is actually something completely different: we are referencing an exact entity, not just a name or subject. As a result of this tree, not only can we adjust our scoring to account for related items, but we can provide right-click options like 'See all files by this author' etc. in a more intelligent manner. I'm interested to see what people think, and what (if any) experience people have had with similar work. -- Cheers, Kevin Kubasik http://kubasik.net/blog
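The proposed relationship store is small enough to sketch concretely. This is an illustration in Python (not Beagle's C#, and not an actual RDF triple store): only URIs and parent/child edges are kept, and "see all files by this author" becomes a reachability query. Method names mirror the addParent/addChild idea but are otherwise made up:

```python
from collections import defaultdict

class UriGraph:
    """Minimal sketch of a backend-independent relationship store:
    only URIs and parent/child edges, no other metadata."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def add_child(self, parent, child):
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def add_parent(self, child, parent):
        self.add_child(parent, child)

    def related(self, uri):
        """Everything reachable from uri, e.g. for a
        'see all files by this author' style query."""
        seen, stack = set(), [uri]
        while stack:
            u = stack.pop()
            for c in self.children[u]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

# The example from the sketch: an e-mail with an attachment and a
# contact, where the contact also sent one other document.
g = UriGraph()
g.add_child("uri1", "uri3")  # attachment
g.add_child("uri1", "uri2")  # contact
g.add_child("uri2", "uri4")  # other document by the same contact
```

Because only URIs are stored, a Hit from any backend can be enriched by a cheap lookup in this graph without touching the Lucene index.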
Re: beagle r4011 - in trunk/beagle: beagled beagled/IndexingServiceQueryable search/Tiles
On 10/2/07, D Bera [EMAIL PROTECTED] wrote: Add the following context menu options: - Find By Author - Find Messages From Sender - Find Pages From Domain to the relevant tiles in beagle-search Some quick observations: - The property-keyword mapping is specific to IndexingServiceQueryable, so it should be added to IndexingServiceQueryable.cs itself (see FileSystemQueryable/FileSystemQueryable.cs, just after the namespace declaration, to see how backends can define their own mapping) Done, checked in - I am not sure starting a _new_ beagle-search is a good idea. It should search in that same one. I don't use that UI much (actually, never) so you might want to get some feedback about this. I'm inclined to keep it this way, as I don't like beagle-search losing my old search, although it's pretty trivial to use one or the other; whatever the consensus on the list is will be my course of action. -- - Debajyoti Bera @ http://dtecht.blogspot.com beagle / KDE fan Mandriva / Inspiron-1100 user -- Cheers, Kevin Kubasik http://kubasik.net/blog
Re: GSoC Weekly Report
On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote: Very cool, and good to hear. If Arun could share a patch for his implementation, that would be awesome in terms of preventing wheel reinvention ;) If Arun is unable, or doesn't have the time to look into a hybrid solution, I wouldn't mind doing some investigative work. I've been completely swamped with work here in the first half of the semester, and I spent a little time getting the xesam-adaptor updated to the latest spec. Do let me know if you're taking this up, so there's no duplication of effort. The patch against r4013 is attached. I think the biggest decision comes when it's time to determine what our cutoff is (size-wise). While there is a little extra complication introduced by a hybrid system, I don't see it being a major issue to implement. My thought would just be to have a table in TextCache.db which denotes whether a URI is stored in the db or on disk. The major concern is the cost of two sqlite queries per cache item. Might it not be easier to have a boolean field denoting whether the field is an on-disk URI or the blob itself? Or better, if this is possible, to just examine the first few bytes to see if they are some ASCII text (or !(the Zip magic bytes))? Best, -- Arun Raghavan (http://nemesis.accosted.net) v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056 e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com

Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===================================================================
--- beagled/FileSystemQueryable/FileSystemQueryable.cs	(revision 4013)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs	(working copy)
@@ -1810,17 +1810,12 @@
 		// is stored in a property.
 		Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);
-		string path = TextCache.UserCache.LookupPathRaw (uri);
+		Stream text = TextCache.UserCache.LookupText (uri, hit.Uri.LocalPath);
 
-		if (path == null)
+		if (text == null)
 			return null;
 
-		// If this is self-cached, use the remapped Uri
-		if (path == TextCache.SELF_CACHE_TAG)
-			return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-		path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-		return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+		return SnippetFu.GetSnippet (query_terms, new StreamReader (text), full_text);
 	}
 
 	override public void Start ()

Index: beagled/TextCache.cs
===================================================================
--- beagled/TextCache.cs	(revision 4013)
+++ beagled/TextCache.cs	(working copy)
@@ -37,6 +37,53 @@
 namespace Beagle.Daemon {
 
+	// We only have this class because GZipOutputStream doesn't let us
+	// retrieve the baseStream
+	public class TextCacheStream : GZipOutputStream {
+		private Stream stream;
+
+		public Stream BaseStream {
+			get { return stream; }
+		}
+
+		public TextCacheStream () : this (new MemoryStream ())
+		{
+		}
+
+		public TextCacheStream (Stream stream) : base (stream)
+		{
+			this.stream = stream;
+			this.IsStreamOwner = false;
+		}
+	}
+
+	public class TextCacheWriter : StreamWriter {
+		private Uri uri;
+		private TextCache parent_cache;
+		private TextCacheStream tcStream;
+
+		public TextCacheWriter (TextCache cache, Uri uri, TextCacheStream tcStream) : base (tcStream)
+		{
+			parent_cache = cache;
+			this.uri = uri;
+			this.tcStream = tcStream;
+		}
+
+		override public void Close ()
+		{
+			base.Close ();
+
+			Stream stream = tcStream.BaseStream;
+
+			byte[] text = new byte [stream.Length];
+			stream.Seek (0, SeekOrigin.Begin);
+			stream.Read (text, 0, (int) stream.Length);
+
+			parent_cache.Insert (uri, text);
+			tcStream.BaseStream.Close ();
+		}
+	}
+
 	// FIXME: This class isn't multithread safe! This class does not
 	// ensure that different threads don't utilize a transaction started
 	// in a certain thread at the same time. However, since all the
@@ -50,7 +97,7 @@
 
 	static public bool Debug = false;
 
-	public const string SELF_CACHE_TAG = "*self*";
+	private const string
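Arun's "examine the first few bytes" alternative from the message above is easy to sketch: every gzip stream starts with the two magic bytes 0x1f 0x8b, so a single column could hold either a gzipped blob or a plain on-disk path and be disambiguated without a second query. The helper name and column layout below are invented for illustration (Python rather than the project's C#):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

def field_is_blob(field):
    """Return True when the text-cache field holds a gzipped blob,
    False when it holds a plain (ASCII) on-disk path."""
    return field[:2] == GZIP_MAGIC

# Two example field values: one gzipped blob, one path.
blob = gzip.compress(b"extracted document text")
path = b"/home/user/.beagle/TextCache/ab/cd1234"
```

No boolean column is needed: the payload itself is self-describing, at the cost of forbidding paths that start with those two bytes (which ASCII paths never do).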
Re: GSoC Weekly Report
On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote: A quick followup: some reading here: http://www.sqlite.org/datatype3.html provides some insight into how exactly sqlite3 stores values. I'm not completely sure that such a loose typing system will greatly benefit us when working with TEXT/STRING types; however, the gzipped blobs might benefit from less disk usage thanks to being stored in a single file. In addition, I know that incremental I/O is a possibility with blobs in sqlite 3.4, which could potentially be utilized to optimize work like this. If the bindings wrap a Stream around this, this would be ideal. There doesn't seem to be much documentation on the new bindings. From what I can see in the mono-1.2.5.1 code, the new bindings (like the old bindings) just return the entire contents of the field. Maybe we should make a feature request? -- Arun Raghavan (http://nemesis.accosted.net) v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056 e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com
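What the bindings situation above means in practice: the database hands back the whole blob at once, and the best one can do short of true incremental blob I/O (sqlite >= 3.4) is wrap those bytes in a stream so callers still read incrementally. A sketch in Python (whose sqlite3 module also returns whole blobs); the table and column names are invented for the example:

```python
import gzip
import io
import sqlite3

# Tiny stand-in for TextCache.db with a hypothetical schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE textcache (uri TEXT PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO textcache VALUES (?, ?)",
           ("file:///tmp/a.txt", gzip.compress(b"hello text cache")))

def lookup_text(uri):
    """Fetch the gzipped blob for uri and expose it as a readable
    stream of uncompressed text, or None when the uri is unknown."""
    row = db.execute("SELECT data FROM textcache WHERE uri = ?",
                     (uri,)).fetchone()
    if row is None:
        return None
    # The bindings give us all the bytes at once; wrap them so the
    # caller sees a plain stream and can read as little as it wants.
    return gzip.GzipFile(fileobj=io.BytesIO(row[0]))

stream = lookup_text("file:///tmp/a.txt")
```

The memory cost of materializing the blob is exactly why the hybrid cutoff matters: only small documents should ever take this path.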
Re: GSoC Weekly Report
Updated patch attached -- some of the older code was not building. Cheers, Arun On 02/10/2007, Arun Raghavan [EMAIL PROTECTED] wrote: On 02/10/2007, Kevin Kubasik [EMAIL PROTECTED] wrote: Very cool, and good to hear. If Arun could share a patch for his implementation, that would be awesome in terms of preventing wheel reinvention ;) If Arun is unable, or doesn't have the time to look into a hybrid solution, I wouldn't mind doing some investigative work. I've been completely swamped with work here in the first half of the semester, and I spent a little time getting the xesam-adaptor updated to the latest spec. Do let me know if you're taking this up, so there's no duplication of effort. The patch against r4013 is attached. I think the biggest decision comes when it's time to determine what our cutoff is (size-wise). While there is a little extra complication introduced by a hybrid system, I don't see it being a major issue to implement. My thought would just be to have a table in TextCache.db which denotes whether a URI is stored in the db or on disk. The major concern is the cost of two sqlite queries per cache item. Might it not be easier to have a boolean field denoting whether the field is an on-disk URI or the blob itself? Or better, if this is possible, to just examine the first few bytes to see if they are some ASCII text (or !(the Zip magic bytes))? Best, -- Arun Raghavan (http://nemesis.accosted.net) v2sw5Chw4+5ln4pr6$OFck2ma4+9u8w3+1!m?l7+9GSCKi056 e6+9i4b8/9HTAen4+5g4/8APa2Xs8r1/2p5-8 hackerkey.com

Index: beagled/FileSystemQueryable/FileSystemQueryable.cs
===================================================================
--- beagled/FileSystemQueryable/FileSystemQueryable.cs	(revision 4016)
+++ beagled/FileSystemQueryable/FileSystemQueryable.cs	(working copy)
@@ -1810,17 +1810,12 @@
 		// is stored in a property.
 		Uri uri = UriFu.EscapedStringToUri (hit ["beagle:InternalUri"]);
-		string path = TextCache.UserCache.LookupPathRaw (uri);
+		Stream text = TextCache.UserCache.LookupText (uri, hit.Uri.LocalPath);
 
-		if (path == null)
+		if (text == null)
 			return null;
 
-		// If this is self-cached, use the remapped Uri
-		if (path == TextCache.SELF_CACHE_TAG)
-			return SnippetFu.GetSnippetFromFile (query_terms, hit.Uri.LocalPath, full_text);
-
-		path = Path.Combine (TextCache.UserCache.TextCacheDir, path);
-		return SnippetFu.GetSnippetFromTextCache (query_terms, path, full_text);
+		return SnippetFu.GetSnippet (query_terms, new StreamReader (text), full_text);
 	}
 
 	override public void Start ()

Index: beagled/TextCache.cs
===================================================================
--- beagled/TextCache.cs	(revision 4016)
+++ beagled/TextCache.cs	(working copy)
@@ -37,6 +37,53 @@
 namespace Beagle.Daemon {
 
+	// We only have this class because GZipOutputStream doesn't let us
+	// retrieve the baseStream
+	public class TextCacheStream : GZipOutputStream {
+		private Stream stream;
+
+		public Stream BaseStream {
+			get { return stream; }
+		}
+
+		public TextCacheStream () : this (new MemoryStream ())
+		{
+		}
+
+		public TextCacheStream (Stream stream) : base (stream)
+		{
+			this.stream = stream;
+			this.IsStreamOwner = false;
+		}
+	}
+
+	public class TextCacheWriter : StreamWriter {
+		private Uri uri;
+		private TextCache parent_cache;
+		private TextCacheStream tcStream;
+
+		public TextCacheWriter (TextCache cache, Uri uri, TextCacheStream tcStream) : base (tcStream)
+		{
+			parent_cache = cache;
+			this.uri = uri;
+			this.tcStream = tcStream;
+		}
+
+		override public void Close ()
+		{
+			base.Close ();
+
+			Stream stream = tcStream.BaseStream;
+
+			byte[] text = new byte [stream.Length];
+			stream.Seek (0, SeekOrigin.Begin);
+			stream.Read (text, 0, (int) stream.Length);
+
+			parent_cache.Insert (uri, text);
+			tcStream.BaseStream.Close ();
+		}
+	}
+
 	// FIXME: This class isn't multithread safe! This class does not
 	// ensure that different threads don't utilize a transaction started
 	// in a certain thread at the same time. However, since all the
@@ -50,7 +97,7
Webinterface for beagle search (and more...)
Hi Searchers, Some of you are aware that a web interface for beagle is being worked on. I blogged about it on planetbeagle some time back. After that, a fearless Nirbheek Chauhan continued hacking on it and managed to change http://bp0.blogger.com/_Dl_EHp-s13Q/RuSVUVFtUNI/AAM/SShdc8dzfTs/s1600-h/beagle-web-interface.jpg.png into http://cs-people.bu.edu/dbera/blogdata/beagle_webui-1.png I believe it's now at a stage where some feedback would be good. 99.9% of it is javascript+xslt, so I (we) am mostly looking for UI issues. It's getting more makeup and functionality as I write this email. For the known things_to_do, check http://svn.gnome.org/viewcvs/beagle/trunk/beagle/beagled/webinterface/TODO?view=markup To get it running, you need the svn trunk. Get it, build it. Then (*) cd to beagle/beagled and start beagled as ./beagled Yes, beagled needs to be started from the beagle/beagled directory. Point your browser (oops... Firefox browser) to http://127.0.0.1:4000/ (**) and enjoy your stay. Please, some feedback is necessary. At least say you hate, e.g., the stupid way the last-modified date is shown. - dBera (*) The need to start beagled from the beagled directory is a temporary one. (**) For the security-minded people out there, the server actually opens port 4000 to all! So any machine would be able to search (this is again temporary and will be configurable in the near future). If you don't like having the open port during testing, replace in beagled/Server.cs: http://*:{0}/ with http://127.0.0.1:{0}/ -- - Debajyoti Bera @ http://dtecht.blogspot.com beagle / KDE fan Mandriva / Inspiron-1100 user
Re: Webinterface for beagle search (and more...)
Dammit, forgot to attach the patch, *rolls eyes* -- ~Nirbheek Chauhan, he who has realised that today is not his day. enable_webbeagle_beagle.patch Description: Binary data
Re: Webinterface for beagle search (and more...)
The backend must be enabled by making Ahh... right. You need to have the network service enabled (turned off by default): $ beagle-config networking ServiceEnabled (it toggles the flag, and prints the current behaviour after toggling) Then, when you call beagled from inside beagle/beagled, call it with `beagled --backend +NetworkServices` Yes, this too. (By default, the NetworkServices backend is turned on, so you won't need it normally.) - dBera -- - Debajyoti Bera @ http://dtecht.blogspot.com beagle / KDE fan Mandriva / Inspiron-1100 user
Re: GSoC Weekly Report
On Tuesday 02 October 2007 19:13, you wrote: Thinking quickly, one way to do this would be to add an option to query to specify the language. That's a nice option, but the default should be to search all languages, I think. People are used to just typing a word without setting another option. Regards Daniel -- http://www.danielnaber.de
Re: ITagProvider
On 10/2/07, Joe Shaw [EMAIL PROTECTED] wrote: Hi, Sorry for the delay. On 9/28/07, Kevin Kubasik [EMAIL PROTECTED] wrote: The downside is there have been attempts at a universal tagging library; what I would really love (in an ideal world) is if we crafted events through the indexservice, meaning that while we could offer a way to work against the tags from the BeagleClient API, you wouldn't have to implement it. However, for tag querying/listing, we would probably require a native API. Is the idea here to write the integration in Nautilus which would call the Beagle APIs to add and set tags? If so, I don't think this is a good solution for a couple of reasons: (1) It's Beagle-specific, and with Tracker out there and a fairly divided community, a single implementation will never get all the buy-in it needs at this point. Agreed, tracker could technically be the tagging backend/provider. (2) Tagging really has nothing to do with desktop search and indexing. Tags should be indexed by the indexer and made available through search, but fundamentally they're no more related than metadata in MP3s, JPEGs, emails, etc. I agree that they _shouldn't_ be treated differently, however, the inherent complexity (3) The amount of code you'd actually have to write to do this as a totally separate library in C isn't that much more work. You'll be able to integrate with D-Bus, have the potential to get community buy-in for the library, and we can still use it in Beagle. The specifics of how much of this we're going to make beagle responsible for (is beagle almost like a 'tagging adapter'? do we want to provide complete bi-directional support? or do we encourage people to work through the provider that we are using as a backend?) These problems all go away by implementing it as a separate, standalone library. Beagle's UI simply uses the library widgets, talks to the library APIs, and gets notification (and reindexes updated tags) via D-Bus. 
The only extra benefit something like Beagle gives you is the ability to push file system events back into the tag library, and Beagle needn't do that -- any file system monitoring system could provide that. Agreed. I would just prefer that instead of Beagle trying to provide a robust tagging api, we just integrate it into our search intelligently, and treat it as a property. I agree wholeheartedly with this, and it's the reason why I wrote the Nautilus metadata backend in the trunk. A tagging library backend could fit pretty cleanly into this mold. There were 2 issues I found with this model; they could be just a matter of implementation. 1) We are still bound to a single backend system; intelligently handling universal desktop tagging would be quite difficult. 2) Data replication, as well as sync/performance issues with users who actually utilize tags (think thousands of tagged files). Now time and energy could optimize these scenarios (I think). I'm working on writing some other metadata backends so I get a better feel for the system. Well, the problem here is, I really don't want us to be tied to one index, or one backend. I would think (and I could be wrong here) that once we have merged the results from a backend, we could add any Uris that were tagged with one of the query words. The key point here is to have a universal tagging store that transcends our backend system. Take a look at the Nautilus metadata backend. This is basically what it does. It doesn't have an index backing it; it simply sets additional properties on documents in existing backends. Or, if you're feeling particularly adventurous, rewrite the FSQ. That's the biggest consumer of memory at this point and has problems like being unable to search by parent directories. I've described the issues with it in more detail in a previous email, I believe. :) I seem to remember something about that. 
It's probably overkill right now, with the slew of new features, but I am curious about it from a design standpoint. Maybe, but I consider it the #1 problem with Beagle right now. I couldn't find my previous email about it, so I'll type it up soon. Joe -- Cheers, Kevin Kubasik http://kubasik.net/blog
Re: beagle r4011 - in trunk/beagle: beagled beagled/IndexingServiceQueryable search/Tiles
Hi, On 10/2/07, Kevin Kubasik [EMAIL PROTECTED] wrote: On 10/2/07, D Bera [EMAIL PROTECTED] wrote: - I am not sure starting a _new_ beagle-search is a good idea. It should search in that same one. I don't use that UI much (actually, never) so you might want to get some feedback about this. I'm inclined to keep it this way, as I don't like beagle-search losing my old search, although it's pretty trivial to use one or the other; whatever the consensus on the list is will be my course of action. It's worth investigating how hard it would be to have beagle-search open multiple search windows per instance. Right now there's no other way to do it, but firing up another instance of beagle-search is gross. Joe
Re: beagle r4011 - in trunk/beagle: beagled beagled/IndexingServiceQueryable search/Tiles
Hmmm, I agree that completely new instances of beagle-search aren't ideal, but they are quite light. I think that having multiple instances of beagle-search really isn't bad; no doubt a unified system would be better, but I really don't know where to start. If people are really against the current system (launching new beagle-search instances) I can disable it until we have a better solution; I just found the 'find mail from' workflow so common for me that it was almost imperative that it be available. I await the general opinion. But I tend to find that even with big queries, it's hard to get beagle-search over a meg of real memory. On 10/2/07, Joe Shaw [EMAIL PROTECTED] wrote: Hi, On 10/2/07, Kevin Kubasik [EMAIL PROTECTED] wrote: On 10/2/07, D Bera [EMAIL PROTECTED] wrote: - I am not sure starting a _new_ beagle-search is a good idea. It should search in that same one. I don't use that UI much (actually, never) so you might want to get some feedback about this. I'm inclined to keep it this way, as I don't like beagle-search losing my old search, although it's pretty trivial to use one or the other; whatever the consensus on the list is will be my course of action. It's worth investigating how hard it would be to have beagle-search open multiple search windows per instance. Right now there's no other way to do it, but firing up another instance of beagle-search is gross. Joe -- Cheers, Kevin Kubasik http://kubasik.net/blog
Re: ITagProvider
On 10/2/07, Joe Shaw [EMAIL PROTECTED] wrote: Hi, On 10/2/07, Kevin Kubasik [EMAIL PROTECTED] wrote: Agreed, tracker could technically be the tagging backend/provider. Sure. I have pondered writing a Tracker backend for Beagle in the past. (2) Tagging really has nothing to do with desktop search and indexing. Tags should be indexed by the indexer and made available through search, but fundamentally they're no more related than metadata in MP3s, JPEGs, emails, etc. I agree that they _shouldn't_ be treated differently, however, the inherent complexity What's the inherent complexity? Sorry, that sentence totally didn't finish; I meant to say the inherent complexity of querying across multiple backends and merging the results. I agree wholeheartedly with this, and it's the reason why I wrote the Nautilus metadata backend in the trunk. A tagging library backend could fit pretty cleanly into this mold. There were 2 issues I found with this model; they could be just a matter of implementation. 1) We are still bound to a single backend system; intelligently handling universal desktop tagging would be quite difficult. I'm not sure what the context of backend here is. I think a desktop library would handle more than just files -- it'd be URI-based like Beagle -- so it could handle emails, web pages, etc. Agreed, that's not the issue; it's on our side: intelligently merging multiple results from/for the same Uri. If you mean things like pulling from del.icio.us, then you'd just create a separate Beagle backend. One for the local library and one for del.icio.us. With all the focus GNOME is giving to the Online Desktop metaphor, there's no reason why the local database and a remote database like del.icio.us couldn't be synced independently of Beagle. Yeah... I guess the whole online paradigm works really well at making this 'ok'. 2) Data replication, as well as sync/performance issues with users who actually utilize tags (think thousands of tagged files). 
What's the concern specifically here? The database will have to be timestamped somehow to make offline change notification reasonably performant. It's that most tagging databases are just databases; without said timestamp, we end up processing thousands of changes. In a perfect world every change to a tagging database would have a timestamp, but I think that in most cases we won't be nearly that lucky. Take the frontrunner (leaftag): it's a sqlite database without timestamps, and short of copying the db and comparing it every time it's modified, I don't see a sane way to notice and update just our changes. In this type of case, it seems like we would be better off just querying leaftag directly, and then processing its results internally. Or I could be a fool and this has all been solved already; I'm not really sure. -Kevin Joe -- Cheers, Kevin Kubasik http://kubasik.net/blog
Re: ITagProvider
On Tue, 2007-10-02 at 17:30 -0400, Kevin Kubasik wrote: Take the frontrunner (leaftag): it's a sqlite database without timestamps, and short of copying the db and comparing it every time it's modified, I don't see a sane way to notice and update just our changes. In this type of case, it seems like we would be better off just querying leaftag directly, and then processing its results internally. Leaftag needs a bit more care and attention before it is really usable, in my opinion. So I think we can probably fix that issue easily enough by extending the DB schema. Any tagging system that is used should be multi-user as well. Cheers! -- Andrew Ruthven, Wellington, New Zealand At home: [EMAIL PROTECTED]
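Andrew's "extend the DB schema" suggestion amounts to adding a modification timestamp to a leaftag-style tags table, so a watcher can ask "what changed since T" instead of diffing the whole database. A sketch of the idea (leaftag's real schema is likely different; the table and helpers here are invented, and Python's sqlite3 stands in for whatever bindings would actually be used):

```python
import sqlite3
import time

# Hypothetical tags table with an mtime column for change detection.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE tags (
    uri TEXT, tag TEXT, mtime REAL,
    PRIMARY KEY (uri, tag))""")

def set_tag(uri, tag, now=None):
    """Add or refresh a tag, recording when it was last modified."""
    db.execute("INSERT OR REPLACE INTO tags VALUES (?, ?, ?)",
               (uri, tag, now if now is not None else time.time()))

def changes_since(ts):
    """Everything tagged or retagged after timestamp ts: exactly the
    incremental feed an indexer would want instead of a full rescan."""
    rows = db.execute("SELECT uri, tag FROM tags WHERE mtime > ?", (ts,))
    return sorted(rows)

set_tag("file:///doc1", "work", now=100.0)
set_tag("file:///doc2", "music", now=200.0)
```

With this in place, an indexer only needs to remember the timestamp of its last pass; the multi-user concern could be handled the same way, with an extra owner column.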