Re: ITagProvider
On 10/2/07, Joe Shaw [EMAIL PROTECTED] wrote:
> Hi,
>
> Sorry for the delay.
>
> On 9/28/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
>> The downside is there have been attempts at a universal tagging library. What I would really love (in an ideal world) is if we crafted events through the indexservice, meaning that while we could offer a way to work against the tags from the BeagleClient API, you wouldn't have to implement it. However, for tag querying/listing, we would probably require a native API.
>
> Is the idea here to write the integration in Nautilus which would call the Beagle APIs to add and set tags? If so, I don't think this is a good solution for a couple of reasons:
>
> (1) It's Beagle specific, and with Tracker out there and a fairly divided community, a single implementation will never get all the buy-in it needs at this point.

Agreed, Tracker could technically be the tagging backend/provider.

> (2) Tagging really has nothing to do with desktop search and indexing. Tags should be indexed by the indexer and made available through search, but fundamentally they're no more related than metadata in MP3s, JPEGs, emails, etc.

I agree that they _shouldn't_ be treated differently, however, the inherent complexity

> (3) The amount of code you'd actually have to write to do this as a totally separate library in C isn't that much more work. You'll be able to integrate with D-Bus, have the potential to get community buy-in for the library, and we can still use it in Beagle.
>
>> The specifics of how much of this we're going to make Beagle responsible for (is Beagle almost like a 'tagging adapter'? Do we want to provide complete bi-directional support? Or do we encourage people to work through the provider that we are using as a backend?)
>
> These problems all go away by implementing it as a separate, standalone library. Beagle's UI simply uses the library widgets, talks to the library APIs, and gets notification (and reindexes updated tags) via D-Bus.
>
> The only extra benefit something like Beagle gives you is the ability to push file system events back into the tag library, and Beagle needn't do that -- any file system monitoring system could provide that.

Agreed.

>> I would just prefer that instead of Beagle trying to provide a robust tagging API, we just integrate it into our search intelligently, and treat it as a property.
>
> I agree wholeheartedly with this, and it's the reason why I wrote the Nautilus metadata backend in the trunk. A tagging library backend could fit pretty cleanly into this mold.

There were 2 issues I found with this model; they could be just a matter of implementation.

1) We are still bound to a single backend system; intelligently handling universal desktop tagging would be quite difficult.

2) Data replication, as well as sync/performance issues with users who actually utilize tags (think thousands of tagged files).

Now time and energy could optimize these scenarios (I think). I'm working on writing some other metadata backends so I get a better feel for the system.

>> Well, the problem here is, I really don't want us to be tied to one index, or one backend. I would think (and I could be wrong here) that once we have merged the results from a backend, we could add any URIs that were tagged with one of the query words. The key point here is to have a universal tagging store that transcends our backend system.
>
> Take a look at the Nautilus metadata backend. This is basically what it does. It doesn't have an index backing it; it simply sets additional properties on documents in existing backends.
>
>>> Or, if you're feeling particularly adventurous, rewrite the FSQ. That's the biggest consumer of memory at this point and has problems like being unable to search by parent directories. I've described the issues with it in more detail in a previous email, I believe. :)
>>
>> I seem to remember something about that. It's probably overkill right now, with the slew of new features, but I am curious about it from a design standpoint.
>
> Maybe, but I consider it the #1 problem with Beagle right now. I couldn't find my previous email about it, so I'll type it up soon.
>
> Joe

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
Re: ITagProvider
On 10/2/07, Joe Shaw [EMAIL PROTECTED] wrote:
> Hi,
>
> On 10/2/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
>> Agreed, Tracker could technically be the tagging backend/provider.
>
> Sure. I have pondered writing a Tracker backend for Beagle in the past.
>
>>> (2) Tagging really has nothing to do with desktop search and indexing. Tags should be indexed by the indexer and made available through search, but fundamentally they're no more related than metadata in MP3s, JPEGs, emails, etc.
>>
>> I agree that they _shouldn't_ be treated differently, however, the inherent complexity
>
> What's the inherent complexity?

Sorry, that sentence totally didn't finish. I meant to say the inherent complexity of querying across multiple backends and merging the results.

>>> I agree wholeheartedly with this, and it's the reason why I wrote the Nautilus metadata backend in the trunk. A tagging library backend could fit pretty cleanly into this mold.
>>
>> There were 2 issues I found with this model; they could be just a matter of implementation.
>>
>> 1) We are still bound to a single backend system; intelligently handling universal desktop tagging would be quite difficult.
>
> I'm not sure what the context of backend here is. I think a desktop library would handle more than just files -- it'd be URI based like Beagle -- so it could handle emails, web pages, etc.

Agreed, that's not the issue. The issue is on our side: intelligently merging multiple results from/for the same URI.

> If you mean things like pulling from del.icio.us, then you'd just create a separate Beagle backend. One for the local library and one for del.icio.us. With all the focus GNOME is giving to the Online Desktop metaphor, there's no reason why the local database and a remote database like del.icio.us couldn't be sync'd independent of Beagle.

Yeah.. I guess the whole online paradigm works really well at making this 'ok'.

>> 2) Data replication, as well as sync/performance issues with users who actually utilize tags (think thousands of tagged files).
>
> What's the concern specifically here? The database will have to be timestamped somehow to make offline change notification reasonably performant.

It's that most tagging databases are databases without said timestamp, so we end up throwing thousands of changes. In a perfect world every change to a tagging database would have a timestamp, but I think that in most cases we won't be nearly that lucky. Take the frontrunner (leaftag): it's a SQLite database without timestamps. Short of copying the db and comparing it every time it's modified, I don't see a sane way to notice and update just our changes. In this type of case, it seems like we would be better off just querying leaftag directly, and then processing its results internally.

Or I could be a fool, and this has all been solved already; I'm not really sure.

-Kevin

> Joe

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
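The "copy the db and compare it" fallback described in this message can be illustrated with a small sketch. It assumes the tag database can be read as a list of (URI, tag) pairs with no duplicates; the `TagSnapshotDiff` class and the sample URIs are hypothetical and are not part of Beagle or leaftag.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Beagle or leaftag.
// It compares two full snapshots of (URI, tag) rows, which is the only
// option when the tag database carries no modification timestamps.
public static class TagSnapshotDiff
{
    // Rows present in 'current' but missing from 'previous'.
    public static List<(string Uri, string Tag)> Added(
        IEnumerable<(string Uri, string Tag)> previous,
        IEnumerable<(string Uri, string Tag)> current)
    {
        var seen = new HashSet<(string Uri, string Tag)>(previous);
        var added = new List<(string Uri, string Tag)>();
        foreach (var row in current)
            if (!seen.Contains(row))
                added.Add(row);
        return added;
    }

    // Rows present in 'previous' but missing from 'current'.
    public static List<(string Uri, string Tag)> Removed(
        IEnumerable<(string Uri, string Tag)> previous,
        IEnumerable<(string Uri, string Tag)> current)
    {
        return Added(current, previous);
    }
}

public static class Program
{
    public static void Main()
    {
        var before = new List<(string Uri, string Tag)> {
            ("file:///home/user/a.txt", "work"),
            ("file:///home/user/b.jpg", "holiday"),
        };
        var after = new List<(string Uri, string Tag)> {
            ("file:///home/user/a.txt", "work"),
            ("file:///home/user/b.jpg", "family"),
        };

        foreach (var row in TagSnapshotDiff.Added(before, after))
            Console.WriteLine("added   {0} -> {1}", row.Uri, row.Tag);
        foreach (var row in TagSnapshotDiff.Removed(before, after))
            Console.WriteLine("removed {0} -> {1}", row.Uri, row.Tag);
    }
}
```

Every comparison reads and hashes all rows, so the cost grows with the total number of tagged items rather than with the number of changes. That is the performance concern raised above, and it is why a timestamp on each row would help.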
Re: ITagProvider
On Tue, 2007-10-02 at 17:30 -0400, Kevin Kubasik wrote:
> Take the frontrunner (leaftag): it's a SQLite database without timestamps. Short of copying the db and comparing it every time it's modified, I don't see a sane way to notice and update just our changes. In this type of case, it seems like we would be better off just querying leaftag directly, and then processing its results internally.

Leaftag needs a bit more care and attention before it is really usable in my opinion. So I think we can probably fix that issue easily enough by extending the DB schema.

Any tagging system that is used should be multi-user as well.

Cheers!

--
Andrew Ruthven, Wellington, New Zealand
At home: [EMAIL PROTECTED] | This space intentionally
                           | left blank.
Re: ITagProvider
Will do either tonight or sometime this weekend. I'm going to look into one of the new Subversion merge tracking tools, since DSCM has spoiled me rotten and I like free merges.

A few questions open to the community:

1) I'm working from a very basic core idea of a 'tag' simply being a string associated with a URI. I know tags can be way more, but I want our base ITagProvider to be generic enough for simple systems. How do people feel about this?

2) What other tagging implementations (beyond leaftag) has anyone seen for the open source desktop?

3) What (if any) are the more advanced features that people might want in a tagging API? Off the top of my head:
* Tag descriptions/icons
* Parent and child tags
* 'Hidden' or unsearched tags
Anything else?

It's important to realize that my focus is not on the implementation of a backend to store this information, but to provide a generic way for us to include 'tag' information from a variety of sources.

Also, if anyone knows of a smart way to take a bunch of internally-mapped URIs and merge them with the existing result sets, I'm still getting some frustration on that point. While I've figured out the functional steps that the bitarrays serve (as in where to put one when I want to search an index etc.), once I've run the query to fill/populate it, I'm not really sure what I can do with a LuceneBitArray or BetterBitArray. Anyway, I'm sure I'll eventually get it, but help would save some painful slow debugging time.

Cheers!
Kevin Kubasik

On 9/28/07, Debajyoti Bera [EMAIL PROTECTED] wrote:
> Hi Kevin,
>
>> Since some people are far too lazy to use patches (or are just that cool ;) ) there's a bzr branch here: https://code.launchpad.net/~kkubasik/beagle/kkubasik-beagle
>
> Thanks for getting started on this. I was going to mention using a branch for adding these changes when I saw this mail. Can we get this branch in gnome-svn? It's a bit tedious working with different repos. If possible make a new beagle-tagging-branch in svn and make it your playground.
>
> Thanks,
> - dBera
>
> --
> Debajyoti Bera @ http://dtecht.blogspot.com
> beagle / KDE fan
> Mandriva / Inspiron-1100 user

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
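The "a tag is simply a string associated with a URI" idea from question 1 can be sketched as a small in-memory store. This is only an illustration under that assumption: `SimpleTagStore` and its method names are hypothetical, and it is not the ITagProvider interface from the patch posted to the list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory store; it is not the ITagProvider from the patch.
// A tag is just a string, and each tag maps to the set of URIs carrying it.
public class SimpleTagStore
{
    private readonly Dictionary<string, HashSet<string>> urisByTag =
        new Dictionary<string, HashSet<string>>();

    public void AddTag(string uri, string tag)
    {
        if (!urisByTag.TryGetValue(tag, out var uris))
        {
            uris = new HashSet<string>();
            urisByTag[tag] = uris;
        }
        uris.Add(uri);
    }

    public void RemoveTag(string uri, string tag)
    {
        if (urisByTag.TryGetValue(tag, out var uris))
        {
            uris.Remove(uri);
            if (uris.Count == 0)
                urisByTag.Remove(tag); // drop tags that no longer label anything
        }
    }

    // All URIs carrying the given tag, in a stable (ordinal) order.
    public List<string> GetUrisForTag(string tag)
    {
        var result = new List<string>();
        if (urisByTag.TryGetValue(tag, out var uris))
            result.AddRange(uris);
        result.Sort(StringComparer.Ordinal);
        return result;
    }

    // All tags attached to the given URI, in a stable (ordinal) order.
    public List<string> GetTagsForUri(string uri)
    {
        var result = new List<string>();
        foreach (var entry in urisByTag)
            if (entry.Value.Contains(uri))
                result.Add(entry.Key);
        result.Sort(StringComparer.Ordinal);
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new SimpleTagStore();
        store.AddTag("file:///home/user/report.odt", "work");
        store.AddTag("file:///home/user/report.odt", "draft");
        store.AddTag("file:///home/user/photo.jpg", "holiday");

        Console.WriteLine(string.Join(", ", store.GetTagsForUri("file:///home/user/report.odt")));
        Console.WriteLine(string.Join(", ", store.GetUrisForTag("holiday")));
    }
}
```

The advanced features listed in question 3 (descriptions, parent/child tags, hidden tags) would need a richer tag type than a plain string, which is presumably why the base interface is kept this simple first.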
Re: ITagProvider
On 9/28/07, Joe Shaw [EMAIL PROTECTED] wrote:
> Hi,
>
> On 9/28/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
>> It's important to realize that my focus is not on the implementation of a backend to store this information, but to provide a generic way for us to include 'tag' information from a variety of sources.
>
> Often you can't separate implementation details from the list of use cases, though.

Yeah, to a large extent we're stuck =/

> Originally I wanted to do tags solely using extended attributes on files. There are some definite advantages to this, like the ability to maintain tags when files are copied around. But this decentralized model does not lend itself to doing, for instance, tag clouds. There's no way to get a list of all possible tags in a decentralized model without walking the entire file system tree -- obviously a non-starter.

Yeah, that was actually my first thought, since it just seemed like the most fitting solution, as the tags would actually be with the files.

> So before worrying about parent-child tag relationships too much, start off simple and refine the design as you go along. What do you anticipate the uses for tagging on the user side? What uses would a programmer want for tags, and how might this API look? How do different (broad) implementation strategies fit into this? No doubt you'll have to make compromises somewhere.

Hmmm... well, the dream solution is always Nautilus getting a slick tagging UI. The problem is, I'm not really a slick-UI-writer guy. However, with the Beagle plugin already in Thunderbird, there's potential for integration there. The downside is there have been attempts at a universal tagging library. What I would really love (in an ideal world) is if we crafted events through the indexservice, meaning that while we could offer a way to work against the tags from the BeagleClient API, you wouldn't have to implement it. However, for tag querying/listing, we would probably require a native API.

The specifics of how much of this we're going to make Beagle responsible for (is Beagle almost like a 'tagging adapter'? Do we want to provide complete bi-directional support? Or do we encourage people to work through the provider that we are using as a backend?)

I have a strong feeling that in the end we're going to end up with a SQLite database that we are maintaining and working against, which isn't the end of the world. I would just prefer that instead of Beagle trying to provide a robust tagging API, we just integrate it into our search intelligently, and treat it as a property. That being said, until the querying component works successfully, I really don't want to get too excited about the other direction.

>> Also, if anyone knows of a smart way to take a bunch of internally-mapped URIs and merge them with the existing result sets, I'm still getting some frustration on that point. While I've figured out the functional steps that the bitarrays serve (as in where to put one when I want to search an index etc.), once I've run the query to fill/populate it, I'm not really sure what I can do with a LuceneBitArray or BetterBitArray. Anyway, I'm sure I'll eventually get it, but help would save some painful slow debugging time.
>
> I'm not totally sure what you're trying to do here, but I would suggest (a) keeping the primary storage of tags totally separate from the index and (b) dealing only with real (external) URIs and pushing changes out to the tag DB as needed. That would probably require some additional events or something to be added to the FSQ.

Well, the problem here is, I really don't want us to be tied to one index, or one backend. I would think (and I could be wrong here) that once we have merged the results from a backend, we could add any URIs that were tagged with one of the query words. The key point here is to have a universal tagging store that transcends our backend system.

I think I might be misunderstanding, but it's not the storage of the tags; it's that a tag query has 2 defining functions (in my view; I'm open to a different interpretation):

1) If an item has a tag, that tag should appear as a property (to the client). I think this means we need to catch Hits before they are returned and add another property (when appropriate).

2) If a query matches a tag, then all of the tag's URIs should be returned. This is where I'm getting really caught, because I can't quite figure out how to take a pretty outside URI and get a complete Hit back. I'm not really sure where that wires into the FSQ any more than the other queryables.

> Or, if you're feeling particularly adventurous, rewrite the FSQ. That's the biggest consumer of memory at this point and has problems like being unable to search by parent directories. I've described the issues with it in more detail in a previous email, I believe. :)

I seem to remember something about that. It's probably overkill right now, with the slew of new features, but I am curious about it from
Re: ITagProvider
Hi,

On 9/28/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
> It's important to realize that my focus is not on the implementation of a backend to store this information, but to provide a generic way for us to include 'tag' information from a variety of sources.

Often you can't separate implementation details from the list of use cases, though.

Originally I wanted to do tags solely using extended attributes on files. There are some definite advantages to this, like the ability to maintain tags when files are copied around. But this decentralized model does not lend itself to doing, for instance, tag clouds. There's no way to get a list of all possible tags in a decentralized model without walking the entire file system tree -- obviously a non-starter.

So before worrying about parent-child tag relationships too much, start off simple and refine the design as you go along. What do you anticipate the uses for tagging on the user side? What uses would a programmer want for tags, and how might this API look? How do different (broad) implementation strategies fit into this? No doubt you'll have to make compromises somewhere.

> Also, if anyone knows of a smart way to take a bunch of internally-mapped URIs and merge them with the existing result sets, I'm still getting some frustration on that point. While I've figured out the functional steps that the bitarrays serve (as in where to put one when I want to search an index etc.), once I've run the query to fill/populate it, I'm not really sure what I can do with a LuceneBitArray or BetterBitArray. Anyway, I'm sure I'll eventually get it, but help would save some painful slow debugging time.

I'm not totally sure what you're trying to do here, but I would suggest (a) keeping the primary storage of tags totally separate from the index and (b) dealing only with real (external) URIs and pushing changes out to the tag DB as needed. That would probably require some additional events or something to be added to the FSQ.

Or, if you're feeling particularly adventurous, rewrite the FSQ. That's the biggest consumer of memory at this point and has problems like being unable to search by parent directories. I've described the issues with it in more detail in a previous email, I believe. :)

Joe
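The suggestion in this message, keeping tag storage separate from the index and working only with external URIs, could look roughly like the following at query time. This is a sketch under stated assumptions: `SketchHit` is a hypothetical stand-in and not Beagle's Hit class, `tagsByUri` stands for whatever external tag store is used, and a real hit carries far more than a URI.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a search result; this is not Beagle's Hit class.
public class SketchHit
{
    public readonly string Uri;
    public readonly List<string> Tags = new List<string>();
    public SketchHit(string uri) { Uri = uri; }
}

public static class TagMergeSketch
{
    // indexUris: external URIs the index returned for the query.
    // tagsByUri: the separate tag store, keyed by external URI.
    // queryWord: a single query term to match against tags.
    public static List<SketchHit> Merge(
        IEnumerable<string> indexUris,
        IDictionary<string, List<string>> tagsByUri,
        string queryWord)
    {
        var byUri = new Dictionary<string, SketchHit>();
        var ordered = new List<SketchHit>();

        // Step 1: start from the index results, dropping duplicate URIs.
        foreach (var uri in indexUris)
            AddIfMissing(uri, byUri, ordered);

        // Step 2: add every URI tagged with the query word, again without duplicates.
        foreach (var entry in tagsByUri)
            if (entry.Value.Contains(queryWord))
                AddIfMissing(entry.Key, byUri, ordered);

        // Step 3: expose the tags of each result as a property on the hit.
        foreach (var hit in ordered)
            if (tagsByUri.TryGetValue(hit.Uri, out var tags))
                hit.Tags.AddRange(tags);

        return ordered;
    }

    private static void AddIfMissing(
        string uri, Dictionary<string, SketchHit> byUri, List<SketchHit> ordered)
    {
        if (byUri.ContainsKey(uri))
            return;
        var hit = new SketchHit(uri);
        byUri[uri] = hit;
        ordered.Add(hit);
    }
}

public static class Program
{
    public static void Main()
    {
        var indexUris = new List<string> { "file:///docs/budget.ods", "file:///docs/notes.txt" };
        var tagsByUri = new Dictionary<string, List<string>> {
            { "file:///docs/notes.txt", new List<string> { "budget", "2007" } },
            { "file:///photos/office.jpg", new List<string> { "budget" } },
        };

        foreach (var hit in TagMergeSketch.Merge(indexUris, tagsByUri, "budget"))
            Console.WriteLine("{0} [{1}]", hit.Uri, string.Join(", ", hit.Tags));
    }
}
```

The sketch sidesteps the hard part raised elsewhere in the thread: a URI that comes only from the tag store still has to be turned into a complete hit by asking the owning backend for it.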
Re: ITagProvider
Hey, did a little cleanup so no one's stuck reading impossibly bad code ;) This also has the super-sucky way of integrating with the querying of the Lucene indexes. The biggest problem is that right now it will only work on internally mapped URIs (the uid:xxx ones). So, in addition to the real merging of queries, URI mapping/lookup should be done too.

Since some people are far too lazy to use patches (or are just that cool ;) ) there's a bzr branch here: https://code.launchpad.net/~kkubasik/beagle/kkubasik-beagle

On 9/28/07, Kevin Kubasik [EMAIL PROTECTED] wrote:
> Hey,
>
> I was chatting with DBera last night and we got off on a random little tangent. Anyway, I remembered that I still hadn't shared any of the code or my thoughts that had started to evolve as far as supporting the idea of 'desktop tagging'.
>
> I figured I would attach a copy of the patch that allows you to see the current ITagProvider sketchup (unfortunately this is the majorly dumbed-down interface as I tried to get it integrated; once we have this worked into the query system, I'll flesh out the API, and make my simple sample threadsafe etc.). I need to abstract or make an interface for the Tag class, but I got far too tired last night after my battle with Lucene.
>
> DBera mentioned that the best place to implement this was probably inside LuceneQueryDriver. Since we are already merging 2 result sets (the primary and secondary indexes), adding a third data source shouldn't be too hard, should it? Either way, I tried a couple of things, and I've got a fair idea of how the process works; I'm just still getting hung up on the different BitArrays. It seems that, as they are the ones holding all the result sets, to merge results from the tagging backend at the lower level, I need to figure those out. The other option is always to just build hits from the tagged URIs and drop any duplicates, but I'm not sure that's how the response works.
>
> Anyway, I'd love some feedback/help. This is just the core/super simple implementation; once I figure out the results merging I'll add back in the child tags, descriptions, etc.
>
> --
> Cheers,
> Kevin Kubasik
> http://kubasik.net/blog

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog

=== added file 'Util/TagProvider.cs'
--- Util/TagProvider.cs 1970-01-01 00:00:00 +
+++ Util/TagProvider.cs 2007-09-28 09:06:42 +
@@ -0,0 +1,183 @@
+// TagProvider.cs - An interface used to pull tags from a variety of
+// sources.
+//
+// Copyright (C) 2007 Kevin Kubasik [EMAIL PROTECTED]
+//
+// Permission is hereby granted, free of charge, to any person obtaining
+// a copy of this software and associated documentation files (the
+// "Software"), to deal in the Software without restriction, including
+// without limitation the rights to use, copy, modify, merge, publish,
+// distribute, sublicense, and/or sell copies of the Software, and to
+// permit persons to whom the Software is furnished to do so, subject to
+// the following conditions:
+//
+// The above copyright notice and this permission notice shall be
+// included in all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+// LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+// OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+// WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+//
+
+using System;
+using System.IO;
+using System.Collections;
+using System.Collections.Generic;
+
+using Mono.Data.SqliteClient;
+//using ICSharpCode.SharpZipLib.GZip;
+
+
+namespace Beagle.Util
+{
+
+    public interface ITagProvider{
+        ITag MakeNewTag(string s);
+        ITag GetTag(string s);
+        ITag[] SearchTags(string s);
+        ITag[] GetTagsForUri(string s);
+    }
+    public interface ITag{
+        String GetFirstUri();
+        String[] GetAllUri();
+        void AddUri(string s);
+        void DeleteUri(string s);
+
+    }
+    public class BeagleTag: ITag {
+        string name;
+        SqliteConnection connection = null;
+        public BeagleTag (){
+            name = "";
+        }
+        public BeagleTag(SqliteConnection conn){
+            connection = conn;
+            name= "";
+        }
+        public BeagleTag(SqliteConnection conn, string argname){
+            connection = conn;
+            name = argname;
+        }
+        public String GetFirstUri(){
+            SqliteCommand scomm = new SqliteCommand(String.Format("select * from tags where tag='{0}' order by uri limit 1;",name),connection);
+
+            SqliteDataReader sdr = scomm.ExecuteReader();
+            if(sdr.Read())
+                return sdr.GetString(0);
+            return null;
+        }
+        public String[] GetAllUri(){
+            List<string> l = new List<string>();
+            SqliteCommand scomm = new SqliteCommand(String.Format("select * from tags where tag='{0}' order by uri;",name),connection);
+            SqliteDataReader sdr =
Re: New code and branches (was Re: ITagProvider)
Yeah, I wasn't planning on checking this into trunk for some time. I was just gonna maintain it in a bzr branch, but an svn one is just as cool. I haven't looked at the network searches code, but is there a TODO for that? Or is it just bugfixing/testing?

On 9/28/07, Joe Shaw [EMAIL PROTECTED] wrote:
> Hi,
>
> On 9/28/07, Debajyoti Bera [EMAIL PROTECTED] wrote:
>> I was going to mention using a branch for adding these changes when I saw this mail. Can we get this branch in gnome-svn? It's a bit tedious working with different repos. If possible make a new beagle-tagging-branch in svn and make it your playground.
>
> Indeed. All new code really needs to go onto a branch and be made working before being merged back onto the trunk. I've not been particularly happy with two fairly large chunks of code that are half-finished sitting on the trunk (network searches and the web interface). The former in particular is the largest thing holding up a 0.3.0 release, IMO. The latter is more easily disabled, but I'd still prefer not to ship it (in source) if it's not considered ready.
>
> Joe

--
Cheers,
Kevin Kubasik
http://kubasik.net/blog
Re: New code and branches (was Re: ITagProvider)
Hi Joe,

> Indeed. All new code really needs to go onto a branch and be made working before being merged back onto the trunk.

Note the "new code ... made working before being merged ..." and read on further.

> I've not been particularly happy with two fairly large chunks of code that are half-finished sitting on the trunk (network searches and the web interface). The former in particular is the largest thing holding up a 0.3.0 release, IMO. The latter is more easily disabled, but I'd still prefer not to ship it (in source) if it's not considered ready.

This is how I look at branches. When features are at a nascent stage, i.e. things don't quite work, some _basic_ features are not quite complete, it is essentially unusable and the ongoing work could take at least a few weeks, I consider it necessary to have a branch. For such small dev groups, as long as the trunk builds and runs OK after each commit, I think it's OK.

Accordingly, both the network-search and web-ui (which depends on the former) met the above criteria and so they are available in the trunk. They both work and have the basic features working, and it's important for them to be there because then people who install the trunk can report bugs. It's quite hard to get any QA done on branches.

As for 0.3.0, since you released 0.2.18 a few weeks ago, I was under the impression that it would be at least a month before 0.3.0. Nirbheek (bheekling) is working hard on the web-ui and I think he can get it to look and function much better in the next few weeks (not that it is unusable now). But if you have the intention of making a release, you gotta let us know.

BTW, the number of new features (without the uninstalled web-ui) in trunk is overwhelming:
- Thunderbird backend
- Firefox (extension) backend
- network searches
- different way of indexing attachments and archives
- LaTeX filter
- beagle-search IO polish
- snowball analyzer
- XMP metadata (this is the one which I actually consider the blocker; I have to think about the whole external metadata stuff, considering how to handle nautilus/xmp/tags-store, but I keep finding excuses not to do it :( )
- features I am pretty sure I missed

And to give a sneak peek at the features coming:
- better web interface, including showing snippets, showing emails directly in the browser and showing full cached text
- extensible local/global configuration + client API to deal with configuration + web-ui interface to change settings
- better textcache storage to reduce wasted space created by lots of small files
- textcat support for language determination and using a language-specific stemmer (depending on performance)
- BasKet backend
- don't remember

Some of the changes will be incremental and will go in the trunk. Some of them are large changes that will require a longer time to become usable and may not be the final implementation that gets into trunk, so those will go into a branch.

I hope that gives you a better picture of what's happening in the trunk right now (and also in IRC) and helps you in deciding when to call it freeze!

- dBera

--
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user