Hi Lenya devs,

I volunteered to take the lead in the implementation of the "tag cloud" feature (see [1]).

Some initial ideas:

IMO it makes sense to use the Dublin Core element "subject" to assign tags to a document [2].

Definition: "The topic of the resource."
Comment: "Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary. To describe the spatial or temporal topic of the resource, use the Coverage element."

I guess this can be made configurable, with the DC subject as the default. Since tags can contain spaces, we should not pack them into a single delimited string but use multiple meta data values to store multiple tags. A nice GUI for this has to be implemented. Would it be sufficient to extend the standard meta data GUI to allow entering multiple values, or do we need a dedicated tag management GUI? I'd suggest starting with the existing meta data GUI.


Finding all documents with a certain tag is rather simple since all meta data are indexed. The real challenge is to generate a list of all existing tags.

Maybe there is an efficient way to generate the cloud using the index, e.g. via a wildcard query. But the result still needs some postprocessing (counting and weighting the tags), so we'll probably have to cache the tag cloud.
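
To make the postprocessing step a bit more concrete, here is a minimal sketch in plain Java. It assumes the tags of each document have already been fetched from the index somehow (that part is left out on purpose). The class name TagCloudBuilder and the linear scaling into a fixed number of weight levels are just my assumptions for illustration, nothing that exists in Lenya today.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper, not part of Lenya: turns per-document tags into a weighted cloud. */
public class TagCloudBuilder {

    /** Counts how many documents carry each tag. A tag repeated within one document counts once. */
    public static Map<String, Integer> countTags(List<List<String>> tagsPerDocument) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by tag for display
        for (List<String> tags : tagsPerDocument) {
            Set<String> seen = new HashSet<>();
            for (String tag : tags) {
                String key = tag.trim();
                if (!key.isEmpty() && seen.add(key)) {
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Maps each count linearly onto a weight level in the range 1..levels (e.g. font sizes). */
    public static Map<String, Integer> weights(Map<String, Integer> counts, int levels) {
        Map<String, Integer> result = new TreeMap<>();
        if (counts.isEmpty()) {
            return result;
        }
        int min = Collections.min(counts.values());
        int max = Collections.max(counts.values());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int level = (max == min)
                    ? 1
                    : 1 + (e.getValue() - min) * (levels - 1) / (max - min);
            result.put(e.getKey(), level);
        }
        return result;
    }
}
```

The result of weights() is what I would cache, since it only changes when a document is saved or removed.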

If Lucene doesn't help, we have another nifty feature for this purpose: the RepositoryListener interface. By registering a listener with the repository, we can extract the tags of a document when it is saved, and update the tag cloud accordingly. The cloud also has to be updated when a document is removed. The details are a bit tricky (concurrency, queuing), but I think there's nothing that can't be solved. In this case we have to store the tag cloud. My first idea would be to use a dedicated document for this purpose.
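
A rough sketch of the bookkeeping such a listener would have to do, in plain Java. Note that this does not use the real RepositoryListener signature; the methods documentSaved() and documentRemoved() and the class name TagCloudIndex are hypothetical, and a real listener would have to extract the document ID and tags from the repository event and call them. I use one coarse lock here to keep it simple, a queue would be the next step if that turns out to be a bottleneck.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch, not Lenya API: keeps tag counts up to date as documents change. */
public class TagCloudIndex {

    // Remembers the tags last seen per document so that a re-save can undo the old counts.
    private final Map<String, Set<String>> tagsByDocument = new HashMap<>();
    private final Map<String, Integer> counts = new TreeMap<>();

    /** To be called when a document is saved; replaces its previous tags. */
    public synchronized void documentSaved(String documentId, Collection<String> tags) {
        documentRemoved(documentId); // drop the old contribution first (lock is reentrant)
        Set<String> newTags = new HashSet<>(tags);
        tagsByDocument.put(documentId, newTags);
        for (String tag : newTags) {
            counts.merge(tag, 1, Integer::sum);
        }
    }

    /** To be called when a document is removed; tags whose count drops to zero disappear. */
    public synchronized void documentRemoved(String documentId) {
        Set<String> oldTags = tagsByDocument.remove(documentId);
        if (oldTags == null) {
            return;
        }
        for (String tag : oldTags) {
            counts.computeIfPresent(tag, (t, n) -> n > 1 ? n - 1 : null);
        }
    }

    /** Returns a copy, so callers can render the cloud without holding the lock. */
    public synchronized Map<String, Integer> snapshot() {
        return new TreeMap<>(counts);
    }
}
```

The per-document map is exactly the redundant information mentioned above, so it would need a way to be rebuilt from scratch if it ever gets out of sync.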

I'd prefer the dynamic generation using Lucene, though, because otherwise we store redundant information in the repository, which always carries the risk of the stored cloud getting out of sync with the documents.


Another issue is supporting the user when she enters the tags. The system should present a list of existing tags, possibly with some kind of autocomplete functionality. But I guess once we manage to generate the cloud, this feature can easily be added.
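
Just to illustrate why I think this is easy: assuming the existing tags are available as a sorted set, the server side of an autocomplete is a simple prefix lookup. The class name TagSuggester and the case-sensitive matching are my assumptions for the sketch, not a proposal for the final behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical sketch, not Lenya API: suggests existing tags for a typed prefix. */
public class TagSuggester {

    /**
     * Returns up to 'limit' tags starting with 'prefix', in sorted order.
     * Expects a set using natural String ordering, where all tags sharing a
     * prefix are adjacent, so we can start at the prefix and stop at the first miss.
     */
    public static List<String> suggest(NavigableSet<String> tags, String prefix, int limit) {
        List<String> matches = new ArrayList<>();
        for (String tag : tags.tailSet(prefix, true)) {
            if (matches.size() >= limit || !tag.startsWith(prefix)) {
                break;
            }
            matches.add(tag);
        }
        return matches;
    }
}
```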


Any comments and ideas are very welcome!


[1] http://wiki.apache.org/lenya/ModulesIdeas
[2] http://dublincore.org/documents/dces/

-- Andreas


--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01

