Hi Lenya devs,
I volunteered to take the lead in the implementation of the "tag cloud"
feature (see [1]).
Some initial ideas:
IMO it makes sense to use the Dublin Core element "subject" to assign
tags to a document [2].
Definition: "The topic of the resource."
Comment: "Typically, the subject will be represented using keywords, key
phrases, or classification codes. Recommended best practice is to use a
controlled vocabulary. To describe the spatial or temporal topic of the
resource, use the Coverage element."
I guess this can be made configurable, with the DC subject as the
default. Since tags can contain spaces, we should store each tag as a
separate meta data value instead of one whitespace-separated string. A
nice GUI for this has to be implemented. Would it be sufficient to
extend the standard meta data GUI to allow entering multiple values, or
do we need a dedicated tag management GUI? I'd suggest starting with the
existing meta data GUI.
Finding all documents with a certain tag is rather simple since all meta
data are indexed. The real challenge is to generate a list of all
existing tags.
Maybe there is a performant way to generate the cloud using the index,
e.g. via a wildcard query. But this still needs some postprocessing, so
we'll probably have to cache the tag cloud.
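To make the postprocessing step a bit more concrete, here is a rough
sketch in plain Java. It assumes we already have the subject values of
each document in hand (for instance from the index); how to get them out
of Lucene is left open. TagCloudBuilder and its methods are made-up
names for illustration, not Lenya or Lucene API.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of the Lenya or Lucene API. */
final class TagCloudBuilder {

    /** Counts how many documents carry each tag (duplicates within one document count once). */
    static Map<String, Integer> countTags(List<List<String>> tagsPerDocument) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tags : tagsPerDocument) {
            for (String tag : new HashSet<>(tags)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Scales each count linearly to a display weight between 1 and maxWeight. */
    static Map<String, Integer> weights(Map<String, Integer> counts, int maxWeight) {
        Map<String, Integer> result = new TreeMap<>();
        if (counts.isEmpty()) {
            return result;
        }
        int min = Collections.min(counts.values());
        int max = Collections.max(counts.values());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // If all tags are equally frequent, give every tag the lowest weight.
            int weight = (max == min)
                    ? 1
                    : 1 + (entry.getValue() - min) * (maxWeight - 1) / (max - min);
            result.put(entry.getKey(), weight);
        }
        return result;
    }
}
```

The weights could map to CSS classes for the font size. The result of
weights() is the part we would cache.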
If Lucene doesn't help, we have another nifty feature for this purpose:
the RepositoryListener interface. By registering a listener with the
repository, we can extract the tags of a document when it is saved, and
update the tag cloud accordingly. The cloud also has to be updated when
a document is removed. The details are a bit tricky (concurrency,
queuing), but I think there's nothing that can't be solved. In this case
we have to store the tag cloud. My first idea would be to use a
dedicated document for this purpose.
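A minimal sketch of the bookkeeping such a listener would need, again
in plain Java. I deliberately leave out the RepositoryListener interface
itself; IncrementalTagCounter, documentSaved() and documentRemoved() are
hypothetical names, and the listener would just call them from its event
callbacks. Concurrency is handled in the simplest possible way here
(synchronized methods), queuing is not covered.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper that keeps tag counts up to date across save and remove events. */
final class IncrementalTagCounter {

    // Tags last seen for each document, needed to undo them on update or removal.
    private final Map<String, Set<String>> tagsByDocument = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();

    /** Call when a document is saved; replaces the tags previously recorded for it. */
    synchronized void documentSaved(String documentId, Collection<String> tags) {
        Set<String> newTags = new HashSet<>(tags);
        Set<String> oldTags = tagsByDocument.put(documentId, newTags);
        if (oldTags != null) {
            for (String tag : oldTags) {
                decrement(tag);
            }
        }
        for (String tag : newTags) {
            counts.merge(tag, 1, Integer::sum);
        }
    }

    /** Call when a document is removed from the repository. */
    synchronized void documentRemoved(String documentId) {
        Set<String> oldTags = tagsByDocument.remove(documentId);
        if (oldTags != null) {
            for (String tag : oldTags) {
                decrement(tag);
            }
        }
    }

    /** Returns a sorted copy of the current counts, safe to hand to a renderer. */
    synchronized Map<String, Integer> snapshot() {
        return new TreeMap<>(counts);
    }

    // Drops the tag entirely once no document uses it any more.
    private void decrement(String tag) {
        counts.computeIfPresent(tag, (t, n) -> n > 1 ? n - 1 : null);
    }
}
```

The snapshot is what would be written to the dedicated document. Note
that the per-document tag sets have to survive a restart as well, or be
rebuilt from the repository.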
I'd prefer the dynamic generation using Lucene, though, because
otherwise we store redundant information in the repository, which always
carries the risk of the stored cloud getting out of sync with the
documents.
Another issue is supporting the user when she enters the tags. The
system should present a list of existing tags, possibly with some kind
of autocomplete functionality. But I guess once we manage to generate
the cloud, this feature can easily be added.
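As an illustration of how little is needed on top of the list of
existing tags, here is a sketch of a case-insensitive prefix lookup
based on a sorted map. TagSuggester is a made-up class, and how the
suggestions reach the browser is not covered.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that suggests existing tags for a typed prefix. */
final class TagSuggester {

    // Lower-cased tag -> tag as originally entered, sorted by the lower-cased form.
    private final TreeMap<String, String> tags = new TreeMap<>();

    TagSuggester(Collection<String> existingTags) {
        for (String tag : existingTags) {
            tags.put(tag.toLowerCase(Locale.ROOT), tag);
        }
    }

    /** Returns up to limit tags starting with the prefix, ignoring case. */
    List<String> suggest(String prefix, int limit) {
        String key = prefix.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        // All keys sharing the prefix are contiguous, starting at the first key >= prefix.
        for (Map.Entry<String, String> entry : tags.tailMap(key, true).entrySet()) {
            if (!entry.getKey().startsWith(key) || result.size() >= limit) {
                break;
            }
            result.add(entry.getValue());
        }
        return result;
    }
}
```

Since tags can contain spaces, the lookup treats the whole input as one
prefix and does not split it into words.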
Any comments and ideas are very welcome!
[1] http://wiki.apache.org/lenya/ModulesIdeas
[2] http://dublincore.org/documents/dces/
-- Andreas
--
Andreas Hartmann, CTO
BeCompany GmbH
http://www.becompany.ch
Tel.: +41 (0) 43 818 57 01