Hi Mattias,

I second the comments from Donna, Hillary and Tony, but allow me add a couple thoughts of my own. This is one of my favorite topics, so please excuse the long post.

I would submit that neither SQL nor XML are really the best data model for CMS metadata. If you are limited to these two, I would go with XML and rely on the capabilities of a decent XML database that allows you to treat the XML interchangeably as either deep or shallow and provides a decent query interface, i.e. not "in-memory". If you have to choose, go "shallow". It will be more adaptable to changes in the future and, in general,

There are other options that allow simpler attribute="value" queries. The reason SQL is not the best option here is that most metadata taxonomies work with multi-valued attributes. Think LDAP, MUMPS, Platinum. It's as though XML elements could have attributes with the same name:

identifier=abc
title=A Multi-lingual, Cross-Referenced Document
subject=foo
subject=bar
language=en
language=fr

Regular SQL "where" clause processing breaks down on such data due to dot product effects in the join. This is partly why Tony does what he does. You can easily get bad results. LDAP queries are a better model. In fact, Tony's SQL schema is has a lot in common with the internals of an LDAP server (and my own query engine). Great minds think alike. <g>

About the schema: you are right in thinking one size does _not_ fit all. But, going to the other extreme is no better. I find collections, preferably hierarchical, to be a useful unit for schema definition and for other content management tasks, such as replication, permissions, etc. I.e. all documents within a collection share the same allowed and required tags and, for any tag, the same set of possible values. All of this stuff maps pretty well to WebDAV. I.e. you can use XML as an exchange format without using for storage or queries.

About managing vocabularies: you would be well served to define some standard libraries for maintaining sets of values and mappings. For example, at the document level, you probably want to use the 2 or 3 character ISO language codes. You can maintain "output" mappings for as many display languages as you need to support and for any subsets that you need. For example, if you operate in Quebec, Canada, you may support French, English and Bilingual content. Thus, the "language" tag is multi-valued with a vocabulary of "en" or "fr". However, you may want to map this to:

English Output Mapping

Code Display
----- --------
en English
fr French

French Output Mapping

Code Display
----- --------
en anglais
fr français

When loading content, you may want to use "input" mappings to map various values. A common example our dear old USA, which is known variously as "U.S.", "US", "USA", "U.S.A.", "America" (which apologies to Canada and Mexico), "The United States" and, finally, "The United States of America". Any of these can, and should, be mapped to just "us". The same logic can be applied to many vocabularies. The technical point to be made here is that "input" mappings can be many-to-one, while "output" mappings are one-to-one.

The right choices - for you - will depend on your skill set, application requirements and available tools.

take it easy,
Charles Reitzel


At 10:46 PM 2/5/2003 +0100, Mattias Konradsson wrote:
I'm working on a cms system and time has come to do the metadata stuff. I don't have much experience in this aspect of the cms so I'd like to hear your thoughts
about what essential features one should include, anyone has any good experience to share on the subject?

I've also figured out two ways I could implement metadata, either to simply store xmldata for each document which then can be aggregated into one big
document which you can run powerful xpath expressions against.

The other method is using a dynamical database implemented in sql where you can define different attributes for each document. The problem with the first approach is that the aggregated xmldocuments are potentially huge and loading it into memory for searching would probably mean a performance hit.

Method nr 2 should work pretty well , should even be able to convert the sql tables to xml if you want to do xpath or to direct sql searches, only downside I can see is that the metadata probably can't be as hierachial complex keeping the xml shallow. On the other hand I'm not sure metadata needs to be that structurally complex, any thoughts?

best regards
---
Mattias Konradsson
--
http://cms-list.org/
more signal, less noise.


Reply via email to