Well, after the little amichi vendor slamming, I am a little leery about
posting this, since it does indirectly support a product offered by my
company, but I think it is still useful information given your question
about metadata.  This is a short article I wrote about storing
metadata/content in a relational format vs. an XML format.

Please note that I was not an English major.

--------------------
Content Management Systems know nothing about your Content
David A. Vap, Software AG

It is an irony.  Most people think that rolling out a Content Management
System (CMS) will provide an everlasting vault of valuable content that can
be easily found and reused.  The truth of the matter is that
first-generation CMS applications do not natively understand your content,
which makes reuse and searching difficult.  How can that be, you ask?  After
all, they are Content Management Systems.  They must know and understand the
content that is being ingested into them.

Well, let's look at how this state of affairs evolved.  When Interwoven,
Vignette, Artesia, and others set about building their first-generation
applications about five years ago, here is what the technology landscape
looked like:

-       Relational technology was pervasive and accepted as the primary
        persistence layer
-       SGML, the XML precursor, was used only for complex technical
        publishing
-       Content formats were primarily proprietary
-       HTML was all the rage

So, logically, these vendors built their products on top of relational
technology.  Much of the content was treated as a black box: metadata was
wrapped around the content and stored in a set of relational tables.  The
metadata stored a pointer out to the actual essence (content), which was
stored either on the file system or in a streaming server.  So, in reality,
first-generation CMSs are really just middlemen.  They take content in, make
an inventory of it using user-assigned metadata, and then pass it on to some
distribution channel.  If you ask the middleman what is inside the boxes on
their shelves, they can't tell you.  They only know what is on the shipment
manifest.
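To make the "shipment manifest" point concrete, here is a minimal sketch of that first-generation layout, assuming a single hypothetical `asset` table (the table, column names, and sample row are invented for illustration and do not come from any vendor's schema).  Searches can only touch the user-assigned metadata; the content itself is just a path string.

```python
import sqlite3

# Hypothetical first-generation schema: metadata in a relational table,
# the content itself referenced only by a file-system path (the "pointer").
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE asset (
           id INTEGER PRIMARY KEY,
           title TEXT,
           author TEXT,
           keywords TEXT,
           content_path TEXT
       )"""
)
conn.execute(
    "INSERT INTO asset (title, author, keywords, content_path) VALUES (?, ?, ?, ?)",
    ("Q3 product brochure", "jsmith", "brochure,product", "/content/brochure.xml"),
)


def find_by_metadata(author):
    """Search works only on the manifest (user-assigned metadata)."""
    rows = conn.execute(
        "SELECT title, content_path FROM asset WHERE author = ?", (author,)
    )
    return rows.fetchall()


print(find_by_metadata("jsmith"))
# No column describes anything *inside* brochure.xml, so a question such as
# "which assets mention part number X?" cannot be answered with SQL here.
```

The design choice being illustrated is the separation itself: the database knows the label on the box, and the box's contents live elsewhere, opaque to the query engine.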

A few years later, XML started gaining momentum.  XML became the accepted
vehicle for both internal and external tagging of the content.  At this
point it was too late in the development cycle to pull out the relational
underpinnings and convert to a native XML store, so the vendors did the next
best thing.  They provided frameworks for importing and exporting the
content in an XML format.  This required a fascinating Cirque du Soleil act
to convert the incoming XML stream into a set of relational tables that
meta-modeled XML structures.  As anyone who has done this knows,
XML-to-relational mapping can get quite complicated.  It can be done only
for a subset of XML documents, and round-tripping an XML document is often
impossible.  For example, ask your content management vendor to handle a
SCORM eLearning or METS document intelligently in their relational
structure.
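The round-tripping problem can be shown with a deliberately naive mapping.  The sketch below assumes a hypothetical one-row-per-element `node` table (id, parent, position, tag, text) of the kind a quick "shredding" layer might use; it is not any vendor's actual mapping.  Because the table has no place for attributes or for text that follows a child element, rebuilding the document loses both.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Naive mapping: one row per element (id, parent_id, position, tag, text).
    Attributes and 'tail' text (mixed content after a child) are NOT stored."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER,"
        " position INTEGER, tag TEXT, text TEXT)"
    )

    def walk(elem, parent_id, position):
        cur = conn.execute(
            "INSERT INTO node (parent_id, position, tag, text) VALUES (?, ?, ?, ?)",
            (parent_id, position, elem.tag, elem.text),
        )
        node_id = cur.lastrowid  # id of the row just inserted
        for i, child in enumerate(elem):
            walk(child, node_id, i)

    walk(ET.fromstring(xml_text), None, 0)


def rebuild(conn):
    """Reassemble the document from the node table and serialize it."""

    def build(node_id):
        tag, text = conn.execute(
            "SELECT tag, text FROM node WHERE id = ?", (node_id,)
        ).fetchone()
        elem = ET.Element(tag)
        elem.text = text
        children = conn.execute(
            "SELECT id FROM node WHERE parent_id = ? ORDER BY position", (node_id,)
        ).fetchall()
        for (child_id,) in children:
            elem.append(build(child_id))
        return elem

    (root_id,) = conn.execute(
        "SELECT id FROM node WHERE parent_id IS NULL"
    ).fetchone()
    return ET.tostring(build(root_id), encoding="unicode")


original = '<para lang="en">Reuse <b>this</b> text!</para>'
conn = sqlite3.connect(":memory:")
shred(original, conn)
print(rebuild(conn))  # <para>Reuse <b>this</b></para>
```

The `lang` attribute and the trailing " text!" are gone.  A real mapping can add tables for attributes, tail text, comments, namespaces, and so on, which is exactly where the complexity described above comes from.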

As always happens with software, subsequent generations of applications must
provide substantially greater functionality in order to disrupt the market
and leapfrog the larger first-generation vendors.  This is exactly what is
happening in content-oriented applications, including your typical CMS
solutions.

Since then, it has become accepted that XML is the primary medium for
content.  The majority of content creation vendors, such as Adobe and
Microsoft, have moved their product portfolios to store their output
natively using XML standards.

It would be really nice if next-generation CMSs knew more about the actual
substance of the content, and that is exactly what is happening.
Next-generation CMS vendors are storing content directly in a native XML
store.  The benefits of this are numerous:
-       easily change the metamodel representing the content via simple
        schema changes, reducing the cost of administering the system
-       search on the values of specific content tags, allowing for better
        searching and reuse of content
-       reduce development/maintenance costs associated with
                1) transforming XML-based content into relational formats
                2) providing baseline content management services covered
                   by WebDAV (available in most native XML databases)
-       easily integrate content with other systems using XML as the
        transport
-       easily transform content for different delivery devices via W3C
        standards such as XSLT
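As a small illustration of the second benefit (searching on the values of specific content tags), here is a sketch that assumes the content is held as XML and queried with the limited XPath subset in Python's standard `xml.etree.ElementTree`.  The `catalog`/`item`/`subject` structure is hypothetical; a native XML database would offer full XPath or XQuery instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical content stored as XML rather than as an opaque blob.
catalog = ET.fromstring(
    """
<catalog>
  <item id="a1">
    <title>Spring product brochure</title>
    <subject>brochure</subject>
  </item>
  <item id="a2">
    <title>Widget datasheet</title>
    <subject>datasheet</subject>
  </item>
</catalog>
"""
)


def ids_by_subject(root, subject):
    """Return ids of items whose <subject> child's text equals `subject`.
    Uses ElementTree's [tag='text'] predicate; `subject` must not contain
    a single quote in this simple sketch."""
    return [item.get("id") for item in root.findall(f"./item[subject='{subject}']")]


print(ids_by_subject(catalog, "datasheet"))  # ['a2']
```

The query targets a named part of the content (`subject`) rather than a metadata column that someone had to fill in separately.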




-----Original Message-----
From: Mattias Konradsson [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 05, 2003 4:47 PM
To: [EMAIL PROTECTED]
Subject: [cms-list] managing metadata


I'm working on a CMS and the time has come to do the metadata stuff.  I
don't have much experience in this aspect of a CMS, so I'd like to hear
your thoughts about which essential features one should include.  Does
anyone have any good experience to share on the subject?

I've also figured out two ways I could implement metadata.  The first is to
simply store XML data for each document, which can then be aggregated into
one big document that you can run powerful XPath expressions against.  The
other method is a dynamic database implemented in SQL, where you can define
different attributes for each document.  The problem with the first approach
is that the aggregated XML documents are potentially huge, and loading them
into memory for searching would probably mean a performance hit.  Method 2
should work pretty well; it should even be possible to convert the SQL
tables to XML if you want to run XPath, or to do direct SQL searches.  The
only downside I can see is that the metadata probably can't be as
hierarchically complex, which keeps the XML shallow.  On the other hand, I'm
not sure metadata needs to be that structurally complex.  Any thoughts?
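The second method described above can be sketched as a per-document attribute table (often called an entity-attribute-value layout).  This is a minimal sketch assuming a hypothetical `meta(doc_id, name, value)` table in SQLite; it shows both direct SQL search and a flat export to XML for XPath-style queries, and the export's one-level nesting reflects the "shallow XML" trade-off mentioned in the question.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical "dynamic" metadata table: one row per attribute per document,
# so different documents can carry different sets of attributes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (doc_id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [
        (1, "author", "mattias"),
        (1, "language", "en"),
        (2, "author", "david"),
        (2, "expires", "2003-12-31"),
    ],
)


def docs_where(name, value):
    """Direct SQL search on a single attribute."""
    rows = conn.execute(
        "SELECT doc_id FROM meta WHERE name = ? AND value = ? ORDER BY doc_id",
        (name, value),
    )
    return [doc_id for (doc_id,) in rows]


def as_xml():
    """Export rows as flat XML: one <document> per doc, one child per attribute.
    Attribute names must be valid XML element names for this to work."""
    root = ET.Element("documents")
    docs = {}
    for doc_id, name, value in conn.execute(
        "SELECT doc_id, name, value FROM meta ORDER BY doc_id, rowid"
    ):
        if doc_id not in docs:
            docs[doc_id] = ET.SubElement(root, "document", id=str(doc_id))
        ET.SubElement(docs[doc_id], name).text = value
    return root


print(docs_where("author", "david"))  # [2]
xml_root = as_xml()
print([d.get("id") for d in xml_root.findall("./document[language='en']")])  # ['1']
```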

best regards
---
Mattias Konradsson


--
http://cms-list.org/
more signal, less noise.