[report] hackathon

Stefano Mazzocchi Thu, 30 May 2002 08:03:24 -0700

Many people expressed the intention of participating in the small,
improvised hackathon held in Pavia this Monday. Gianugo, Ken, Ugo and I
were present.


The discussion was about 'what to put underneath Cocoon': in short,
what to use as a content repository. It was also prompted by my recent
involvement with JSR 170, which is probably going to release a
Reference Implementation of a Java Repository API using Jakarta Slide as
the starting codebase (two other Slide committers are part of that
group).

Unfortunately, I'm bound by an NDA and can't write much about the
details of the API, but I can write about its general vision.

So the long-term goal is pretty simple: what goes underneath Cocoon is
a repository implemented on top of the JSR 170 Repository API.

The JSR WG considers transparent interoperability between repositories
a must-have. So you should be able to change your repository in the
future without many issues, much as you deploy your WAR files across
different app servers today.

What is a 'content repository'
------------------------------

In short, a content repository is a slightly document-oriented,
semi-structured, hierarchical database.

It provides features such as:
 
 - writing documents/collections (small, big, in big quantities)
 - reading documents/collections (with constant or nearly constant
   access time)
 - moving/copying/erasing documents/collections

 - controlling access to the documents/collections

 - writing/reading versions of the various documents (keeping a
   version history, a la CVS)

 - transactions (ability to rollback to a previous state)
 - event monitoring (triggers and the like)

and last but not least

 - semi-structured querying
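
To make a few of these features concrete, here is a minimal in-memory
sketch in Java covering hierarchical paths, write/read, move and
versioning. The class and method names (SketchRepository, write, read,
list, move) are invented for illustration; this is neither the Slide
API nor the JSR 170 API, and it leaves out access control,
transactions, events and querying entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: this is neither the Slide API nor the JSR 170 API.
class SketchRepository {
    // path -> versions, oldest first; sorted keys keep the documents of
    // one collection next to each other
    private final TreeMap<String, List<String>> docs = new TreeMap<>();

    // Writing appends a new version instead of overwriting the old one.
    void write(String path, String content) {
        docs.computeIfAbsent(path, p -> new ArrayList<>()).add(content);
    }

    // Returns the latest version, or null if nothing is stored at path.
    String read(String path) {
        List<String> versions = docs.get(path);
        return versions == null ? null : versions.get(versions.size() - 1);
    }

    // Returns a specific version, numbered from 1.
    String read(String path, int version) {
        List<String> versions = docs.get(path);
        if (versions == null || version < 1 || version > versions.size()) {
            throw new IllegalArgumentException("no version " + version + " of " + path);
        }
        return versions.get(version - 1);
    }

    // Lists the paths stored under a collection such as "/site/news/".
    List<String> list(String collection) {
        return new ArrayList<>(
            docs.subMap(collection, collection + Character.MAX_VALUE).keySet());
    }

    // Moving a document carries its version history along.
    void move(String from, String to) {
        List<String> versions = docs.remove(from);
        if (versions == null) {
            throw new IllegalArgumentException("no such document: " + from);
        }
        docs.put(to, versions);
    }

    public static void main(String[] args) {
        SketchRepository repo = new SketchRepository();
        repo.write("/site/news/a.xml", "<news>draft</news>");
        repo.write("/site/news/a.xml", "<news>final</news>");
        repo.move("/site/news/a.xml", "/site/archive/a.xml");
        System.out.println(repo.read("/site/archive/a.xml"));    // latest version
        System.out.println(repo.read("/site/archive/a.xml", 1)); // first version
    }
}
```

A sorted map gives logarithmic rather than constant access time, which
is good enough for a sketch; a real repository would need proper
storage and indexing underneath.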

Apache Slide
------------

JSR 170 isn't going to release anything public or usable until the end
of the year, and even that is probably optimistic.

Apache Slide markets itself as a 'content management system' but I
strongly disagree that you can have a CMS with Slide alone. Slide is,
IMO, a content repository. It totally lacks the 'management system',
which, IMO, should be provided by something on top that uses the Slide
API (or the JSR 170 API in the future).

Slide comes with a WebDAV DeltaV interface to the repository (currently
used by the Tamino XML database from Software AG, which is also part of
JSR 170 and very interested in having Tamino support that API somehow).

The problem I see is that Slide totally lacks support for
semi-structured querying. For example, if you store all your XML content
in a Slide repository, the performance of running an XPath query on top
will not be faster than doing the same from the file system.

In short: it's useful for writing and maintaining content, but useless
as fast data storage for a publishing layer.
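
To see why, consider what an XPath query looks like when the store can
only hand back whole documents by path. The sketch below uses a plain
Map as a stand-in for such a store (it does not use any Slide classes,
and the UnindexedQuery name is made up); the XML parsing and XPath
calls are the standard JAXP ones. Every document has to be fetched and
parsed before the expression can even be tested against it, so the
cost grows with the size of the whole repository rather than with the
number of hits.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: the Map stands in for any store that can only
// return whole documents by path (a file system, or a repository
// without a query index). No Slide classes are used.
class UnindexedQuery {
    // Returns the paths of all documents for which expr selects something.
    static List<String> matching(Map<String, String> store, String expr) {
        List<String> hits = new ArrayList<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Map.Entry<String, String> entry : store.entrySet()) {
                // every document is parsed, whether or not it matches
                Document doc = builder.parse(
                    new InputSource(new StringReader(entry.getValue())));
                Boolean found =
                    (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
                if (found) {
                    hits.add(entry.getKey());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("query failed: " + expr, e);
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("/a.xml", "<article lang='en'><title>One</title></article>");
        store.put("/b.xml", "<article lang='it'><title>Due</title></article>");
        System.out.println(matching(store, "/article[@lang='en']"));
    }
}
```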

Apache Xindice
--------------

On the other hand, we have Xindice, which is pretty efficient at
semi-structured querying, but it mostly sucks as a content repository,
having been designed as a data-oriented XML database (something that I
consider totally useless).

Bringing the two together
-------------------------

The idea is to glue them together: Slide on top, Xindice down below.
Forget the XML:DB API, there is no reason to be slowed down by it: I see
no market (and no need!) for data-oriented XML databases. The market is
for content repositories where XML-based engines can provide efficient
and granular querying.

So, the architectural picture I have in mind is something like this:

 -(Xindice API)- Slide -(Slide API)-
                   |
                Xindice

where the Slide API gives access to the repository with a 'file system'
view, while the Xindice API gives access to the repository as a 'big
persistent DOM'. Depending on your needs, you can use whichever view
fits.

NOTE: JSR 170 will glue both APIs into one.
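
The 'two views over one store' idea can be sketched in a few lines of
Java. Everything here is an assumption made for illustration: the
TwoViewStore class, its put/get/query methods and the
<repository>/<document path="..."> wrapper elements are invented, the
tree lives in memory rather than being persistent, and neither the
Slide nor the Xindice API is involved. The point is only that the same
content can be reached by path (the 'file system' view) or by an XPath
expression over one big tree (the 'DOM' view).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: one in-memory DOM tree exposed through two views.
class TwoViewStore {
    private final DocumentBuilder builder = newBuilder();
    private final Document tree = builder.newDocument();           // the 'big DOM'
    private final Map<String, Element> byPath = new HashMap<>();   // path index
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    TwoViewStore() {
        tree.appendChild(tree.createElement("repository"));
    }

    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // 'File system' view: store (or replace) a document under a path.
    void put(String path, String xml) {
        try {
            Document parsed = builder.parse(new InputSource(new StringReader(xml)));
            Element wrapper = tree.createElement("document");
            wrapper.setAttribute("path", path);
            wrapper.appendChild(tree.importNode(parsed.getDocumentElement(), true));
            Element old = byPath.put(path, wrapper);
            if (old != null) {
                tree.getDocumentElement().removeChild(old);
            }
            tree.getDocumentElement().appendChild(wrapper);
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed: " + path, e);
        }
    }

    // 'File system' view: root element of the document at path, or null.
    Element get(String path) {
        Element wrapper = byPath.get(path);
        return wrapper == null ? null : (Element) wrapper.getFirstChild();
    }

    // 'DOM' view: text content of every node the expression selects
    // anywhere in the repository.
    List<String> query(String expr) {
        try {
            NodeList nodes = (NodeList) xpath.evaluate(expr, tree, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        TwoViewStore store = new TwoViewStore();
        store.put("/news/a.xml", "<article><title>Hello</title></article>");
        store.put("/news/b.xml", "<article><title>World</title></article>");
        System.out.println(store.get("/news/a.xml").getTagName()); // by path
        System.out.println(store.query("//title"));                // by query
    }
}
```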

Things to do
------------

First, we must implement an Xindice-based Slide Store. I picture using
namespaced XML to provide versioning and the like.
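
One possible reading of 'namespaced XML' is an envelope that carries
the repository metadata in its own namespace, so it can never collide
with the stored content. The Java sketch below shows that idea using
the standard DOM API only. The namespace URI, the meta:revision
element, its attributes and the VersionEnvelope class are all invented
for illustration; nothing here comes from Slide or Xindice, and the
real Store may well want a different layout.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: versioning metadata kept in a separate namespace.
class VersionEnvelope {
    // Made-up namespace URI, used only by this example.
    static final String NS = "urn:example:repository-meta";

    // Wraps the content in <meta:revision meta:path=".." meta:number="..">.
    static Document wrap(String path, int revision, String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document content = builder.parse(new InputSource(new StringReader(xml)));

            Document envelope = builder.newDocument();
            Element rev = envelope.createElementNS(NS, "meta:revision");
            rev.setAttributeNS(NS, "meta:path", path);
            rev.setAttributeNS(NS, "meta:number", Integer.toString(revision));
            rev.appendChild(envelope.importNode(content.getDocumentElement(), true));
            envelope.appendChild(rev);
            return envelope;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot wrap " + path, e);
        }
    }

    static int revisionOf(Document envelope) {
        return Integer.parseInt(
            envelope.getDocumentElement().getAttributeNS(NS, "number"));
    }

    static String pathOf(Document envelope) {
        return envelope.getDocumentElement().getAttributeNS(NS, "path");
    }

    // The original content root, untouched by the metadata namespace.
    static Element contentOf(Document envelope) {
        return (Element) envelope.getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) {
        Document env = wrap("/news/a.xml", 3, "<article><title>Hello</title></article>");
        System.out.println(pathOf(env) + " revision " + revisionOf(env));
        System.out.println(contentOf(env).getLocalName());
    }
}
```

Because the metadata is ordinary XML, the same XPath engine that
queries the content could also select on revision numbers or paths.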

Then, we must provide a query layer that 'cleans' the XPath queries. I
don't know whether the DASL interfaces that Slide has are too
WebDAV-oriented or can be used for this. Comments are welcome.

Finally, we should write a 'Slide' Source that is able to read resources
from a Slide repository, providing versioning and XPointer-like querying
capabilities.

That's all folks.

Fire at will.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<[EMAIL PROTECTED]>                             Friedrich Nietzsche
--------------------------------------------------------------------

