Re: UUID musings and area bashing

Andreas Hartmann Wed, 16 Aug 2006 00:57:35 -0700

Joern Nettingsmeier wrote:

Andreas Hartmann wrote:

Hi Joern,


thanks for your comprehensive comments!

* are UUIDs unique across publications?
  -> if yes, {pubId} is redundant. do we want to drag it along?

It would be great if we could omit it, but this would require a
performant lookup mechanism. Or we just put all content in a
single box, and add the publication ID to the meta data. This
sounds quite appealing to me.


+1 for all-in-one-box, but -1 for adding it to the per-{UUID+lang}
metadata. that's suboptimal from an index efficiency pov. the
publication should maintain a list of which UUIDs belong to it. this is
also easier to debug, since you can see everything at one glance.


Is this list really necessary? Maybe we can just assume that a publication
"contains" all documents which are referenced in its site structure.
According to your area interpretation, that would mean a global trash,
though. But with a lucene index for the publication-id meta data attribute,
searching the trash wouldn't be a big issue.

moreover, the property "belonging to <pub>" is orthogonal to revisions,


Is it? Moving a document to a different publication would mean that the
next revision has a different "belongs to <pub>" value than the former one.

while per-{UUID+lang} metadata is not.

* UUIDs are definitely orthogonal to revisions. we do not need to access
revisions other than "current" most of the time, but we should make it
possible now in order to avoid having to tack on another mechanism for
situations where revisions are involved.

+1, this sounds useful.


so the "unique index", to borrow from database theory, would be the tuple
{UUID+lang+revision}


Yes, that would be the primary key.
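
To make the {UUID+lang+revision} key concrete, here is a minimal sketch in Python. It is not Lenya code: the names TranslationKey and RevisionStore are hypothetical, the store is a plain in-memory dict, and "current" is assumed to mean the highest revision number.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TranslationKey:
    """Hypothetical composite key: one stored revision of one translation."""
    uuid: str
    language: str
    revision: int


class RevisionStore:
    """Toy in-memory store keyed by (uuid, language, revision)."""

    def __init__(self) -> None:
        self._items: Dict[TranslationKey, str] = {}

    def put(self, key: TranslationKey, content: str) -> None:
        # The tuple acts as a primary key, so duplicates are rejected.
        if key in self._items:
            raise KeyError(f"duplicate key: {key}")
        self._items[key] = content

    def get(self, key: TranslationKey) -> str:
        return self._items[key]

    def current(self, uuid: str, language: str) -> Optional[TranslationKey]:
        """Return the key with the highest revision (assumed 'current'), or None."""
        matches = [k for k in self._items
                   if k.uuid == uuid and k.language == language]
        return max(matches, key=lambda k: k.revision, default=None)
```

Most callers would only use current(), which matches the point that older revisions are rarely needed but remain addressable through the same key.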

[terminology]

are we really "addressing documents"?

currently, i find in the sitemaps the term "document-uuid".
that implies we use the term "document" to mean "the set of all stored
data snippets (including meta) that corresponds to a particular UUID".

so we are not addressing documents. we are addressing particular
instances of a document in a certain publication, area and language.

At the moment, a document is a particular translation in a particular
area in a particular publication (we didn't yet change the terminology,
at least as far as the class names are concerned).

IIRC we agreed upon the term "translation" for this.
We don't have a class for "the object that contains all translations
for a UUID in a certain area" yet. That would be a document/resource/asset
(IIRC "document" was the preferred term).


so let me propose the following:

<section status="draft" normative="yes">
the entirety of all data pertaining to what is traditionally called a
"web page" is called a *document* within lenya.


IMO that is a page, which is created by expanding a document.
Expanding means that all inclusions are resolved, e.g. the document might
be a list of <ci:include> statements, but the page shows a list of blog
messages.

documents are uniquely identified by *UUIDs*, which may therefore be
called *document UUIDs* for extra clarity.

+1

documents contain one or more *translations*. "translation" here refers
to the actual content, and includes the "original language version",
being a general category.
each translation has *metadata* associated with it.


+1, although the document has meta data as well, which applies to all
translations.

the terms MUST, MUST NOT and SHOOTING OFFENSE are to be interpreted as
described in RFC2119.
</section>

[areas]

thinking about andreas' suggestion, it becomes ever more evident to me
that the area concept is flawed. areas should be done away with altogether
in the not too far future.

I agree that it has to be reconsidered, but should we address it in 1.4?


HELL NO! :-D

this is 1.5 stuff. but i should think that the 1.4 cycle will be short
anyway.

An internal link URL might look like this:

  document://{pubId}:{area}:{uuid}:{language}
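
A rough sketch of how a link in this form could be taken apart and put back together, in Python for brevity. The names DocumentLink, parse_document_link and format_document_link are hypothetical, and the sketch assumes that none of the four fields contains a colon.

```python
from typing import NamedTuple

SCHEME = "document://"


class DocumentLink(NamedTuple):
    """Hypothetical holder for the four fields of an internal link."""
    pub_id: str
    area: str
    uuid: str
    language: str


def parse_document_link(url: str) -> DocumentLink:
    """Split document://{pubId}:{area}:{uuid}:{language} into its fields."""
    if not url.startswith(SCHEME):
        raise ValueError(f"not a document link: {url!r}")
    parts = url[len(SCHEME):].split(":")
    # Assumption: fields never contain ':' and are never empty.
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 4 non-empty fields: {url!r}")
    return DocumentLink(*parts)


def format_document_link(link: DocumentLink) -> str:
    """Inverse of parse_document_link."""
    return SCHEME + ":".join(link)
```

For example, parse_document_link("document://default:live:123e4567-e89b-12d3-a456-426614174000:en") gives the publication, area, UUID and language as separate fields; the values in that example are made up.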

what about lenya: and lenyadoc:? i must confess i have never quite
grasped the concept...

lenya:// is one layer below this, it addresses repository nodes.


does that mean that it's obsolete now? or if not, what is it currently
used for?


It's used for addressing repository nodes. For instance, each document
has a repository node for the content and one for the meta data, and the
sitetree has a repository node for its XML, and so on. You shouldn't use
lenya:// in a pipeline, it is for internal use only.

lenyadoc:// is probably fine for links. Maybe we should just use that one.

in any case, the protocol should definitely begin with "lenya...", so
that it's immediately obvious what's going on in the sitemap.

i would even go as far as suggesting that all our input modules and
pseudo-protocols that are not suited for upstream cocoon be re-named
lenya-fallback, {lenya-docinfo:...} etc.
this would greatly reduce the learning curve for our users, and make
life easier for casual committers from other apache projects, since it's
obvious if custom magic is at work, as opposed to core cocoon
functionality.

-0.5, I'd prefer to keep them short, but it's OK with me to change it.


i strongly feel that cocoon namespaces must be restructured, even at the
cost of increased verbosity. it should be easy to register both the
traditional and the prefixed name for a grace period, and move the
sitemaps over piecemeal without breaking external code too soon.
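
The grace-period idea could look roughly like the following Python sketch. This is not how Cocoon or Lenya register components; ModuleRegistry is a hypothetical class, and "fallback" is only assumed to be the traditional name that "lenya-fallback" would replace.

```python
import warnings
from typing import Dict, Optional


class ModuleRegistry:
    """Toy registry that accepts both the prefixed and the legacy name."""

    def __init__(self) -> None:
        self._modules: Dict[str, object] = {}
        self._legacy: Dict[str, str] = {}  # legacy name -> prefixed name

    def register(self, name: str, module: object,
                 legacy_name: Optional[str] = None) -> None:
        self._modules[name] = module
        if legacy_name is not None:
            self._legacy[legacy_name] = name

    def lookup(self, name: str) -> object:
        # Legacy names still resolve during the grace period, with a warning.
        if name in self._legacy:
            new_name = self._legacy[name]
            warnings.warn(f"'{name}' is deprecated, use '{new_name}'",
                          DeprecationWarning, stacklevel=2)
            name = new_name
        return self._modules[name]
```

Sitemaps that still use the old name keep working and produce a deprecation warning, so they can be moved over one at a time; dropping the legacy mapping later ends the grace period.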


OK, what do the others think?

-- Andreas

--
Andreas Hartmann
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
[EMAIL PROTECTED]                     [EMAIL PROTECTED]


