Andreas Hartmann wrote:
Michael Wechner wrote:
Andreas Hartmann wrote:
Hi Lenya devs,
I'd like to raise an issue that has bothered me for quite a long time
and share some random thoughts.
Currently, Lenya is based on the following axioms
(please correct me if I'm wrong):
1. A URL is represented by exactly one document.
what do you mean by one document? In the case of the default
publication this might be true,
but one can do it very differently
No, that's not yet supported by the Lenya framework. Of course you can
implement a custom solution using plain Cocoon.
that's what I mean. Lenya may offer something, but nobody is forced to
use it and
I think this is important for people to know
Joachim Wolfang suggested to support multiple content items per page
(http://wiki.apache.org/lenya/ProposalContentModel), but AFAIK it is
still in proposal state.
IIRC it's a generic aggregation framework
2. A document can be represented by an arbitrary number of URLs.
you mean like "softlinks"?
Actually not quite. Softlinks would be fine, because that would imply
a "real", native URL. But Lenya supports multiple "native" URLs for
a Document object. In the filesystem, that would mean that one and
the same file occurs in multiple paths, without the ability of singling
out one of them.
you mean duplicated content?
3. For each document, there is exactly one canonical URL.
what do you mean by canonical URL?
We once created the term to be able to denote a kind of "primary" URL.
It is not clearly defined what this URL should look like. You can
probably compare it to a canonical filesystem path.
If you generate the canonical URLs of two documents d1 and d2, and the
canonical URLs are equal, then d1 and d2 represent the same document.
The DefaultDocumentBuilder returns /foo.html for the default language
and /foo_xx.html for the other languages as canonical URLs.
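To make the rule above concrete, here is a minimal sketch in Python rather
than Lenya's Java API. The names build_canonical_url and same_document are
hypothetical; they only mirror the behaviour attributed to the
DefaultDocumentBuilder above and are not taken from the Lenya code base.

```python
def build_canonical_url(document_id: str, language: str, default_language: str) -> str:
    """Return the canonical URL for a document ID and a language.

    Hypothetical helper: the default language maps to /foo.html,
    every other language to /foo_xx.html.
    """
    if language == default_language:
        return f"{document_id}.html"
    return f"{document_id}_{language}.html"


def same_document(a, b, default_language="en"):
    """Two (document_id, language) pairs denote the same document
    exactly when their canonical URLs are equal."""
    return (build_canonical_url(*a, default_language)
            == build_canonical_url(*b, default_language))


print(build_canonical_url("/foo", "en", "en"))
print(build_canonical_url("/foo", "de", "en"))
```

With English as the default language, a request for ("/foo", "en") yields
/foo.html and not /foo_en.html.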
let's assume the default language is english and you ask for foo/en,
then you will
receive foo.html instead of foo_en.html?
I would expect it the other way around ...
This is reflected in the following methods:
DocumentBuilder.buildDocument(...)
DocumentBuilder.buildCanonicalUrl(...)
if you use the DocumentBuilder, then I guess the above is correct,
but I don't think one has to use the DocumentBuilder
It's virtually impossible to use Lenya without the DocumentBuilder.
hm ... well, I do at least with 1.2.x
You couldn't use any of the valuable features like workflow, the usecase
framework, transactions etc.
...
At the moment, the concept of multiple URLs per document is typically
used for language versions (foo_{defaultlanguage}.html = foo.html)
and to support different URL suffixes (foo, foo.htm, foo.html).
The site structure is currently tightly connected to the URL space.
Link URLs are derived directly from the site structure:
<node id="foo">
<node id="bar"/>
</node>
is interpreted as
/foo/bar
The language version is handled orthogonally to the site structure.
The URL is determined by combining both document ID and language.
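As an illustration of how nested nodes translate into paths, the following
sketch walks such a tree with Python's standard xml.etree.ElementTree. The
<site> wrapper element and the function name document_paths are assumptions
made for the example; the real sitetree format may differ.

```python
import xml.etree.ElementTree as ET

# The <site> root element is an assumption; the thread only shows <node> elements.
SITE = """
<site>
  <node id="foo">
    <node id="bar"/>
  </node>
</site>
"""


def document_paths(xml_text):
    """Return the document paths implied by nested <node id="..."> elements."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(element, prefix):
        # Each child node extends the path of its parent by its own id.
        for child in element.findall("node"):
            path = f"{prefix}/{child.get('id')}"
            paths.append(path)
            walk(child, path)

    walk(root, "")
    return paths


print(document_paths(SITE))
```

The language does not appear in the tree at all; it is combined with the
path in a separate step when the URL is built.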
If we want to allow multiple site structures, we have to choose between
the following options:
1. The connection between site structures and URL space is kept.
This implies
- a document has a different canonical URL for each site structure
- calculating a document's URL depends on the site structure
2. The purpose of the site structure is reduced to building navigation
widgets etc., the URL space is orthogonal to that.
- a document has only one single canonical URL
- the site structure stores the UUID of a document
- navigating the site structure is not reflected 1:1 in the URL
space
I am not sure if I understand you correctly, but I would say we
should go with (2), but
I guess if you make an example, e.g.
/en/developers/andreas-hartmann
/de/entwickler/andreas-hartmann
/en/committers/andi
/de/committers/andreas
In my opinion, only one of these URLs should actually represent the
document.
The others should merely point to the document, i.e. by redirects, URL
rewriting or another concept like this. If you ask the document for its
URL, there should be only one option that can be returned.
well, I don't think so; to me the above is just a navigation thing, and
within the repository you have a different path, or rather a UUID
Option (2) implies that, when a document is created, its URL and its
location in the site structure have to be determined. IMO this is just
a GUI issue. In most cases, a default site structure which corresponds
to the URL space will be used to create documents. These documents can
be referenced from other site structures later on.
I'm not particularly fond of the DocumentBuilder concept. With option (2)
and the default site structure it would be obsolete, because the document
could be derived directly from the default site structure. The ambiguity
that multiple, arbitrary URLs can point to a document would be removed.
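A rough sketch of option (2), under the assumption that every site structure
can be flattened to a mapping from navigation path to document UUID. The
Document class, the REPOSITORY dictionary, the UUID value and the function
link_url are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    uuid: str
    url: str  # the single URL, fixed when the document is created


# Hypothetical repository: UUID -> document.
REPOSITORY = {
    "1-en": Document("1-en", "/en/developers/andreas-hartmann"),
}

# Two site structures, flattened to navigation path -> UUID.
# The default one corresponds to the URL space, the second one does not.
DEFAULT_SITE = {"/en/developers/andreas-hartmann": "1-en"}
COMMITTERS_SITE = {"/en/committers/andi": "1-en"}


def link_url(site, navigation_path):
    """URL a navigation widget links to for an entry of a site structure."""
    return REPOSITORY[site[navigation_path]].url


print(link_url(COMMITTERS_SITE, "/en/committers/andi"))
```

Both site structures lead to the same URL, because the URL is a property of
the document and not of the place where it is referenced.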
----
The question is if multiple URLs for a document should be allowed at
all.
sure, why not? I think there are many use cases for that, and existing
URL spaces which couldn't be handled by Lenya if it doesn't support this...
Sure, the system should allow multiple URLs to point to a document.
But, as I already mentioned, there are several concepts to support this:
- redirects
- URL rewriting (proxy)
- soft links
- ... (?)
We don't have to support multiple URLs to natively *represent* a
document.
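The distinction between a URL that natively represents a document and one
that merely points to it could look like the sketch below. It shows only the
redirect variant; the tables NATIVE and REDIRECTS, the UUID value and the
function respond are invented for the example.

```python
# Exactly one native URL per document (URL -> document UUID).
NATIVE = {"/en/developers/andreas-hartmann": "1-en"}

# Additional URLs that only point to a native URL.
REDIRECTS = {"/en/committers/andi": "/en/developers/andreas-hartmann"}


def respond(url):
    """Return (HTTP status, payload) for a requested URL."""
    if url in NATIVE:
        return 200, NATIVE[url]       # serve the document itself
    if url in REDIRECTS:
        return 301, REDIRECTS[url]    # send the client to the native URL
    return 404, None


print(respond("/en/committers/andi"))
```

Asking the document for its URL stays unambiguous, while old or alternative
URLs keep working for clients.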
well, it seems to me you can do this very easily by separating the
navigation framework properly
from the "repository space", whereas one can offer the repository
navigation as default or "canonical" navigation
Actually I don't think this is necessary. At the moment, many
publications
show the following behaviour:
/foo.html -> Hello World!
/foo_en.html -> Hello World!
/foo_de.html -> Hallo Welt!
Why is the support for /foo_en.html necessary? I see only two reasons:
1. Laziness. You don't have to find out the default language to
create a URL.
2. You can switch the default language without creating dead URLs.
IMO neither of them outweighs the disadvantages of an ambiguous URL space.
In fact, (2) should probably be avoided because the content of a document
page changes (it becomes a different language version). So IMO it could
look like this:
/foo.html -> Hello World!
/foo_en.html -> 404
/foo_de.html -> Hallo Welt!
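The two behaviours can be compared with a small sketch that builds the URL
table of one document. The function url_table and its explicit_default flag
are hypothetical; the flag reproduces the current behaviour in which the
default language is reachable under two URLs.

```python
def url_table(document_id, versions, default_language, explicit_default=False):
    """Map URLs to content for the language versions of one document.

    versions: dict of language -> content
    explicit_default: also expose /foo_<default>.html (ambiguous URL space)
    """
    table = {}
    for language, content in versions.items():
        if language == default_language:
            table[f"{document_id}.html"] = content
            if explicit_default:
                table[f"{document_id}_{language}.html"] = content
        else:
            table[f"{document_id}_{language}.html"] = content
    return table


versions = {"en": "Hello World!", "de": "Hallo Welt!"}
print(url_table("/foo", versions, "en"))
print(url_table("/foo", versions, "en", explicit_default=True))
```

Any URL missing from the table would be answered with 404. Calling the
function with a different default_language shows which URLs disappear when
the default language is switched.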
what if you switch the default language to german,
... which is not a good thing to do IMO, see above ...
why not? It seems to me very simple, because it's just a matter of not
favoring any (pre)selected language ...
then suddenly all foo_de become 404?!
You could solve this using redirects, as Solprovider suggested.
Or using softlinks.
I don't think that's necessary, because these documents/URLs do exist,
so why not let people retrieve them?!
Actually this would simplify the URL mapping concept by merging document ID
(or better document path, to avoid confusion with the UUID) and language.
In the site structure, there wouldn't be multiple language versions
of a document, but only links to documents. The connection between the
actual language versions of a document would be represented in another
location (see ContentNode and Document in o.a.l.cms.repo for more
information).
Assuming we have two documents which are language versions of the
same content:
* language="en" uuid="1-en"
* language="de" uuid="1-de"
This could be represented for instance by the following default site
structures:
1. /foo.html
/foo_de.html
<node id="foo" document-uuid="1-en"/>
<node id="foo_de" document-uuid="1-de"/>
(note that the language suffix "_de" is just a part of the URL)
I am not sure if this is a good idea and what the consequences are
... my belly tells me that it's a bad idea ;-)
(e.g. in the case of switching the default language)
OK, how about this:
<node id="foo" softlink="1-en"/>
<node id="foo_en" document-uuid="1-en"/>
<node id="foo_de" document-uuid="1-de"/>
If you change the default language, you'd have to change the links
(automatically), but IMO this price can be paid.
I rather think that the default language should be a redirect to the
actual language, e.g.
/foo.html is being redirected to /foo_de.html
or
/foo.html is being redirected to /de/foo.html
2. /en/foo.html
/de/foo.html
<node id="en">
<node id="foo" document-uuid="1-en"/>
</node>
<node id="de">
<node id="foo" document-uuid="1-de"/>
</node>
Assuming that a document can only be referenced once in the default site
structure, it is now trivial to map URLs to documents and vice versa,
without using a DocumentBuilder. The important fact is that the knowledge
of how to map URLs to documents belongs to the component which *creates*
documents. That's why there's no knowledge duplication if you hard-code
that the German version of /en/foo should be created at /de/foo.
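The claim that the mapping becomes trivial can be sketched as follows, using
the second site structure from above. The <site> wrapper element, the fixed
".html" suffix and the function build_maps are assumptions of this example,
not part of Lenya.

```python
import xml.etree.ElementTree as ET

# The <site> root element is an assumption; the nodes are taken from the example.
SITE = """
<site>
  <node id="en">
    <node id="foo" document-uuid="1-en"/>
  </node>
  <node id="de">
    <node id="foo" document-uuid="1-de"/>
  </node>
</site>
"""


def build_maps(xml_text, suffix=".html"):
    """Return (url_to_uuid, uuid_to_url) derived from the default site structure."""
    root = ET.fromstring(xml_text)
    url_to_uuid, uuid_to_url = {}, {}

    def walk(element, prefix):
        for child in element.findall("node"):
            path = f"{prefix}/{child.get('id')}"
            uuid = child.get("document-uuid")
            if uuid is not None:
                # A document may be referenced only once, otherwise
                # the reverse mapping would be ambiguous.
                if uuid in uuid_to_url:
                    raise ValueError(f"document {uuid} referenced twice")
                url_to_uuid[path + suffix] = uuid
                uuid_to_url[uuid] = path + suffix
            walk(child, path)

    walk(root, "")
    return url_to_uuid, uuid_to_url


url_to_uuid, uuid_to_url = build_maps(SITE)
print(url_to_uuid)
print(uuid_to_url)
```

The single-reference constraint is what makes the reverse lookup from
document to URL well defined.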
----
Supporting the other case, multiple URL suffixes for a document, is
certainly necessary. But I'd separate this information from the document
itself. IMO the URL suffix should be used to request a certain view of a
document:
/foo -> HTML view
/foo.html -> HTML view
/foo.pdf -> PDF view
/foo.print.html -> print HTML view (if CSS is not appropriate or whatever)
this might be one scheme, but others are possible as well. I think
Lenya needs to allow flexibility here,
because otherwise you shut Lenya out from many URL spaces being used
Yes, it was just an example.
ok
The canonical URL of a document would be assembled from the canonical
base URL (/foo) and the extension denoting the view. This would be done
by the client code; the document itself (or whatever component knows the
document's URL) would just return the canonical base URL. (BTW, the term
canonical is not necessary anymore since only one base URL exists per
document)
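A minimal sketch of this separation, assuming the suffix scheme from the
example above. The VIEWS table, the view names and the functions split_view
and assemble_url are hypothetical; other suffix schemes would need a
different table.

```python
# Suffix -> view name; the empty suffix selects the default (HTML) view.
VIEWS = {
    "": "html",
    ".html": "html",
    ".pdf": "pdf",
    ".print.html": "print-html",
}


def split_view(url):
    """Split a request URL into (base URL, view name)."""
    # Try the longest suffix first so that ".print.html" wins over ".html".
    for suffix in sorted(VIEWS, key=len, reverse=True):
        if suffix and url.endswith(suffix):
            return url[: -len(suffix)], VIEWS[suffix]
    return url, VIEWS[""]


def assemble_url(base_url, extension):
    """Client code combines the document's base URL with a view extension."""
    return base_url + extension


print(split_view("/foo.print.html"))
print(assemble_url("/foo", ".pdf"))
```

The document only has to know its base URL (/foo); the choice of view stays
with the code that creates the link.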
----
Another question: With multiple site structures, how does the system
keep track of the currently selected site structure?
- URL prefix
that would be my first suggestion, similar to "context" for servlets
Yes, but it would require reserved URL spaces.
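To illustrate what "reserved URL spaces" means for the prefix approach, here
is a small sketch. The site structure names and the function select_site are
made up for the example.

```python
# Hypothetical names of the available site structures.
SITE_STRUCTURES = {"default", "committers"}
DEFAULT_SITE = "default"


def select_site(url):
    """Return (site structure, remaining path) for a requested URL."""
    parts = url.lstrip("/").split("/", 1)
    if parts[0] in SITE_STRUCTURES:
        rest = "/" + parts[1] if len(parts) > 1 else "/"
        return parts[0], rest
    # No known prefix: fall back to the default site structure.
    return DEFAULT_SITE, url


print(select_site("/committers/andi"))
print(select_site("/foo.html"))
```

The first path segment "committers" is now reserved: a document at
/committers/andi in the default site structure could no longer be addressed
without its own prefix.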
[...]
well, what's the alternative ;-)
I think it's best if we use a few real world examples, because then
it becomes much clearer very quickly.
Yes, my statements were of rather general nature. Is there anything
particular you'd like an example for?
yes, but I think it's best if we just start usecases in the Wiki and
start defining a common language,
otherwise I am afraid that we might disagree on stuff we actually agree
on and vice versa ;-)
Thanks for your comments,
thank you :-)
Michi
-- Andreas
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
[EMAIL PROTECTED] [EMAIL PROTECTED]