On 1/30/06, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> Bob Harner wrote:
> > As briefly discussed on the user list recently (subject: "Losing
> > hyperlinks - what xsl removes them?"), the LinkRewritingTransformer
> > seems to need some improvements so that it can rewrite all types of
> > links. It currently only rewrites <a href="foo"> where foo is a
> > document-relative URI. I'm sure I'm NOT the best person to do so
> > (being much less familiar with 1.4 than 1.2.x), but I've been looking
> > over the code and humbly offer the following initial thoughts. Your
> > advice and guidance are eagerly sought...
> >
> > 1) <editorial>We have really overloaded the word "resource" in Lenya &
> > Cocoon, haven't we? Sometimes it means "an asset or a CMS document"
> > (per http://wiki.apache.org/lenya/ProposalArchitecture), and sometimes
> > it specifically means just an asset (per Resource.java). The word is
> > also used in sitemap files to refer to a reusable part of a pipeline.
> > Elsewhere it refers vaguely to a "miscellaneous related file" (the
> > lenya/resources dir). Sometimes it means the amount of memory, hard
> > drive space, and CPU cycles available. And Document Types are now
> > officially Resource Types.
>
> Actually, in the repo API I called them "Document Types" again. I'm
> still not sure if the term "Resource" or "Document" is appropriate for
> a "content item". Or maybe "content item" is really superior.
>
> The terms "content type" and "document type" are already taken. But IMO
> we should just use the same term as for content items, regardless of
> any existing usage.
>
> How about this hierarchy:
>
> - Publication
> - Area
> - Content
> - ContentNode (belongs to a ContentType)
> - ContentItem (a language version)
> - (Content)Version (of the version history)
> - Structures (more general than Sites)
> - Structure
> - StructureNode (references ContentNode or ContentItem)
>
> > This overloading of terminology makes it harder to learn Lenya.
> > I think "Content", "Content Item", and "Content Type" are probably
> > much better terms for a CMS to use. Precise and unambiguous
> > terminology is always a good thing.</editorial>
> >
> > 2) As Andreas said a couple of weeks ago, "It's about time to handle
> > documents and assets in the same way". I think there is a need for a
> > common interface shared by both CMS documents and assets, so both can
> > be handled uniformly -- particularly for link rewriting, where the
> > URIs of both CMS documents and assets need to be rewritten in the
> > same way. This would be, perhaps, "ContentItem". Both Document
> > and Resource (which maybe should be named Asset?) should implement
> > this interface, and DefaultDocument and Resource should extend a
> > DefaultContentItem class. Or is there a better idea?
>
> I'm not even sure if we need the separation between Documents and
> Assets. Maybe there is a way to handle both of them uniformly.
> I'd rather add specific functionality:
>
> - Can the content item input/output XML?
> - How is the content item rendered when it is referenced by another
>   content item?
> - What are the presentation options?
> - ...
>
> IMO additional, asset-specific functionality could be handled by an
> asset-management module or something like this, not by the core API.
>
> > 3) I think maybe the link rewriting should be done when a CMS document
> > is published, deactivated, or exported, rather than every time it is
> > displayed.
>
> The problem is that the document has to be updated when *another*
> document is changed/removed. This means when you deactivate a document,
> you have to remove the links from all documents which are referencing
> this document. I agree that this would be a good thing, but with the
> current architecture it is a very time-consuming operation.
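
To make the cost in point 3 concrete: deactivating a document means
finding every document that links to it, which is presumably slow
because nothing records those references. A minimal in-memory sketch of
a reverse link index follows. LinkIndex and its methods are hypothetical
names, not part of the Lenya API, and the code uses modern Java
collections for brevity. Keeping such an index persistent and in sync
with the repository is the hard part and is not shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical reverse link index (illustration only, not Lenya code). */
class LinkIndex {

    // Maps a target document id to the ids of documents that link to it.
    private final Map<String, Set<String>> referrers = new HashMap<>();

    /** Records that the document 'source' contains a link to 'target'. */
    public void addLink(String source, String target) {
        referrers.computeIfAbsent(target, k -> new HashSet<>()).add(source);
    }

    /** Drops all links that originate from 'source', e.g. before re-indexing it. */
    public void removeLinksFrom(String source) {
        referrers.values().forEach(set -> set.remove(source));
        // Remove targets that no longer have any referrers.
        referrers.values().removeIf(Set::isEmpty);
    }

    /** Returns a read-only view of the documents that link to 'target'. */
    public Set<String> getReferrers(String target) {
        return Collections.unmodifiableSet(
                referrers.getOrDefault(target, Collections.emptySet()));
    }
}
```

With an index like this, deactivating a document would only need
getReferrers() on its id to find the documents whose links must be
touched, instead of a scan over the whole publication.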

Let me see if I'm following you: the only reason for rewriting links at
display time (rather than when the CMS document is created/modified) is
so that we can remove any links that point to other CMS documents that
have been deleted or moved, right? This seems like an I/O and CPU hog
for pages with lots of links, and the benefit seems minimal.
Personally, I might rather have a broken link than a removed link
anyway :-) because at least I can use external tools to detect broken
links, which I can't do if we remove them. (I know that in 1.4 such
links to missing documents are displayed specially, but not so in the
live area.)

The reason I'm interested in this point is that I'd like to see
LinkRewritingTransformer do a much more thorough job, as I repeated in
another recent thread, but I wouldn't want it to slow down the display
of pages.

Having a link management capability would be the proper solution, of
course (so that when you delete or move a document you could just
"look up" the list of documents that pointed to it), but that would be
very hard to do without a relational database.

> > This change would be a performance boost for every page.
> > Or am I missing something in why it needs to be done at display time?
> >
> > 4) LinkRewritingTransformer relies heavily on the
> > DefaultDocumentBuilder class, whose isDocument() method simplistically
> > returns true for any URLs starting like "/lenya/mypub/authoring/"
> > even if the URL points to an asset, not a CMS document. In contrast,
> > note that the sitemaps verify that the URL ends in ".html" before
> > assuming that a URL is really a CMS document. Should
> > DefaultDocumentBuilder's isDocument() method be changed to look for
> > the ".html" ending? (But do CMS documents *always* have an ".html"
> > ending?)
>
> No, we can't do this. This is another reason why I think that the
> DocumentBuilder concept is doomed (see the thread "Mapping URLs to
> documents").
> This is quite a complex and fundamental issue; IMO we
> have to come to a decision here first.
>
> Thanks for bringing this up!
>
> -- Andreas

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
