Re: [RT] Workflow and publishing using JCR

solprovider Thu, 08 Dec 2005 11:51:17 -0800

On 12/8/05, Felix Röthenbacher <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> I also think that the language attribute should be implemented as
> a property but if it comes to a language document it should be
> implemented as a separate node.


You agreed with me.  If Language configuration is necessary, the nodes
are probably at the publication level rather than the document level.

> > "versions" could be creation-time-based names for each revision
> > "200512071751.xml", plus "index.xml" or "live.xml" for the currently
> > published version.  "archive" could prefix an "a".  Trash could prefix
> > a "d" (for about to be "deleted").  Publishing is simply overwriting
> > "live.xml" with the desired revision.
> Are you referring to JCR versioning or do you propose to implement
> a versioning system on its own. To me, the latter seems the case
> in your argumentation. If so, can you give us some more details what
> would be the advantages over JCR versioning?

I do not know.  I only started with Jackrabbit and JCR on Saturday,
and have not researched JCR's versioning yet.  The paragraph above was
how I wanted Lenya1.2 to handle it.  I hope Jackrabbit improves the
implementation/structure.  As a CMS, Lenya may have requirements
beyond what JCR provides.

> > The advantages are every revision of a document is in one place.  The
> > disadvantage was the live version would not be distinct in the file
> > system, which was useful.  That disadvantage does not apply when using
> > Jackrabbit because file operations are very difficult.
> >
> > ===
> > In JCR terms, use properties for the revision nodes:
> > - status = "archived", "deleted", "active", "draft" (useful so someone
> > else does not publish it before it is ready)
> > - created = {time}
> > - edited = {time of last edit}
> > - language (??)
> It seems that the vocabulary is not clearly defined. Are you referring
> to workflow states or to revisions in the sense used in a versioning
> system like SVN or CVS?

My "status" field (property) refers to your "workflow state".  I am
applying it to each "revision" node.  This is where JCR's versions may
not be usable (but I do not know).  Each "revision" is marked as:
 -  draft = new, not ready to be published
 -  review = needs review before publishing
 -  active = was once published, may be the live document
 -  archive = obsolete needing archiving
 -  delete = obsolete, archived and ready for deletion
The live document is chosen from the "active" revision nodes by a
property of the parent "document" node.  The live revision must remain
"active"; it cannot be set to "archive" or "delete" until another
"active" revision is set as the "live" one.
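
A rough sketch of that rule in Python, only to make the constraint
concrete.  This is not Lenya or JCR code; the class and method names
are made up, and I am assuming a revision must already be "active"
before it can be chosen as the live one.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Status values taken from the list above.
STATUSES = ("draft", "review", "active", "archive", "delete")


@dataclass
class Document:
    # revision name -> status; hypothetical stand-in for the revision nodes
    revisions: Dict[str, str] = field(default_factory=dict)
    # stand-in for the "live revision" property of the document node
    live: Optional[str] = None

    def add(self, name: str, status: str = "draft") -> None:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        self.revisions[name] = status

    def publish(self, name: str) -> None:
        """Choose the live revision; only an "active" one qualifies."""
        if self.revisions[name] != "active":
            raise ValueError(f"{name} is not active")
        self.live = name

    def set_status(self, name: str, status: str) -> None:
        """Change a revision's status, refusing to retire the live one."""
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        if name == self.live and status != "active":
            raise ValueError("set another active revision live first")
        self.revisions[name] = status
```

Archiving the live revision fails until publish() has been called for
another "active" revision.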

Off-topic: Workflow needs easy configuration.  Someone will need to
add a second review "reviewLegal".  Others may want to eliminate the
"archive" step.

Off-topic: Archive should compare the "last full archive" date of the
publication with the date of the revision, and delete (or at least
mark as "ready for deletion") the documents already archived.

> > There should also be a history (for each language) at the document
> > node for which revisions were published and when.  The document node
> > should also specify which revision is the live one (unless that node
> > is duplicated with a special identifier.)
> >
> > Combining all this into the simplest structure (N=Node, P=Property,
> > -=1, +=many):
> > N:Content
> > + N:DocumentID
> > - + N:SubdocumentID (same structure as DocumentID)
> > - - N:Document (extra node to ease distinguishing between the
> > Document's Nodes and Nodes for the subdocuments.)
> > - - - P:Type (DocumentType = XHTML...)
> > - - - P:LiveRevision (for each language)
> > - - - P:Status = Draft, Review, Published, Version (published but
> > another version is ready)
> > - - - P:Visibility
> > - - - P:Creator
> > - - - P:Languages Available
> > - - - P:{Other document properties}
> > - - - N:History of publishing for each language
> > - - + N:Revision
> > - - - - P:Status
> > - - - - P:Language
> > - - - - P:Editor
> > - - - - P:{Other revision properties}
> > - - - - N:ContentInformation
> > - - - - - N|P:Document-type-specific fields
> As discussed before, one possibility is to keep the actual content
> separate from any representation (MVC model) meaning that the
> content has a flat hierarchy and the actual hierarchy is built
> on a sitetree which is in a separate place apart from the data.
> Are you referring to the sitetree or the document content?
> Should the information be stored with the document or within
> the sitetree?

I discussed the flat hierarchy later.  My opinion is all information
should be stored with the document.  A special child of the content
node is the "Hierarchy", which contains the tree structure stored in
Lenya1.2's "sitetree.xml".

> > Which properties apply to the document and which apply to the
> > revision?  Title, Navigation Title, Subject, and Description could be
> > either.  They could also be moved to Nodes with their own versioning,
> > but that may be overkill.  Does rollback to a revision also rollback
> > those fields?  Do we need a history of changes to those fields?
> > Should changing the Title create a new revision?  Those answers will
> > determine the best structure.
> I think every modification should result in a new revision.

That is the easiest approach, and I have no objections.  My
paragraph was pointing out that this is a design choice.  Lenya1.2 kept
the Navigation Title separate from the document (in sitetree.xml).  I
wanted everyone to understand this is a change that might warrant
discussion.

> > Lenya 1.2's various sitemap.xml files should be built dynamically from
> > the content.  That may require caching for good performance.
> Do you mean sitetree.xml or sitemap.xmap?

That was a typo.  I meant sitetree.xml.  "sitemap.xmap" is part of
functionality, and of little concern to this discussion about the
datastore structure.

> > ===
> > Another decision is whether subdocuments are Nodes of the document.
> > There has been much discussion about using a unique identifier for
> > every document.  This can be implemented as a minor change to the
> > above.  First, discard the "N:Document" node because it would not be
> > necessary, because there would be no N:SubdocumentID nodes.  Second,
> > add a P:ParentDocumentID.
> >
> > Now use a flat structure:
> > N:Content
> > - N:Hierarchy (created from the Document Nodes, could include
> > information such as Visibility, substitute for sitetree.xml)
> > + N:DocumentUniqueID
> > - - P:ParentDocumentID - only if a subdocument
> > - - P:Path
> > - - Everything under N:Document above
> >
> > This is really good for many reasons:
> > 1. No need to distinguish between content Node and subdocument Nodes.
> > 2. Easy operation on multiple or all documents, such as building the
> > Search index.
> Right.
> > 3. Easy moving of subdocuments. Just change the ParentID, rather than
> > moving Nodes.
>
> Would you like to have UID stored with each document for modeling the
> parent/child relationship? Or would it be better to have a separate
> sitetree?

Yes and yes.
The UID is stored with each document.
The Hierarchy node is stored at the top level of Content, and is
simply a tree of UIDs used to improve performance when using a tree
view such as menus.
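
A small Python sketch of how the Hierarchy could be rebuilt from the
flat document nodes.  It assumes each document is reduced to its UID
and ParentDocumentID (None for top-level documents); the function name
and the dictionary layout are only for illustration.  It also lists
documents whose parent no longer exists.

```python
from collections import defaultdict


def build_hierarchy(parents):
    """Build a tree of UIDs from a flat {uid: parent_uid} mapping.

    Returns (children, orphans): children maps a parent UID (None for
    the top level) to its child UIDs; orphans lists UIDs whose
    ParentDocumentID does not match any existing document.
    """
    children = defaultdict(list)
    orphans = []
    for uid, parent in parents.items():
        if parent is not None and parent not in parents:
            orphans.append(uid)
        else:
            children[parent].append(uid)
    return dict(children), orphans
```

Moving a subdocument is then just changing one value in the mapping
and rebuilding (or patching) the cached tree.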

The UID is the primary key used internally.  A great advantage of this
approach is that the editors' Create Internal Link can display a list of
documents and store the chosen UID: <link uid="xxx"/>.  For display, the
document is found by UID, and the link is replaced by the UrlPath and
Title: <a href="urlpath">title</a> (or "Broken Link" if the UID is not
found).  We could allow the Title to be overridden by the link's
creator.  This allows links to survive documents being moved.  Someone
could create a view where all references to a document are displayed,
so the links can be fixed before the document is deleted.
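
To illustrate the display step, here is a minimal Python sketch.  In
Lenya this would more likely be a transformer in the pipeline; the
regular expression is only a stand-in, and the "documents" lookup
table (UID to UrlPath and Title) is an assumption for the example.

```python
import re
from html import escape

# Matches the stored form <link uid="xxx"/> and captures the UID.
LINK = re.compile(r'<link\s+uid="([^"]*)"\s*/>')


def resolve_links(markup, documents):
    """Replace each <link uid="..."/> with an anchor, or "Broken Link".

    documents maps UID -> (url_path, title).
    """
    def replace(match):
        doc = documents.get(match.group(1))
        if doc is None:
            return "Broken Link"
        url_path, title = doc
        return f'<a href="{escape(url_path)}">{escape(title)}</a>'

    return LINK.sub(replace, markup)
```

Because only the UID is stored, moving the target document changes
the entry in the lookup table and every link follows automatically.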

An easier method for displaying links is an action like
"?openUID=xxx".  This could be implemented first.  It should redirect
browsers to the UrlPath so the browser sees the normal address.  This
would not automatically use the Title from the target document, but it
would still show in the "references" view.
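
The redirect could be as simple as the following Python sketch.  The
function name, the lookup table, and the 404 for an unknown UID are my
assumptions; only the "openUID" parameter comes from the idea above.

```python
from urllib.parse import parse_qs


def open_uid(query_string, url_paths):
    """Return (status, headers) for a request like "?openUID=xxx".

    url_paths maps UID -> UrlPath.  A known UID redirects the browser
    to the normal address; an unknown or missing UID gives 404.
    """
    uid = parse_qs(query_string).get("openUID", [None])[0]
    path = url_paths.get(uid)
    if path is None:
        return 404, {}
    return 302, {"Location": path}
```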

> > 4. Easy creation of flat views.  See all documents by Title, Author,
> > Status, Type.  The Status view will show all documents having versions
> > waiting to be published.  The Type view can show all Employees, even
> > if they are created as subdocuments of Departments.
> > 5. All information is stored in the Document Nodes, so "Visibility"
> > and "Navigation Title" are  available when working with the document.
> >
> > It also imposes additional concerns:
> > 1. The tree of documents (N:Hierarchy) must be maintained for easy
> > traversal.  It should only contain the DocumentUniqueIDs.  Creating
> > sitetree.xml should access the document Nodes for the Visibility
> > property.  This allows creating menus based on other properties.
> > Eventually someone will add individual document security, and menus
> > will be built to show only the documents allowed to each person (based
> > on their name and/or Groups).
> >
> > 2. What happens to orphans?  Subdocuments will not automatically be
> > deleted when the Document Node is deleted.  It would be easy to create
> > a view of documents that have ParentDocumentIDs that no longer exist.
> > Or the Delete process could also delete all subdocuments recursively.
> I think there should be some kind of referential integrity to avoid any
> orphans.

This paragraph suggests there are choices.
1. No orphans allowed.  All child nodes must be deleted or moved
(reassigned to another parent) before a node may be deleted.
2. No orphans allowed.  Deletion of a Node also deletes all child
nodes.  This is probably your "referential integrity".
3. Orphans allowed.  Deletion of a Node marks the child nodes as
orphaned.  Admins then must decide how to handle each child node.

Best would be for Lenya to allow this decision to be made for each
document type during development.
Examples:
1. All Employee documents should be reassigned before deleting a
Department parent document.
2. All Blog children of an Article should be deleted when the Article
is deleted.
3. All Task children of an Employee may be orphaned when the Employee
is terminated.  They will eventually be reassigned or deleted, but
removing the Employee is an urgent task that cannot wait for the
manual reassignment.

I am using "Delete" to mean "set Status to Archive in preparation for
deletion", which should be the usual workflow.  Of course the workflow
may be defined in many ways.

Lenya is a development platform.  More flexibility allows more
applicability.  All 3 of these choices will be desired by somebody
someday.  I suggest configuring it on a per-child-type basis in the
document-type configuration.  Example:
<DocType name="Department" deleteParentActionDefault="no">
 <Child type="Employee" deleteParentAction="no"/>
 <Child type="Reservation" deleteParentAction="delete"/>
 <Child type="Project" deleteParentAction="orphan"/>
</DocType>
A Department cannot be deleted if it has Employees or child documents
of Types not mentioned.  Otherwise Reservations are cancelled, and
Projects are orphaned for manual decision.
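
A sketch of how that configuration might be evaluated, in Python for
brevity.  The element and attribute names are the ones proposed
above; the plan_delete function and its return convention (None means
the delete is refused) are hypothetical.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<DocType name="Department" deleteParentActionDefault="no">
 <Child type="Employee" deleteParentAction="no"/>
 <Child type="Reservation" deleteParentAction="delete"/>
 <Child type="Project" deleteParentAction="orphan"/>
</DocType>
"""


def plan_delete(config_xml, child_types):
    """Decide what happens to each child type when the parent is deleted.

    Returns {child_type: "delete" | "orphan"}, or None if any child
    type resolves to "no", which blocks deleting the parent.
    """
    doc_type = ET.fromstring(config_xml.strip())
    default = doc_type.get("deleteParentActionDefault", "no")
    actions = {
        child.get("type"): child.get("deleteParentAction")
        for child in doc_type.findall("Child")
    }
    plan = {}
    for child_type in child_types:
        # Types not mentioned in the configuration fall back to the default.
        action = actions.get(child_type, default)
        if action == "no":
            return None
        plan[child_type] = action
    return plan
```

A Department with only Reservations and Projects gets a plan; one
that still has Employees, or children of an unlisted type, is refused.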

solprovider
