Re: Considering JCR again

solprovider Mon, 30 Jun 2008 08:09:21 -0700

We are creating very long posts.  I am cutting as much as possible
without losing context.  I removed sections where I agree with (or
at least have nothing to add to) Andreas' remarks.  Please read the
parent post first.

In case my preference is not obvious (much of my writing is
pragmatic rather than agenda-driven), I want JCR.  I believe JCR will
improve Lenya's performance for "large" Publications, especially as
the technology matures.  I also believe maintaining the XML-file
repository is important because it greatly reduces Lenya's learning
curve and barrier to entry for less technical people.

On 6/30/08, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > JCR would eliminate #1: XML file-based repository = transparent datastore.
>  With "eliminate", do you mean that the feature is removed or that the
> feature is moved out of the scope of the project? IMO only the latter
> applies. IMO a JCR repository becomes more visible, since you can access it
> through a standardized interface. There are plug-ins for Eclipse to browse
> and manipulate JCR repositories, you can mount it via WebDAV, there are
> web-based JCR browsers. IMO this provides a better visibility than our
> proprietary repository (which only offers a Java API that's not nearly as
> well documented as JCR).
>
>  But of course I have to admit that this only applies if you can treat the
> repository as a black box which works as expected. Otherwise you'll have to
> resort to the maintainers of the repository implementation. Which in our
> case could be another Apache project, which is IMO quite a fortunate
> situation.

Every DBA will tell you that RDBMSes are transparent because software
exists to display the tables like spreadsheets and to run SQL commands.
If you do not already understand relational databases and SQL, those
tools are useless.  My definition of "transparent" was based on Lenya
1.2: the data is stored in text files, so someone without knowledge
of JCR, databases, content management, or even XML can see how the
data is stored.  The easiest way to learn XML and Lenya is to
read the files.  A JCR backend loses this "transparency".

> > Lenya 1.3 uses labels as in "Labelled Versions" from:
> >   http://wiki.apache.org/lenya/JcrContentModelAreas
> > The other options trigger my "bad design" alert system.  I can explain
> > at length if nobody is emotionally attached to the other systems.  The
> > rest of this post assumes a revision/label approach rather than using
> > Areas.
>  Maybe you'd like to reply to this post?
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg05812.html

The post states that JCR1 can only specify access control at the
workspace level, so the "area" (multiple workspaces) approach is
necessary if Lenya uses JCR's security, and it is only adequate if
Lenya does not need security below the "area" level.  The post also
states that this limitation is removed in JCR2.  JCR1 does not have the
flexibility to handle document-level security.  For Lenya to have a
good security model, Lenya itself will need to enforce security.  A
Lenya-integrated security system is already needed for Lenya to
maintain the XML-file repository.  This also means that on JCR1,
WebDAV access can bypass, and possibly corrupt, Lenya's security.
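A minimal Java sketch of the granularity gap, assuming a hypothetical Lenya-side class `DocumentAccessPolicy` (it is not a JCR or Lenya API).  The area-level rule is all a workspace-per-area model can express; the document-level rule is the part Lenya would have to enforce itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of JCR or Lenya: access rules at two
// granularities, per workspace ("area") and per document inside an area.
class DocumentAccessPolicy {
    // area -> users allowed to write anywhere in that area (JCR1-style granularity)
    private final Map<String, Set<String>> areaWriters = new HashMap<>();
    // "area:documentId" -> users explicitly denied for that single document
    private final Map<String, Set<String>> documentDenials = new HashMap<>();

    void allowArea(String area, String user) {
        areaWriters.computeIfAbsent(area, k -> new HashSet<>()).add(user);
    }

    void denyDocument(String area, String documentId, String user) {
        documentDenials.computeIfAbsent(area + ":" + documentId, k -> new HashSet<>()).add(user);
    }

    // Workspace-level check only: everything a workspace-per-area model can say.
    boolean canWriteArea(String area, String user) {
        return areaWriters.getOrDefault(area, Set.of()).contains(user);
    }

    // Document-level check that Lenya would enforce on top of the area rule.
    boolean canWriteDocument(String area, String documentId, String user) {
        if (!canWriteArea(area, user)) {
            return false;
        }
        return !documentDenials.getOrDefault(area + ":" + documentId, Set.of()).contains(user);
    }
}
```

In this sketch, anything that reaches the workspace directly, such as a WebDAV client, only ever sees the area-level rule, which is the bypass described above.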

According to the FAQ for JSR-170, every security model CAN be
implemented with JCR1.  All other comments about JCR1 in this post
refer to Jackrabbit -- the reference implementation.  As the reference
implementation, Jackrabbit implements the minimal functionality
required to satisfy JSR-170.  These minimal requirements DO NOT
INCLUDE MATURE SECURITY FUNCTIONALITY.  Lenya could implement a decent
security model on JCR1, but the resulting platform would no longer
be stock Jackrabbit and could not be marketed as working on all JCR1
backends.

> > 2. Use JCR's security.  Lenya 1.3 will allow different access control
> > for the same document accessed through different Publications.  I am
> > uncertain if this can be implemented with multiple AccessManagers or
> > requires special naming convention e.g. "mypub-username".
> > Also (from http://wiki.apache.org/lenya/JcrAccessControl ):
> >   "Can only grant permissions, not revoke them"
> > may require thought.  What happens when the security changes in Lenya?
> >  Do we discard the current AccessManager and reassign access to all
> > documents; hopefully not a major problem but requires thought and may
> > be a performance issue.
>  That is definitely an issue that has to be thoroughly considered. My
> current opinion is that we should use the standard JCR methods to discover
> access control settings, but provide our own methods to define and store
> them. I'm not sure yet, though.

JCR1's security is not sufficient for Lenya.  We should either wait for
JCR2 or implement all security within Lenya.  We could (and should)
implement security below the Content API so repository-specific
security can be used when available.
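One way to picture "security below the Content API" is a pluggable checker: the Content API asks the backend for native access control and falls back to Lenya-managed rules when the backend has none.  This is a hypothetical Java sketch; `AccessChecker`, `Repository`, `LenyaAccessChecker`, and `ContentApi` are illustrative names, not existing Lenya or JCR types.  Keying rules by Publication also covers the Lenya 1.3 case of one document with different access control in different Publications.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical types for illustration only; none of them exist in Lenya or JCR.
interface AccessChecker {
    boolean canWrite(String publication, String documentId, String user);
}

// Lenya-managed rules, usable with every repository (including the XML-file one).
class LenyaAccessChecker implements AccessChecker {
    private final Set<String> grants; // entries of the form "publication/documentId/user"

    LenyaAccessChecker(Set<String> grants) {
        this.grants = grants;
    }

    @Override
    public boolean canWrite(String publication, String documentId, String user) {
        return grants.contains(publication + "/" + documentId + "/" + user);
    }
}

// A backend offers a native checker only if it can express Lenya's rules itself.
interface Repository {
    Optional<AccessChecker> nativeAccessChecker();
}

// Content API layer: prefer repository-native security, otherwise use Lenya's.
class ContentApi {
    private final AccessChecker checker;

    ContentApi(Repository repository, AccessChecker lenyaFallback) {
        this.checker = repository.nativeAccessChecker().orElse(lenyaFallback);
    }

    boolean canWrite(String publication, String documentId, String user) {
        return checker.canWrite(publication, documentId, user);
    }
}
```

With the XML-file repository (no native security), every check goes through Lenya's rules; a future JCR2 backend could return its own checker without any change above the Content API.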

> > Points of uncertainty:
> > 1. Do we have a node for each Resource and subnodes for each
> > Translation? Or is each Translation a node and we need something else
> > to relate the Translations to the Resource?    I will assume the
> > former for the rest of this post (since the latter requires much more
> > thought and work.)
>  This needs to be decided. I have outlined some possible approaches:
>  http://wiki.apache.org/lenya/JcrContentModelTranslations

Lenya 1.2 and 1.3 use the Document/Translation model.  This is a key
feature of Lenya.  Without this model, there is no reason not to use a
separate Publication for each language.  The layouts
Publication/Translation/documents, Translation/Publication/documents,
and OneLanguagePublication/documents are functionally equivalent because
in each of them documents in different languages are not necessarily
related.  Any CMS can handle those alternatives; Lenya is special
because Translations sit below the document level.
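To make the distinction concrete, here is a hypothetical Java sketch of the Document/Translation model (class names are illustrative, not Lenya's actual classes).  Because the English and German versions hang off one `Resource`, the question "which translations exist for this document?" always has an answer; with one Publication per language it does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: one Resource (language-independent identity) owns one
// Translation per language, and each Translation owns its own Revisions.
class Revision {
    final int number;
    final String content;

    Revision(int number, String content) {
        this.number = number;
        this.content = content;
    }
}

class Translation {
    final String language;
    final List<Revision> revisions = new ArrayList<>();

    Translation(String language) {
        this.language = language;
    }

    Revision addRevision(String content) {
        Revision r = new Revision(revisions.size() + 1, content);
        revisions.add(r);
        return r;
    }
}

class Resource {
    final String unid; // Lenya's own identifier, shared by all translations
    private final Map<String, Translation> translations = new LinkedHashMap<>();

    Resource(String unid) {
        this.unid = unid;
    }

    // Returns the existing translation or creates it on first use.
    Translation translation(String language) {
        return translations.computeIfAbsent(language, Translation::new);
    }

    Optional<Translation> find(String language) {
        return Optional.ofNullable(translations.get(language));
    }

    Iterable<String> languages() {
        return translations.keySet();
    }
}
```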

> > 2. Named Resources (Content)
> > Mostly migration and synchronization issues. Can we use JCR's UUIDs
> > without losing functionality?
>  That's a fundamental and important question. We can't use them the same way
> as in Lenya 2.0 because in Lenya UUIDs and translations are orthogonal, but
> maybe it makes sense to reconsider this concept.

JCR can easily handle the Resource/Translation data model used in
Lenya 1.3.  Some work may be needed if Lenya 2.0 chose a different approach.

> > The repository should be transparent to
> > Lenya.  We should be able to import/export between any two
> > repositories without notice.
>  +1, that's an important requirement.

Do you mean only between any two JCR repositories?  Or do you also
include other repositories, e.g. the XML-file repository?

> > Best is if we can synchronize between
> > any two repositories.  JCR's UUID is sufficient for documents created
> > in JCR, but documents created elsewhere will need their Lenya UNID for
> > synchronization.  Using the Lenya UNID for all purposes may be easier
> > than using the JCR UUID for special cases.  How does this affect
> > "Automatic referential integrity checks"?
>  A possible solution (has its drawbacks, though):
>  - Allow same-name siblings
>  - Use UUIDs for internal links
>
>  This way, you could merge
>  /foo
>   /bar [uuid=1]
>  /foo
>   /bar [uuid=2]
>  TO
>  /foo
>   /bar [uuid=1]
>   /bar [uuid=2]
>  This way, we could at least merge first and clean up the URL space later.

I am not sure we are using the same concepts.  Do "foo" and "bar"
represent parent-child identifiers used for URL creation?  Lenya 1.3
uses Structures and Indexes for that functionality, which removes the
parent-child relationship from the document data.  I believe Lenya 1.3's
data storage can be implemented directly in JCR, and JCR's UUIDs can be
ignored.
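A hypothetical Java sketch of that separation, assuming a `Structure` that maps URL paths to Lenya UNIDs and a `DocumentStore` that keeps content flat by UNID (both names are illustrative).  Under this model, Andreas' /foo/bar example does not collide: both documents keep their UNIDs, so merging document data is conflict-free, and the only remaining decision is which URL each UNID gets in the Structure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: URL placement lives in a Structure, while document data
// is stored flat by Lenya UNID with no parent-child links.
class Structure {
    private final Map<String, String> pathToUnid = new HashMap<>();

    void place(String path, String unid) {
        pathToUnid.put(path, unid);
    }

    Optional<String> resolve(String path) {
        return Optional.ofNullable(pathToUnid.get(path));
    }
}

class DocumentStore {
    private final Map<String, String> contentByUnid = new HashMap<>();

    void put(String unid, String content) {
        contentByUnid.put(unid, content);
    }

    Optional<String> get(String unid) {
        return Optional.ofNullable(contentByUnid.get(unid));
    }

    // Synchronization works on UNIDs alone; URL placement is not involved.
    void copyInto(DocumentStore target) {
        contentByUnid.forEach(target::put);
    }
}
```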

> > What are we losing by not using JCR's functionality?  See
> > http://wiki.apache.org/lenya/JcrContentModel
> >
> > 3. Named Resource (Design) - Design Resources require names.  From the
> > example code, this should not be an issue.  Again, what do we lose by
> > not using JCR's "Automatic referential integrity checks"?
>  Here's a statement against references:
> http://wiki.apache.org/jackrabbit/DavidsModel#head-ed794ec9f4f716b3e53548be6dd91b23e5dd3f3a

I have great respect for people writing standards (JSR, W3C, ISO,
etc.) and understand that people with real technological experience do
not participate.  As CTO of Day Software, David Nuescheler was the
Maintenance Lead for JSR-170.  Somewhat to my surprise, I agree with his
entire article.  David is trying to express concepts that I have lived
with for more than a decade.  XML databases are new, and people coming from
the relational database paradigm are struggling to discover "best
practices".  Domino is a document-based database system almost two
decades old.  While not quite as flexible as a pure XML database
should be, Domino is extremely close.  Unfortunately, few good
technologists understand Domino.  Experience with Domino teaches the
same lessons.  Domino has "referential integrity" functionality with
parent-child relationships.  I just built the primary accounting
system of a company without using the standard parent-child
relationships for reasons similar to David's: the system is more
flexible and stable without the platform's "referential integrity
checks."

David also states, "Workspaces should not be used for access control."
That is an external vote against using "areas" just because JCR1 is immature.

>  IMO it depends on what we use the references for. I wouldn't use them for
> internal links (because this would prevent all broken links, which we
> currently handle by removing them from the live area). A valid usage
> scenario would be a link from a translation to a node containing the
> language-independent meta data, which can't be removed unless all
> translations are removed.

This should not be an issue if the nodes are
Resource/Translation/Revisions.  A Translation cannot exist outside its
Resource: the Resource cannot be removed on its own while it contains
Translations, and removing a Resource removes all of its Translations.
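Containment alone provides the integrity Andreas wants from references.  A minimal, hypothetical Java sketch of a content tree (this `Node` class is illustrative and is not the JCR `Node` interface): a Translation exists only as a child of its Resource, so removing the Resource removes every Translation and Revision beneath it, and no reference check is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: child nodes exist only inside their parent, so
// removing a node removes its entire subtree.
class Node {
    final String name;
    private final Map<String, Node> children = new LinkedHashMap<>();

    Node(String name) {
        this.name = name;
    }

    // Returns the existing child or creates it.
    Node addChild(String childName) {
        return children.computeIfAbsent(childName, Node::new);
    }

    Node child(String childName) {
        return children.get(childName);
    }

    // Dropping the parent's only reference detaches the whole subtree at once.
    boolean removeChild(String childName) {
        return children.remove(childName) != null;
    }

    int countDescendants() {
        int count = 0;
        for (Node c : children.values()) {
            count += 1 + c.countDescendants();
        }
        return count;
    }
}
```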

> > [WebDAV editing] Just because someone can
> > create a new Revision should not automatically allow the person to
> > publish it; does JCR allow a document to be edited without granting
> > the ability to assign labels?
>  AFAIK this is completely separated. Assigning labels is actually an
> operation on the version history of a node. It is therefore somewhat outside
> the "normal" session-based content management, I'm not sure about the access
> control implications yet.

Labels are implemented as assignments on the VersionHistory nodes.
Lenya 1.3's Translation node would be implemented as a VersionHistory.
Because a VersionHistory must be contained in any tree that manipulates
its subnodes (Revisions), it is subject to the same lack of security as
everything else in JCR1.  Answering my own question: JCR1 does not
allow granting edit rights to a document without also allowing the
editor to assign labels.  This applies even with the workspace/area
security model.
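For comparison, a hypothetical Java sketch of what Lenya would have to enforce itself: a Translation modeled as a version history whose labels (e.g. "live") point at Revisions, with "edit" and "assign label" checked as separate rights.  The class and the label names are illustrative assumptions; per the analysis above, JCR1 alone cannot make this split.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a version history with labels, where creating a
// Revision and assigning a label are separate permissions.
class VersionedTranslation {
    private final List<String> revisions = new ArrayList<>();
    private final Map<String, Integer> labels = new HashMap<>();
    private final Set<String> editors = new HashSet<>();
    private final Set<String> publishers = new HashSet<>();

    void grantEdit(String user) { editors.add(user); }

    void grantLabel(String user) { publishers.add(user); }

    // Editing only appends a Revision; it never moves a label.
    int addRevision(String user, String content) {
        if (!editors.contains(user)) {
            throw new SecurityException(user + " may not edit");
        }
        revisions.add(content);
        return revisions.size() - 1; // index of the new revision
    }

    // Moving a label (e.g. publishing via "live") needs its own right.
    void assignLabel(String user, String label, int revision) {
        if (!publishers.contains(user)) {
            throw new SecurityException(user + " may not assign labels");
        }
        if (revision < 0 || revision >= revisions.size()) {
            throw new IllegalArgumentException("no such revision: " + revision);
        }
        labels.put(label, revision);
    }

    // Content of the Revision a label points at, or null if the label is unset.
    String contentFor(String label) {
        Integer index = labels.get(label);
        return index == null ? null : revisions.get(index);
    }
}
```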

> > (My last post explains why I feel keeping the file-based repository is
> > important, but the parent post implies this is an either/or decision
> > rather than additional functionality.)
>  If we keep the file-based repository, we either have to limit the JCR
> integration to the features offered by the file-based implementations (for
> instance, what about SQL and transactions?) or we'd have to introduce a
> concept of repository compliance. The latter option would for example mean
> to throw an OperationNotSupportedException if a user issues an SQL query on
> the file system repository.
>  To me, both of these options seem to be a serious limitation. Particularly,
> I consider transaction support as crucial (especially since we almost lost a
> potential customer because of the lack of this feature).
>  -- Andreas

My opinion:
- Lenya should use the Content API to talk to all repositories.
- Lenya should not implement anything that cannot be handled by the Content API.
- Lenya should have transactions -- the ability to rollback edits to
several documents if the transaction is not completed.  The easy
solution is not to publish Revisions created in an uncompleted
transaction.  The full solution is to remove any Revisions created in
an uncompleted transaction.  Both should be simple to implement
without relying on any repository-specific functions.  If the rollback
function is built into the Content API, Lenya can use
repository-specific code if a particular repository natively handles
transactions, but the functionality must be available for all
repositories.
- Lenya should not provide an SQL interface or use any SQL features.
As mentioned earlier, many other tools exist for interfacing with JCR
and other databases.  Using a JCR repository for Lenya allows
companies to use those tools when required, but Lenya should not
provide functions dependent on specific backends.
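A minimal Java sketch of both transaction strategies, assuming a hypothetical `RevisionStore` interface that every backend (XML files or JCR) could implement; none of these names come from Lenya's actual Content API.  The transaction records each Revision it creates: the easy solution is for publishing to skip Revisions from transactions where `isCommitted()` is false, and the full solution is `rollback()`, which deletes them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical backend-neutral interface; any repository could implement it.
interface RevisionStore {
    String createRevision(String documentId, String content); // returns a revision id
    void deleteRevision(String revisionId);
}

// Repository-agnostic transaction: remembers every Revision it creates.
class ContentTransaction {
    private final RevisionStore store;
    private final List<String> created = new ArrayList<>();
    private boolean committed;

    ContentTransaction(RevisionStore store) { this.store = store; }

    String write(String documentId, String content) {
        String id = store.createRevision(documentId, content);
        created.add(id);
        return id;
    }

    void commit() { committed = true; }

    // Easy solution: a publish step would skip Revisions of uncommitted transactions.
    boolean isCommitted() { return committed; }

    // Full solution: remove every Revision this transaction created, newest first.
    void rollback() {
        if (committed) {
            throw new IllegalStateException("already committed");
        }
        for (int i = created.size() - 1; i >= 0; i--) {
            store.deleteRevision(created.get(i));
        }
        created.clear();
    }
}

// In-memory stand-in for a real backend, for illustration only.
class InMemoryRevisionStore implements RevisionStore {
    final Map<String, String> revisions = new LinkedHashMap<>();
    private int next = 1;

    public String createRevision(String documentId, String content) {
        String id = documentId + "@" + next++;
        revisions.put(id, content);
        return id;
    }

    public void deleteRevision(String revisionId) { revisions.remove(revisionId); }
}
```

A backend with native transactions could still implement `RevisionStore` and delegate to them, but the fallback above works without any repository-specific support.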

Think about SAP on Oracle.  SAP (the company) gets nervous when
companies use SQL tools on the backend databases.  A legitimate job might
be merging the SAP data of two companies that merged, which is not a
normal SAP function.  I would take extreme care not to corrupt SAP's
applications while using SQL tools to manipulate the backend.  Lenya
should be treated like SAP: an application running on multiple
backends, with external tools available for manipulating the backends
but no guarantees from Lenya if using those tools corrupts anything.

The alternative is:
- Lenya is locked to certain repositories, probably just a heavily
customized Jackrabbit.
- Lenya provides tools for those repositories.  Do we want to provide
an SQL front-end?  Can we provide better functionality than the
dedicated tools?

solprovider

