We are creating very long posts. I am cutting as much as possible without losing context. Where I removed a section, I agree with (or at least have nothing to add to) Andreas' remarks. Please read the parent first.
In case my preference is not obvious (since much of my writing is
pragmatic rather than agenda-based), I want JCR. I believe JCR will
improve Lenya's performance for "large" Publications, especially as the
technology matures. I also believe maintaining the XML-files repository
is important to greatly reduce Lenya's learning curve and barrier to
entry for less technical people.

On 6/30/08, Andreas Hartmann <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > JCR would eliminate #1: XML file-based repository = transparent datastore.
> With "eliminate", do you mean that the feature is removed or that the
> feature is moved out of the scope of the project? IMO only the latter
> applies. IMO a JCR repository becomes more visible, since you can access it
> through a standardized interface. There are plug-ins for Eclipse to browse
> and manipulate JCR repositories, you can mount it via WebDAV, there are
> web-based JCR browsers. IMO this provides a better visibility than our
> proprietary repository (which only offers a Java API that's not nearly as
> well documented as JCR).
>
> But of course I have to admit that this only applies if you can treat the
> repository as a black box which works as expected. Otherwise you'll have to
> resort to the maintainers of the repository implementation. Which in our
> case could be another Apache project, which is IMO quite a fortunate
> situation.

Every DBA will tell you that RDBMSes are transparent because software
exists to display the tables like spreadsheets and run SQL commands. But
if you do not already understand relational databases and SQL, those
tools are useless. My definition of "transparent" was based on Lenya
1.2 -- the data is stored in text files, so someone without knowledge of
JCR, databases, content management, or even XML could see how the data
is being stored. The easiest method to learn XML and Lenya is to read
the files. A JCR backend loses this "transparency".
> > Lenya 1.3 uses labels as in "Labelled Versions" from:
> > http://wiki.apache.org/lenya/JcrContentModelAreas
> > The other options trigger my "bad design" alert system. I can explain
> > at length if nobody is emotionally attached to the other systems. The
> > rest of this post assumes a revision/label approach rather than using
> > Areas.
> Maybe you'd like to reply to this post?
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg05812.html

The post states that JCR1 can only specify access control at the
workspace level so the "area" (multiple workspaces) approach is
necessary if Lenya uses JCR's security and does not need security below
the "area" level. The post also states this limitation has been removed
for JCR2.

JCR1 does not have the flexibility to handle document-level security.
For Lenya to have a good security model, Lenya will need to control the
security. A Lenya-integrated security system is already needed for Lenya
to maintain the XML-file repository. This also means WebDAV can bypass
and possibly corrupt Lenya's security on JCR1.

According to the FAQ for JSR-170, every security model CAN be
implemented with JCR1. All other comments about JCR1 in this post refer
to Jackrabbit -- the reference implementation. As the reference
implementation, Jackrabbit implements the minimal functionality required
to satisfy JSR-170. These minimal requirements DO NOT INCLUDE MATURE
SECURITY FUNCTIONALITY. Lenya could implement a decent security model
using JCR1, but the resulting platform would no longer be Jackrabbit and
could not be marketed as working on all JCR1 backends.

> > 2. Use JCR's security. Lenya 1.3 will allow different access control
> > for the same document accessed through different Publications. I am
> > uncertain if this can be implemented with multiple AccessManagers or
> > requires special naming convention e.g. "mypub-username".
> > Also (from http://wiki.apache.org/lenya/JcrAccessControl ):
> > "Can only grant permissions, not revoke them"
> > may require thought. What happens when the security changes in Lenya?
> > Do we discard the current AccessManager and reassign access to all
> > documents; hopefully not a major problem but requires thought and may
> > be a performance issue.
> That is definitely an issue that has to be thoroughly considered. My
> current opinion is that we should use the standard JCR methods to discover
> access control settings, but provide our own methods to define and store
> them. I'm not sure yet, though.

JCR1 is not sufficient for Lenya. We should wait for JCR2 or implement
all security within Lenya. We could (should) implement security below
the Content API so repository-specific security can be used when
available.

> > Points of uncertainty:
> > 1. Do we have a node for each Resource and subnodes for each
> > Translation? Or is each Translation a node and we need something else
> > to relate the Translations to the Resource? I will assume the
> > former for the rest of this post (since the latter requires much more
> > thought and work.)
> This needs to be decided. I have outlined some possible approaches:
> http://wiki.apache.org/lenya/JcrContentModelTranslations

Lenya 1.2 and 1.3 use the Document/Translation model. This is a key
feature of Lenya. Without this model, there is no reason not to have
separate Publications for each language.
Publication/Translation/documents, Translation/Publication/documents,
and OneLanguagePublication/documents are functionally equivalent since
documents of different languages are not necessarily related. Any CMS
can handle those alternatives; Lenya is special because Translations are
below the document level.

> > 2. Named Resources (Content)
> > Mostly migration and synchronization issues. Can we use JCR's UUIDs
> > without losing functionality?
> That's a fundamental and important question.
> We can't use them the same way
> as in Lenya 2.0 because in Lenya UUIDs and translations are orthogonal, but
> maybe it makes sense to reconsider this concept.

JCR can easily handle the Resource/Translation data model used in Lenya
1.3. Some work may be needed if 2.0 chose a different approach.

> > The repository should be transparent to
> > Lenya. We should be able to import/export between any two
> > repositories without notice.
> +1, that's an important requirement.

Do you mean only between any two JCR repositories? Or are you allowing
for other repositories e.g. the XML-text repository?

> > Best is if we can synchronize between
> > any two repositories. JCR's UUID is sufficient for documents created
> > in JCR, but documents created elsewhere will need their Lenya UNID for
> > synchronization. Using the Lenya UNID for all purposes may be easier
> > than using the JCR UUID for special cases. How does this affect
> > "Automatic referential integrity checks"?
> A possible solution (has its drawbacks, though):
> - Allow same-name siblings
> - Use UUIDs for internal links
>
> This way, you could merge
> /foo
>   /bar [uuid=1]
> /foo
>   /bar [uuid=2]
> TO
> /foo
>   /bar [uuid=1]
>   /bar [uuid=2]
> This way, we could at least merge first and clean up the URL space later.

I am uncertain we are using the same concepts. Are "foo" and "bar"
representing parent-child identifiers for URL creation? Lenya 1.3 uses
Structures and Indexes for that functionality -- removing the
parent-child relationship from document data. I believe Lenya 1.3's data
storage can be directly implemented in JCR; JCR's UUIDs can be ignored.

> > What are we losing by not using JCR's functionality? See
> > http://wiki.apache.org/lenya/JcrContentModel
> >
> > 3. Named Resource (Design) - Design Resources require names. From the
> > example code, this should not be an issue. Again, what do we lose by
> > not using JCR's "Automatic referential integrity checks"?
> Here's a statement against references:
> http://wiki.apache.org/jackrabbit/DavidsModel#head-ed794ec9f4f716b3e53548be6dd91b23e5dd3f3a

I have great respect for people writing standards (JSR, W3C, ISO, etc.)
and understand that people with real technological experience do not
participate. As CTO for Day Software, David Nuescheler was the
Maintenance Lead for JSR-170. To my surprise, I agree with his entire
article. David is trying to express concepts that I have lived for more
than a decade.

XML databases are new, and people coming from the relational database
paradigm are struggling to discover "best practices". Domino is a
document-based database system almost two decades old. While not quite
as flexible as a pure XML database should be, Domino is extremely close.
Unfortunately, few good technologists understand Domino. Experience with
Domino teaches the same lessons. Domino has "referential integrity"
functionality with parent-child relationships. I just built the primary
accounting system of a company without using the standard parent-child
relationships for reasons similar to David's -- the system is more
flexible and stable when it does not use the platform's "referential
integrity checks."

David also states, "Workspaces should not be used for access control."
That is an external vote against using "areas" merely because JCR1 is
immature.

> IMO it depends on what we use the references for. I wouldn't use them for
> internal links (because this would prevent all broken links, which we
> currently handle by removing them from the live area). A valid usage
> scenario would be a link from a translation to a node containing the
> language-independent meta data, which can't be removed unless all
> translations are removed.

This should not be an issue if the nodes are
Resource/Translation/Revisions. The Resource cannot be removed while it
contains Translations; conversely, removing a Resource must remove all
Translations.
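To illustrate the containment, here is a minimal Java sketch of a Resource/Translation/Revision tree with labels. It is only an illustration under my own assumptions: every class and method name is hypothetical (this is neither Lenya's Content API nor JCR), and the "live" label is just an example. Because Translations exist only inside their Resource, the language-independent data cannot disappear separately from them, and removing the Resource takes the Translations with it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout: this is neither Lenya's Content API nor JCR.

/** One immutable version of a Translation's content. */
final class Revision {
    final int number;      // 1-based position in the Translation's history
    final String content;

    Revision(int number, String content) {
        this.number = number;
        this.content = content;
    }
}

/** One language of a Resource: a revision history plus labels into it. */
final class Translation {
    final String language;
    private final List<Revision> revisions = new ArrayList<>();
    // label -> revision number; "live" is only an example label
    private final Map<String, Integer> labels = new LinkedHashMap<>();

    Translation(String language) {
        this.language = language;
    }

    Revision addRevision(String content) {
        Revision revision = new Revision(revisions.size() + 1, content);
        revisions.add(revision);
        return revision;
    }

    void label(String label, int revisionNumber) {
        if (revisionNumber < 1 || revisionNumber > revisions.size()) {
            throw new IllegalArgumentException("No such revision: " + revisionNumber);
        }
        labels.put(label, revisionNumber);
    }

    /** Returns the Revision carrying the label, or null if the label is unset. */
    Revision labelled(String label) {
        Integer number = labels.get(label);
        return number == null ? null : revisions.get(number - 1);
    }
}

/** Language-independent data lives here; Translations exist only inside it. */
final class Resource {
    final String unid;
    private final Map<String, Translation> translations = new LinkedHashMap<>();

    Resource(String unid) {
        this.unid = unid;
    }

    /** Returns the Translation for the language, creating it if missing. */
    Translation translation(String language) {
        return translations.computeIfAbsent(language, Translation::new);
    }

    int translationCount() {
        return translations.size();
    }
}

/** Holds Resources by UNID. Removing a Resource drops its whole subtree. */
final class ContentStore {
    private final Map<String, Resource> resources = new LinkedHashMap<>();

    Resource create(String unid) {
        Resource resource = new Resource(unid);
        resources.put(unid, resource);
        return resource;
    }

    Resource get(String unid) {
        return resources.get(unid);
    }

    void remove(String unid) {
        resources.remove(unid);
    }
}
```

Publishing in this sketch is nothing more than moving a label: create a Resource, add Revisions to its "en" Translation, then call label("live", 2). No reference between nodes is needed, so no referential integrity check is involved.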
> > [WebDAV editing] Just because someone can
> > create a new Revision should not automatically allow the person to
> > publish it; does JCR allow a document to be edited without granting
> > the ability to assign labels?
> AFAIK this is completely separated. Assigning labels is actually an
> operation on the version history of a node. It is therefore somewhat outside
> the "normal" session-based content management, I'm not sure about the access
> control implications yet.

Labels are implemented as assignments to the VersionHistory nodes. Lenya
1.3's Translation node would be implemented as a VersionHistory. As
these nodes must be contained in any tree that manipulates the subnodes
(Revisions), the VersionHistory would be subject to the same lack of
security as everything else in JCR1. Answering my own question, JCR1
does not allow edit rights to a document without also allowing the
editor to assign labels. This applies even with the workspace/area
security model.

> > (My last post explains why I feel keeping the file-based repository is
> > important, but the parent post implies this is an either/or decision
> > rather than additional functionality.)
> If we keep the file-based repository, we either have to limit the JCR
> integration to the features offered by the file-based implementations (for
> instance, what about SQL and transactions?) or we'd have to introduce a
> concept of repository compliance. The latter option would for example mean
> to throw an OperationNotSupportedException if a user issues an SQL query on
> the file system repository.
> To me, both of these options seem to be a serious limitation. Particularly,
> I consider transaction support as crucial (especially since we almost lost a
> potential customer because of the lack of this feature).
> -- Andreas

My opinion:
- Lenya should use the Content API to talk to all repositories.
- Lenya should not implement anything that cannot be handled by the
  Content API.
- Lenya should have transactions -- the ability to roll back edits to
  several documents if the transaction is not completed. The easy
  solution is not to publish Revisions created in an uncompleted
  transaction. The full solution is to remove any Revisions created in
  an uncompleted transaction. Both should be simple to implement without
  relying on any repository-specific functions. If the rollback function
  is built into the Content API, Lenya can use repository-specific code
  if a particular repository natively handles transactions, but the
  functionality must be available for all repositories.
- Lenya should not provide an SQL interface or use any SQL features. As
  mentioned earlier, many other tools exist for interfacing with JCR and
  other databases. Using a JCR repository for Lenya allows companies to
  use those tools when required, but Lenya should not provide functions
  dependent on specific backends.

Think about SAP on Oracle. SAP (the company) gets nervous when companies
use SQL tools on the backend databases. A potential job would be merging
SAP data for two companies that merged, not a normal SAP function. I
would use extreme care to not corrupt SAP's applications while using SQL
tools to manipulate the backend. Lenya should be equated to SAP -- an
application running on multiple backends with external tools available
for manipulating the backends but no guarantees from Lenya if using
those tools corrupts anything.

The alternative is:
- Lenya locks to certain repositories, probably just an extremely
  customized Jackrabbit.
- Lenya provides tools for those repositories. Do we want to provide an
  SQL front-end? Can we provide better functionality than the dedicated
  tools?

solprovider
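
P.S. The Revision-based rollback described in the transactions item can be sketched roughly as follows. This is a sketch under stated assumptions, not an actual implementation: RevisionStore and EditTransaction are hypothetical names (nothing here comes from Lenya's Content API), the store is an in-memory stand-in for any repository, and a single writer is assumed, so concurrent transactions are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Lenya's Content API.
// Assumes a single writer; concurrent transactions are not handled.

/** Minimal revision storage that any backend could provide. */
final class RevisionStore {
    // document id -> revision contents in creation order
    private final Map<String, List<String>> revisions = new HashMap<>();
    // document id -> index of the published revision
    private final Map<String, Integer> published = new HashMap<>();

    /** Appends a revision and returns its index; nothing is published yet. */
    int addRevision(String docId, String content) {
        List<String> history = revisions.computeIfAbsent(docId, k -> new ArrayList<>());
        history.add(content);
        return history.size() - 1;
    }

    void removeLastRevision(String docId) {
        List<String> history = revisions.get(docId);
        if (history != null && !history.isEmpty()) {
            history.remove(history.size() - 1);
        }
    }

    void publish(String docId, int index) {
        published.put(docId, index);
    }

    /** Returns the published content, or null if nothing is published. */
    String publishedContent(String docId) {
        Integer index = published.get(docId);
        return index == null ? null : revisions.get(docId).get(index);
    }

    int revisionCount(String docId) {
        List<String> history = revisions.get(docId);
        return history == null ? 0 : history.size();
    }
}

/** Groups edits to several documents so they publish together or not at all. */
final class EditTransaction {
    private final RevisionStore store;
    // Parallel lists: the document and revision index of each edit, in order.
    private final List<String> docIds = new ArrayList<>();
    private final List<Integer> indexes = new ArrayList<>();
    private boolean finished;

    EditTransaction(RevisionStore store) {
        this.store = store;
    }

    /** Creates an unpublished Revision and remembers it. */
    void edit(String docId, String content) {
        requireOpen();
        docIds.add(docId);
        indexes.add(store.addRevision(docId, content));
    }

    /** Publishes every Revision created in this transaction. */
    void commit() {
        requireOpen();
        for (int i = 0; i < docIds.size(); i++) {
            store.publish(docIds.get(i), indexes.get(i));
        }
        finished = true;
    }

    /** "Full solution": removes the unpublished Revisions, newest first. */
    void rollback() {
        requireOpen();
        for (int i = docIds.size() - 1; i >= 0; i--) {
            store.removeLastRevision(docIds.get(i));
        }
        finished = true;
    }

    private void requireOpen() {
        if (finished) {
            throw new IllegalStateException("Transaction already finished");
        }
    }
}
```

The "easy solution" falls out of the same structure: if commit() is never called, the new Revisions simply stay unpublished. Only addRevision, removeLastRevision, and publish touch the repository, so a backend with native transactions could replace EditTransaction without changing callers.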
