hi viraf, thanks for your mail.
Has anyone built an application similar to that described above? What version of Jackrabbit was used, and what were the issues that you ran into? How much metadata did a node carry, what was the average depth of a leaf node, and how many nodes did you have in the implementation before performance became an issue?
we built a digital asset management application that sounds very similar to what you are describing. the meta information varies from filetype to filetype but ranges on average between 10 and 50 properties per nt:resource instance. in addition to typical meta information we also store a number of thumbnail images in the content repository for every asset.
I am considering building a cluster of servers providing repository services. Can the repository be clustered? (A load balancer in front of the repository will distribute requests to a pool of repository servers.)
yes, jackrabbit can be clustered. i would recommend, though, running the repository in deployment model 1 or model 2 [1] and putting the load balancer in front of your application rather than in front of the repository. this avoids the overhead of remoting altogether and still provides you with clustering.
[1] http://jackrabbit.apache.org/doc/deploy.html
How does the repository scale? Can it handle > 50 million artifacts? (If the artifacts are placed on the file system, does Alfresco manage the directory structure, or are all files placed in a single directory?)
assuming that you mean "jackrabbit"... ;) we ran tests beyond 50m files, and yes, jackrabbit manages the directory structure itself if the filesystem is chosen as the persistence layer for blobs.
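to illustrate why a managed directory structure matters at that scale, here is a minimal sketch of hash-based directory fan-out. this is not jackrabbit's actual on-disk layout; the class BlobPaths, the method pathFor and the two-level scheme are assumptions made up for the example. it only shows the general idea of spreading blobs over many small directories instead of one huge one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, NOT Jackrabbit code: maps a blob identifier to a
// relative path with two levels of hash-derived subdirectories.
class BlobPaths {
    static String pathFor(String blobId) {
        try {
            // Hash the identifier so blobs are spread evenly across directories.
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(blobId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            String h = hex.toString();
            // First two hex pairs become directory names: 256 * 256 buckets.
            return h.substring(0, 2) + "/" + h.substring(2, 4) + "/" + h;
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

with 65,536 buckets, 50m files come out to roughly 760 files per directory on average, which most filesystems handle comfortably.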
Is there support for auditing access to documents ?
this could easily be achieved with a decoration layer.
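a minimal sketch of what such a decoration layer might look like, using the plain decorator pattern. the DocumentStore interface and both classes are hypothetical stand-ins invented for this example, not JCR or jackrabbit types; in a real application you would wrap whatever read API your application exposes in the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified read API (an assumption, not a JCR interface).
interface DocumentStore {
    String read(String path, String user);
}

// Trivial in-memory implementation, present only so the sketch runs.
class InMemoryStore implements DocumentStore {
    public String read(String path, String user) {
        return "content of " + path;
    }
}

// Decorator: records every read access, then delegates to the wrapped store.
class AuditingStore implements DocumentStore {
    private final DocumentStore delegate;
    private final List<String> auditLog = new ArrayList<>();

    AuditingStore(DocumentStore delegate) {
        this.delegate = delegate;
    }

    public String read(String path, String user) {
        // Log before delegating so failed reads are audited as well.
        auditLog.add(user + " READ " + path);
        return delegate.read(path, user);
    }

    List<String> auditLog() {
        return auditLog;
    }
}
```

callers keep working against DocumentStore and never see the audit logic, which is why the auditing can be added without touching the repository itself.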
Is there support for defining archival / retention policies?
jackrabbit certainly offers the hooks to build records management, but it does not come with out-of-the-box archival or retention facilities.
Is there support for backups ?
for the most convenient backup i would recommend persisting the entire content repository in an rdbms and using the rdbms features for backup.

regards,
david