Hi,

Here are my estimates for Jackrabbit 3, based on the deployments we see at Day.
On Tue, Feb 9, 2010 at 4:55 PM, Jukka Zitting <[email protected]> wrote:
> Scalability:
> * How much content (number of documents/nodes, raw amount of data in
> GB/TB/PB) do you have in the repository?

Up to hundreds of millions of documents and a few PBs of content. Most repositories will be significantly smaller, typically up to a million documents and a TB of content, though many of them need to be prepared for at least one order of magnitude of growth.

> * How many (concurrent) users (readers/editors/administrators) does
> your repository have?

Up to thousands of concurrent readers, a few dozen concurrent editors, and a handful of admins. An increasing number of deployments will shift from read-only to mostly-read as online participation features become more actively used.

> * Do you need Internet-scale (millions of users or exabytes of
> content) features?

No.

> Deployment:
> * Do you run the repository on a single server, on a cluster, or in the cloud?

Mostly small clusters, increasingly in the cloud. Reaching down to embedded devices might be an interesting development, but currently I don't see that on the roadmap.

> * How many and how powerful servers do you use for the repository?

The cloud deployments could reach up to a few dozen servers per cluster, though most deployments will likely need just a handful of servers. Each server will likely have 8+ cores, dozens of GBs of RAM, a few TBs of local disk space, and a gigabit network connection to the other cluster nodes.

> Content model:
> * Do you need support for flat content hierarchies (>>10k sibling nodes)?

That would be quite nice, though we can live without it if needed.

> * Do you need support for same-name siblings?

Only in some special cases.

> * If you use versioning, how actively (commit on all saves / commit
> only at major milestones) and for what purpose (revision history,
> backup, etc.) do you use it?

Mostly to record important steps in document workflows. Usually not used for update/merge across workspaces.

> * How granular (hierarchies of small properties vs. big binary blobs)
> is your content?

Fairly granular; most nodes will have up to a dozen properties, few of them larger than a few kilobytes. Larger binaries are used mostly for storing things (images, videos, generic files, etc.) that are modified outside the scope of our content applications.

> * How much of your content access is based on search / tree traversal
> / following references?

Mostly tree traversal, some search, hardly any references.

> * How much do you rely on the repository to enforce your content model
> (node type constraints, etc.)?

Not much; most of our constraints are application-based and not strictly enforced.

> * How often do you modify your content model (and/or related node types)?

Not too often (once per year?). Content model upgrades are handled by specific upgrade scripts.

> Features:
> * Do you need full ACID semantics? Is an "eventually consistent"
> system good enough for you?

Full ACID is probably not needed, except perhaps in some specific deployments. Eventual consistency might be good enough for a majority of our repositories, but that needs to be verified.

> * Do you need more powerful search features than what we now have?

No pressing need, though every now and then we face requests for specific incremental search improvements.

> * How important is observation to your application? Do you need
> trigger-like capability that can modify or reject a save() operation?

We use observation quite a lot.
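For reference, a typical listener registration in our applications looks roughly like the sketch below (the path, event mask, and method name are illustrative, not taken from any specific deployment):

    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.observation.Event;
    import javax.jcr.observation.EventIterator;
    import javax.jcr.observation.EventListener;

    // Log node and property changes below /content (path is illustrative).
    void registerListener(Session session) throws RepositoryException {
        EventListener listener = new EventListener() {
            public void onEvent(EventIterator events) {
                while (events.hasNext()) {
                    try {
                        System.out.println("Changed: " + events.nextEvent().getPath());
                    } catch (RepositoryException e) {
                        // ignore events whose path can no longer be resolved
                    }
                }
            }
        };
        session.getWorkspace().getObservationManager().addEventListener(
                listener,
                Event.NODE_ADDED | Event.NODE_REMOVED | Event.PROPERTY_CHANGED,
                "/content",  // observe this subtree
                true,        // isDeep: include all descendants
                null, null,  // no UUID or node type filtering
                false);      // noLocal=false: also receive this session's changes
    }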
I think the journaled observation features in 2.0 should already cover most of our feature needs in this space, and I don't think we need database-style triggers for now (though they might come in handy for specific use cases).

BR,

Jukka Zitting
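P.S. For the journaled case, consuming the JCR 2.0 event journal could look roughly like this (a minimal sketch; assumes the repository supports the optional event journal feature):

    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.observation.EventJournal;

    // Replay all changes recorded since a given timestamp, e.g. to let a
    // cluster node catch up after being offline.
    void replaySince(Session session, long sinceMillis) throws RepositoryException {
        EventJournal journal =
            session.getWorkspace().getObservationManager().getEventJournal();
        journal.skipTo(sinceMillis);  // skip events older than the timestamp
        while (journal.hasNext()) {
            System.out.println("Recorded change: " + journal.nextEvent().getPath());
        }
    }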
