Hi,

Here are my estimates for Jackrabbit 3, based on the deployments we see at Day.
On Tue, Feb 9, 2010 at 4:55 PM, Jukka Zitting <[email protected]> wrote:
> Scalability:
> * How much content (number of documents/nodes, raw amount of data in
> GB/TB/PB) do you have in the repository?

Up to hundreds of millions of documents and a few PBs of content. Most repositories will be significantly smaller, typically up to a million documents and a TB of content, though many of them need to be prepared for at least one order of magnitude of growth.

> * How many (concurrent) users (readers/editors/administrators) does
> your repository have?

Up to thousands of concurrent readers, a few dozen concurrent editors, and a handful of admins. An increasing number of deployments will shift from read-only to mostly-read as online participation features become more actively used.

> * Do you need Internet-scale (millions of users or exabytes of
> content) features?

No.

> Deployment:
> * Do you run the repository on a single server, on a cluster, or in the cloud?

Mostly small clusters, increasingly in the cloud. Reaching down to embedded devices might be an interesting development, but currently I don't see that on the roadmap.

> * How many and how powerful servers do you use for the repository?

The cloud deployments could reach up to a few dozen servers per cluster, though most deployments will likely need just a handful of servers. Each server will likely have 8+ cores, dozens of GBs of RAM, a few TBs of local disk space, and a gigabit network connection to the other cluster nodes.

> Content model:
> * Do you need support for flat content hierarchies (>>10k sibling nodes)?

That would be quite nice, though we can live without it if needed.

> * Do you need support for same-name siblings?

Only in some special cases.

> * If you use versioning, how actively (commit on all saves / commit
> only at major milestones) and for what purpose (revision history,
> backup, etc.) do you use it?

Mostly to record important steps in document workflows. Usually not used for update/merge across workspaces.

> * How granular (hierarchies of small properties vs. big binary blobs)
> is your content?

Fairly granular; most nodes will have up to a dozen properties, few of them larger than a few kilobytes. Larger binaries are used mostly for storing things (images, videos, generic files, etc.) that are modified outside the scope of our content applications.

> * How much of your content access is based on search / tree traversal
> / following references?

Mostly tree traversal, some search, hardly any references.

> * How much do you rely on the repository to enforce your content model
> (node type constraints, etc.)?

Not much; most of our constraints are application-based and not strictly enforced.

> * How often do you modify your content model (and/or related node types)?

Not too often (once per year?). Content model upgrades are handled by specific upgrade scripts.

> Features:
> * Do you need full ACID semantics? Is an "eventually consistent"
> system good enough for you?

Full ACID is probably not needed, except perhaps in some specific deployments. Eventual consistency might be good enough for a majority of our repositories, but that needs to be verified.

> * Do you need more powerful search features than what we now have?

No pressing need, though every now and then we face requests for specific incremental search improvements.

> * How important is observation to your application? Do you need
> trigger-like capability that can modify or reject a save() operation?

We use observation quite a lot.
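For reference, a typical listener registration in our applications looks roughly like the sketch below (the path, event mask, and method name are illustrative, not taken from any specific deployment):

    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.observation.Event;
    import javax.jcr.observation.EventIterator;
    import javax.jcr.observation.EventListener;

    // Log node and property changes below /content (path is illustrative).
    void registerListener(Session session) throws RepositoryException {
        EventListener listener = new EventListener() {
            public void onEvent(EventIterator events) {
                while (events.hasNext()) {
                    try {
                        System.out.println("Changed: " + events.nextEvent().getPath());
                    } catch (RepositoryException e) {
                        // ignore events whose path can no longer be resolved
                    }
                }
            }
        };
        session.getWorkspace().getObservationManager().addEventListener(
                listener,
                Event.NODE_ADDED | Event.NODE_REMOVED | Event.PROPERTY_CHANGED,
                "/content",  // observe this subtree
                true,        // isDeep: include all descendants
                null, null,  // no UUID or node type filtering
                false);      // noLocal=false: also receive this session's changes
    }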
I think the journaled observation features in 2.0 should already cover most of our feature needs in this space, and I don't think we need database-style triggers for now (though they might come in handy for specific use cases).

BR,

Jukka Zitting
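P.S. For the journaled case, consuming the JCR 2.0 event journal could look roughly like this (a minimal sketch; assumes the repository supports the optional event journal feature):

    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.observation.EventJournal;

    // Replay all changes recorded since a given timestamp, e.g. to let a
    // cluster node catch up after being offline.
    void replaySince(Session session, long sinceMillis) throws RepositoryException {
        EventJournal journal =
            session.getWorkspace().getObservationManager().getEventJournal();
        journal.skipTo(sinceMillis);  // skip events older than the timestamp
        while (journal.hasNext()) {
            System.out.println("Recorded change: " + journal.nextEvent().getPath());
        }
    }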
