Hi,

On 5/16/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
On 5/12/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> Based on the feedback I agree that it probably doesn't make sense to
> keep track of unique copies of all values. However, avoiding extra
> copies of large binaries is still a very nice feature, so I'd still
> like to keep the single copy idea for those values. This is in fact
> something that we might want to consider already for Jackrabbit 1.4
> regardless of what we'll do with the NGP proposal.

See JCR-926 for a practical application of this idea to current Jackrabbit.

I just did a quick prototype where I made the InternalValue class turn
all incoming binary streams into data records using a global data
store. Internally the value would just be represented by the data
identifier.
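
To make the idea concrete, here is a rough sketch of such a data store. It
is not the prototype code: the class name DataStoreSketch, its methods, the
in-memory map and the choice of a SHA-1 content hash as the data identifier
are all assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, in-memory illustration of a global data store.
 * A real store would keep records on disk or in a database.
 */
public class DataStoreSketch {

    // Maps data identifier -> record contents (assumption: kept in memory).
    private final Map<String, byte[]> records = new HashMap<>();

    /**
     * Consumes the stream, stores its contents once, and returns the
     * data identifier. Assumption: the identifier is the hex-encoded
     * SHA-1 digest of the content, so equal content gets an equal id.
     */
    public String addRecord(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            digest.update(chunk, 0, n);
            buffer.write(chunk, 0, n);
        }
        StringBuilder id = new StringBuilder();
        for (byte b : digest.digest()) {
            id.append(String.format("%02x", b));
        }
        // Only the first copy of any given content is kept.
        records.putIfAbsent(id.toString(), buffer.toByteArray());
        return id.toString();
    }

    /** Opens a fresh stream over the record with the given identifier. */
    public InputStream getStream(String identifier) {
        byte[] data = records.get(identifier);
        if (data == null) {
            throw new IllegalArgumentException("No such record: " + identifier);
        }
        return new ByteArrayInputStream(data);
    }

    /** Number of distinct records currently stored. */
    public int recordCount() {
        return records.size();
    }
}
```

With something like this in place, a value only needs to hold the returned
identifier, so copying a binary value means copying a short string rather
than the binary itself.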

This allowed me to simplify quite a few things (for example to drop
all BLOBStore classes and custom handling of binary properties) and to
achieve *major* performance improvements for cases where large (>
100kB) binaries are handled. For example the time to save a large file
was essentially cut in half and things like versioning or cloning
trees with large binaries would easily become faster by an order of
magnitude. With this change it is possible for example to copy a DVD
image file in milliseconds. What's even better, not only did this
change remove extra copying of binary values, it also pushed all
binaries out of the persistence or item state managers so that no
binary read or write operation would ever lock the repository!

The downside of the change is that it requires backwards-incompatible
changes in jackrabbit-core, most notably pulling all blob handling out
of the existing persistence managers. Adopting the data store concept
would thus require migration of all existing repositories. Luckily
such migration would likely be relatively straightforward and we could
write tools to simplify the upgrade, but it would still be a major
undertaking.

I would very much like to go forward with this approach, but I'm not
sure when the right time to do that would be. Should we already target
the 1.4 release in September/October, or would it be better to wait
for Jackrabbit 2.0 sometime next year? Alternatively, should we go for
a 2.0 release already this year with this and some other structural
changes, and have Jackrabbit 3.0 be the JSR 283 reference
implementation?

BR,

Jukka Zitting
