Hi,

Thanks for the comments so far on this topic! I've been thinking about
this a bit more and I now have the second iteration ready for review.
Read on...

Based on the feedback I agree that it probably doesn't make sense to
keep track of unique copies of all values. However, avoiding extra
copies of large binaries is still a very nice feature, so I'd still
like to keep the single copy idea for those values. This is in fact
something that we might want to consider already for Jackrabbit 1.4
regardless of what we'll do with the NGP proposal.

The idea is to keep all binary values (I guess it's easier to manage
things by value type than by value size) in a global binary store that
keeps only a single copy of any unique binary stream. Binary values
are stored in the global store as soon as they are received from the
client (for example via ValueFactory.createValue(InputStream)), and
only the resulting value identifier is kept as a reference to the
binary stream.
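
To make the single copy idea a bit more concrete, here is a rough
in-memory sketch. It assumes that the value identifier is the hex
SHA-256 digest of the stream contents, which is only one possible
choice and not something decided above. The class and method names
(InMemoryBinaryStore, put, get, size) are hypothetical and not part
of any existing Jackrabbit API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a global binary store; not a Jackrabbit API.
final class InMemoryBinaryStore {

    // Identifier -> content. Each unique stream is kept exactly once.
    private final Map<String, byte[]> records = new ConcurrentHashMap<>();

    /**
     * Stores the stream and returns its identifier.
     * Assumption: the identifier is the hex SHA-256 digest of the content.
     */
    String put(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            digest.update(chunk, 0, n);
            buffer.write(chunk, 0, n);
        }
        String id = toHex(digest.digest());
        // An existing record is never overwritten.
        records.putIfAbsent(id, buffer.toByteArray());
        return id;
    }

    /** Returns the stored content for the given identifier. */
    InputStream get(String id) {
        byte[] data = records.get(id);
        if (data == null) {
            throw new IllegalArgumentException("Unknown binary: " + id);
        }
        return new ByteArrayInputStream(data);
    }

    /** Number of unique binaries currently stored. */
    int size() {
        return records.size();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

With something like this, a value created from a stream would only
need to hold the string returned by put(), and two values created
from identical streams would end up pointing at the same record.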

The binary store persists all received values immediately and never
modifies or removes stored binaries, except through an explicit
garbage collection process. This allows the binary store to exist
outside any transaction scopes, and it can also be concurrently
accessed by any number of cluster nodes or other processes. Even
completely separate content repositories could share the binary store.
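
As a rough illustration of why this works without transactions or
locking, here is a minimal file-backed sketch. It assumes a shared
directory, SHA-256 digests as identifiers, and a write-to-temp-file
then move-into-place scheme; all of these are my assumptions for the
example, and FileBinaryStore is a hypothetical name, not an existing
class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of a shared, add-only binary store on disk.
final class FileBinaryStore {

    private final Path directory;

    FileBinaryStore(Path directory) throws IOException {
        this.directory = Files.createDirectories(directory);
    }

    /** Persists the stream immediately and returns its identifier. */
    String put(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // Write to a private temp file first, so other users of the
        // directory do not see a half-written record under its final name.
        Path temp = Files.createTempFile(directory, "upload-", ".tmp");
        try {
            Files.copy(new DigestInputStream(in, digest), temp,
                    StandardCopyOption.REPLACE_EXISTING);
            String id = toHex(digest.digest());
            try {
                // Without REPLACE_EXISTING the move fails if the record
                // is already there, so stored binaries are not modified.
                Files.move(temp, directory.resolve(id));
            } catch (FileAlreadyExistsException e) {
                // Same content was stored earlier, possibly by another
                // process. Nothing to do; the temp file is removed below.
            }
            return id;
        } finally {
            Files.deleteIfExists(temp);
        }
    }

    /** Opens the stored binary for reading. */
    InputStream get(String id) throws IOException {
        if (!id.matches("[0-9a-f]{64}")) {
            throw new IllegalArgumentException("Invalid identifier: " + id);
        }
        return Files.newInputStream(directory.resolve(id));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Since the content of a record is fully determined by its identifier,
two writers racing to store the same stream can only ever produce the
same bytes, so the sketch needs no coordination between them. Garbage
collection is deliberately left out here.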

BR,

Jukka Zitting
