Hi,

Thanks for the comments so far on this topic! I've been thinking about this a bit more and I now have the second iteration ready for review. Read on...
Based on the feedback I agree that it probably doesn't make sense to keep track of unique copies of all values. However, avoiding extra copies of large binaries is still a very nice feature, so I'd like to keep the single copy idea for those values. This is in fact something that we might want to consider already for Jackrabbit 1.4, regardless of what we'll do with the NGP proposal.

The idea is to keep all binary values (I guess it's easier to manage things by value type than by value size) in a global binary store that keeps only a single copy of any unique binary stream. Binary values are stored in the global store as soon as they are received from the client (for example through ValueFactory.createValue(InputStream)), and only the resulting value identifier is kept as a reference to the binary stream.

The binary store persists all received values immediately and never modifies or removes stored binaries (unless there's an explicit garbage collection process). This allows the binary store to exist outside any transaction scopes, and it can also be accessed concurrently by any number of cluster nodes or other processes. Even completely separate content repositories could share the binary store.

BR,

Jukka Zitting
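
PS. To make the idea a bit more concrete, here is a rough sketch of what such a store could look like. This is only an illustration under a few assumptions: the BinaryStore class and its store/open methods are hypothetical names (not existing Jackrabbit code), the value identifier is assumed to be a SHA-256 digest of the stream, and the binaries are assumed to live as plain files in one shared directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a global store that keeps one copy per unique binary. */
class BinaryStore {

    private final Path directory;

    BinaryStore(Path directory) throws IOException {
        this.directory = Files.createDirectories(directory);
    }

    /** Persists the stream immediately and returns its value identifier. */
    String store(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256"); // assumed identifier scheme
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        // Spool to a temporary file while computing the digest of the content.
        Path temp = Files.createTempFile(directory, "upload", ".tmp");
        try {
            Files.copy(new DigestInputStream(in, digest), temp,
                    StandardCopyOption.REPLACE_EXISTING);
            String id = toHex(digest.digest());
            try {
                // Existing entries are never overwritten or modified.
                Files.move(temp, directory.resolve(id));
            } catch (FileAlreadyExistsException e) {
                // The same binary is already stored; keep the single existing copy.
            }
            return id;
        } finally {
            Files.deleteIfExists(temp); // no-op if the move succeeded
        }
    }

    /** Opens the binary stream referenced by the given identifier. */
    InputStream open(String id) throws IOException {
        if (!id.matches("[0-9a-f]{64}")) {
            throw new IllegalArgumentException("Not a value identifier: " + id);
        }
        return Files.newInputStream(directory.resolve(id));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(Character.forDigit((b >> 4) & 0xf, 16));
            hex.append(Character.forDigit(b & 0xf, 16));
        }
        return hex.toString();
    }
}
```

Since entries are only ever added and are named after their content, two writers storing the same stream end up with the same file, which is why no transaction or locking is needed in this sketch. Garbage collection is left out entirely.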