On 6/6/07, Stefan Guggisberg <[EMAIL PROTECTED]> wrote:
hi jukka,
On 6/5/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 5/16/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> > On 5/12/07, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> > > Based on the feedback I agree that it probably doesn't make sense to
> > > keep track of unique copies of all values. However, avoiding extra
> > > copies of large binaries is still a very nice feature, so I'd still
> > > like to keep the single copy idea for those values. This is in fact
> > > something that we might want to consider already for Jackrabbit 1.4
> > > regardless of what we'll do with the NGP proposal.
> >
> > See JCR-926 for a practical application of this idea to current Jackrabbit.
>
> I just did a quick prototype where I made the InternalValue class turn
> all incoming binary streams into data records using a global data
> store. Internally the value would just be represented by the data
> identifier.
>
> This allowed me to simplify quite a few things (for example to drop
> all BLOBStore classes and custom handling of binary properties) and to
> achieve *major* performance improvements for cases where large (>
> 100kB) binaries are handled. For example the time to save a large file
> was essentially cut in half and things like versioning or cloning
> trees with large binaries would easily become faster by an order of
> magnitude. With this change it is possible for example to copy a DVD
> image file in milliseconds. What's even better, not only did this
> change remove extra copying of binary values, it also pushed all
> binaries out of the persistence or item state managers so that no
> binary read or write operation would ever lock the repository!
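(For illustration, a minimal sketch of the idea described above: binaries are stored once, keyed by a digest of their content, and a property value holds only the identifier. The class and method names here are hypothetical, not the actual Jackrabbit API; a real data store would work on streams and back the records with files or a database rather than a map.)

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical content-addressed data store: a record's identifier
// is the hex digest of its content, so identical binaries are kept
// only once and "copying" a value is just copying an identifier.
public class DataStoreSketch {

    private final Map<String, byte[]> records = new HashMap<>();

    // Store the content and return its identifier (hex SHA-256 digest).
    public String addRecord(byte[] content) {
        String id = hexDigest(content);
        // Identical content maps to the same identifier, so adding
        // the same binary again is a no-op rather than a second copy.
        records.putIfAbsent(id, content);
        return id;
    }

    public byte[] getRecord(String id) {
        return records.get(id);
    }

    public int size() {
        return records.size();
    }

    private static String hexDigest(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DataStoreSketch store = new DataStoreSketch();
        String id1 = store.addRecord("large binary".getBytes());
        String id2 = store.addRecord("large binary".getBytes());
        System.out.println(id1.equals(id2)); // same identifier
        System.out.println(store.size());    // content stored only once
    }
}
```

This is also why versioning or cloning a tree with large binaries becomes cheap: only identifiers move through the persistence layer, never the binary bytes themselves.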
awesome, that's great news!
is there a way to purge the binary store, i.e. remove unreferenced data?
i am a bit concerned that doing a lot of add/remove operations would
quickly exhaust available storage space. at least we need a concept
for how to deal with this kind of situation.
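(A purge along the lines Stefan asks about could work as a mark-and-sweep pass: scan the repository for every identifier still referenced by a property, then delete the data records not seen in the scan. The sketch below is hypothetical, with the record store reduced to a plain map; a real implementation would also have to guard against records added while the scan is running, e.g. by comparing modification timestamps.)

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical mark-and-sweep purge for a content-addressed data
// store: "referenced" is the set of identifiers found by scanning
// the repository; everything else in the store is garbage.
public class DataStoreGc {

    // Remove all unreferenced records; return how many were deleted.
    public static int purge(Map<String, byte[]> records,
                            Set<String> referenced) {
        int removed = 0;
        // Iterate over a copy of the key set so removal is safe.
        for (String id : new HashSet<>(records.keySet())) {
            if (!referenced.contains(id)) {
                records.remove(id);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, byte[]> records = new HashMap<>();
        records.put("a", new byte[] {1});
        records.put("b", new byte[] {2});
        Set<String> referenced = new HashSet<>();
        referenced.add("a"); // only "a" is still referenced
        System.out.println(purge(records, referenced)); // "b" swept
        System.out.println(records.containsKey("a"));
    }
}
```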
something that just crossed my mind: i know a number of people
want to store everything (config, meta data, binaries and content)
in the same db in order to allow easy backup/restore of an entire
repository. currently they can do so by using DatabaseFileSystem
and the externalBLOBs=false option of DatabasePersistenceManager.
do you plan to support db persistence for the binary store as well?
cheers
stefan
>
> The downside of the change is that it requires backwards-incompatible
> changes in jackrabbit-core, most notably pulling all blob handling out
> of the existing persistence managers. Adopting the data store concept
> would thus require migration of all existing repositories. Luckily
> such migration would likely be relatively straightforward and we could
> write tools to simplify the upgrade, but it would still be a major
> undertaking.
>
> I would very much like to go forward with this approach, but I'm not
> sure when would be the right time to do that. Should we target already
> the 1.4 release in September/October, or would it be better to wait
> for Jackrabbit 2.0 sometime next year? Alternatively, should we go for
> a 2.0 release already this year with this and some other structural
> changes, and have Jackrabbit 3.0 be the JSR 283 reference
> implementation?
since the jsr-283 public review is just around the corner we'll have to
start work on the ri pretty soon. therefore i think the ri should target
v2.0.
wrt integrating JCR-926, both 1.4 and 2.0 would be fine with me.
cheers
stefan
>
> BR,
>
> Jukka Zitting
>