Hi,

On Thu, Nov 22, 2012 at 8:15 PM, Alexander Klimetschek <aklim...@adobe.com> wrote:
> Why does Jackrabbit/Oak not map JCR hierarchies directly to the filesystem?
As pointed out in the Kafka document, random access over a file system is terribly inefficient, which is why splitting finely grained content like what you typically see in a content repository into separate files and directories wouldn't work too well performance-wise with normal file systems. Doing so would also suffer from the other issues you mentioned, most notably the lack of atomicity or locking.

Instead, and like Kafka also does, storing repository content in big journal or collection files is a pretty good idea. That's what our proprietary TarPM does for Jackrabbit 2.x, and you could argue that also the database-backed PMs and Oak MKs are doing something similar through the database engine. (Git also does this with its pack files.)

The main difference from the design outlined in the Kafka document is that for various reasons (remote access, etc.) we've had to add various levels of in-memory caching, especially in Jackrabbit 2.x. In Oak we've tried to avoid extra caches, and so far have only had to add one (which we could perhaps avoid with OAK-468). If we can keep that goal up, and further optimize JSON processing at the MK level, it should be possible for an Oak stack to work as outlined in the Kafka document as well.

BR,

Jukka Zitting
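P.S. For illustration only, here's a minimal sketch of the journal-file idea: many small records appended to one big file and addressed by byte offset, instead of one file per node. This is hypothetical code, not the actual TarPM implementation.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical sketch of append-only journal storage (not TarPM code):
// small content records share one big file, so there's no per-node
// file/directory overhead, and a record's address is just its offset.
class Journal {
    private final Path file;

    Journal(Path file) {
        this.file = file;
    }

    // Append a record to the end of the journal; return its offset.
    long append(byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long offset = raf.length();
            raf.seek(offset);
            raf.writeInt(data.length); // length prefix
            raf.write(data);
            return offset;
        }
    }

    // Read back the record stored at the given offset.
    byte[] read(long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            byte[] data = new byte[raf.readInt()];
            raf.readFully(data);
            return data;
        }
    }
}
```

Appends stay sequential (the fast path for disks), and since records are immutable once written, readers never need locking against writers.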