On Wed, Oct 7, 2009 at 20:34, Ian Boston <i...@tfd.co.uk> wrote:
> I agree, I would like to adopt sensible naming, but we keep on hitting
> situations where even with the most reasonable domain prefix we end up
> with >2K items in a folder and then the update rates go through the
> floor, and contention and un mergable changes fall over. (usually just
> at the worst time possible... when load is highest )
>
> In our case we often run out of things to slice before we reach a position
> where the store works. eg ieb i/ie/ieb gives 64 at level 1 which generates
> huge amounts of collision at level2 which again only has 64 making the
> maximum scale of somewhere around 4096*1024 items assuming a perfect
> distribution before the bottom level folders breach 1024 children. For
> messaging for instance, I need a store that does about > 255^3 before
> colliding, ie 16M *1024. Am I wrong to be choosing jcr as a message store
> to support this use case ?
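
For reference, the capacity figures in the quoted message can be checked with a few lines of Python. This is only a sketch of the arithmetic: `max_items` is a hypothetical helper, not part of any JCR or Jackrabbit API, and it assumes the perfectly even distribution that the quoted message assumes.

```python
# Capacity of a hashed folder tree before leaf folders exceed a child limit.
# Assumption: items are spread perfectly evenly across all leaf folders.

def max_items(fanouts, leaf_limit):
    """Number of leaf folders (product of per-level fanouts) times children per leaf."""
    leaves = 1
    for fanout in fanouts:
        leaves *= fanout
    return leaves * leaf_limit

# Two levels with 64 folders each, at most 1024 children per leaf folder.
two_level = max_items([64, 64], 1024)

# The stated target: about 255^3 leaf folders, again 1024 children each.
target = max_items([255, 255, 255], 1024)

print(two_level)  # 4194304, i.e. 4096 * 1024
print(target)     # 16979328000, i.e. roughly 16M * 1024
```

The gap between the two numbers is a factor of about 4000, which is why the two-level prefix scheme runs out well before the target.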

I think you are really at the edge of scaling here. How many messages
are added per day? I'd think that date plus maybe time (if there are more
than 2K per day) should balance it enough, for example. Organizing
messages by date is probably the best way anyway. And I guess they won't
change at all, only new ones are added, which should also limit
contention to the node for the current time.

If there is some other categorization of messages, e.g. the project or
group they belong to, you can put them in the project's folder and then
do the substructure via the dates. If you give the messages a node type
plus other metadata as properties, you can search them across projects
or months/years.

> Sounds like if JCR-642 was fixed, none of this would be an issue?

Not really. First of all, it's not just a "fix": it requires a complete
rewrite of the internal persistence architecture in Jackrabbit. Something
for a 3.0 maybe (and there are various ideas on how to do that and also
improve other bottlenecks).

But even if Jackrabbit scaled to hundreds of thousands of child nodes
per node, you would still have the problem of an unbalanced tree: it
would be hard, not to say impossible, for a human to browse that tree
(you'd need a very advanced paging tree view to be able to go through
it), and it just doesn't "feel" right. Well, at least to me ;-)

Regards,
Alex

--
Alexander Klimetschek
alexander.klimetsc...@day.com
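
A minimal sketch of the date-based layout suggested above, in Python and without touching any JCR API. The function name `message_path`, the `/messages` and `/projects` roots, and the sample identifiers are all hypothetical, chosen only to illustrate the idea. The hour level is optional, for stores that receive more than about 2K messages per day.

```python
from datetime import datetime, timezone

def message_path(message_id, created, project=None, by_hour=False):
    """Build a repository path that groups messages by creation date.

    All names here are illustrative assumptions, not a Jackrabbit convention.
    """
    # Optional categorization first (e.g. a project folder), then the dates.
    parts = ["projects", project, "messages"] if project else ["messages"]
    parts += [f"{created.year:04d}", f"{created.month:02d}", f"{created.day:02d}"]
    if by_hour:
        # Extra level for days that would exceed the per-folder child limit.
        parts.append(f"{created.hour:02d}")
    parts.append(message_id)
    return "/" + "/".join(parts)

created = datetime(2009, 10, 7, 20, 34, tzinfo=timezone.utc)

print(message_path("msg-0001", created))
# /messages/2009/10/07/msg-0001

print(message_path("msg-0001", created, project="projectA", by_hour=True))
# /projects/projectA/messages/2009/10/07/20/msg-0001
```

Because new messages only ever land under the folder for the current date (and hour), concurrent writers contend on that one parent node rather than on folders spread across the whole tree.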