On Wed, Oct 7, 2009 at 20:34, Ian Boston <i...@tfd.co.uk> wrote:
> I agree, I would like to adopt sensible naming, but we keep on hitting
> situations where even with the most reasonable domain prefix we end up with
> more than 2K items in a folder and then the update rates go through the floor, and
> contention and un mergable changes fall over. (usually just at the worst
> time possible... when load is highest )
>
> In our case we often run out of things to slice before we reach a position
> where the store works. eg ieb i/ie/ieb  gives 64 at level 1 which generates
> huge amounts of collision at level2 which again only has 64 making the
> maximum scale of somewhere around 4096*1024 items assuming a perfect
> distribution before the bottom level folders breach 1024 children. For
> messaging for instance, I need a store that does about > 255^3 before
> colliding, ie 16M *1024.  Am I wrong to be choosing jcr as a message store
> to support this use case ?
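
To put numbers on the quoted estimate, here is a small sketch of the arithmetic. The fan-out of 64, the limit of 1024 children, and the 255^3 target come from the message above; the `capacity` helper is only a hypothetical name for illustration.

```python
def capacity(fanout, levels, max_children):
    """Items that fit under a hashed tree with `levels` folder levels,
    `fanout` folders per level, and `max_children` items per leaf folder,
    assuming a perfectly even distribution."""
    return fanout ** levels * max_children

# Two levels of 64 folders each, 1024 children per bottom folder.
two_level = capacity(64, 2, 1024)

# The target from the quoted message: about 255^3 leaf folders.
target = capacity(255, 3, 1024)

print(two_level)            # 4194304
print(target)               # 16979328000
print(target // two_level)  # 4048
```

Under these assumptions the two-level scheme falls short of the stated target by a factor of roughly 4000.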

I think you are really at the edge of scaling here. How many messages
are added per day? I'd think that bucketing by date, plus maybe time if
there are more than 2K per day, should balance the tree well enough.
Organizing messages by date is probably the best way anyway. And I guess
existing messages won't change at all and only new ones are added, which
should also confine contention to the node for the current time.
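
A minimal sketch of such a date-based layout, only computing the node path. The `/messages` root, the `msg-0001` identifier, and the `message_path` helper are assumptions for illustration, not anything Jackrabbit prescribes.

```python
from datetime import datetime, timezone

def message_path(root, sent_at, message_id, by_hour=False):
    """Build a date-bucketed node path such as /messages/2009/10/07/<id>.

    With by_hour=True an extra hour level is added, for stores that
    receive more messages per day than one folder should hold.
    """
    parts = [root.rstrip("/"), f"{sent_at:%Y}", f"{sent_at:%m}", f"{sent_at:%d}"]
    if by_hour:
        parts.append(f"{sent_at:%H}")
    parts.append(message_id)
    return "/".join(parts)

ts = datetime(2009, 10, 7, 20, 34, tzinfo=timezone.utc)
print(message_path("/messages", ts, "msg-0001"))
# /messages/2009/10/07/msg-0001
print(message_path("/messages", ts, "msg-0001", by_hour=True))
# /messages/2009/10/07/20/msg-0001
```

Because new messages always land under the newest date node, concurrent writers only touch that one branch of the tree.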

If there is some other categorization of messages, e.g. the project
or group or whatever they belong to, you can put them in the
project's folder and then do the substructure via the dates. If you
give the messages a node type plus other metadata as properties, you
can search them across projects or months/years.
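
As a rough sketch of that combination, the following builds a per-project, date-bucketed path and a JCR XPath query string that finds messages by node type regardless of where they sit. The `/projects` root, the `my:message` node type, the `my:sender` property, and both helper names are hypothetical; a real repository would need the node type registered first.

```python
from datetime import date

def project_message_path(project, day, message_id, root="/projects"):
    """Path of a message stored under its project, bucketed by date."""
    return f"{root}/{project}/messages/{day:%Y/%m/%d}/{message_id}"

def messages_query(node_type="my:message", under="/projects", sender=None):
    """JCR XPath query string matching all nodes of `node_type` below `under`.

    The optional sender filter assumes messages carry a my:sender property.
    """
    query = f"/jcr:root{under}//element(*, {node_type})"
    if sender is not None:
        escaped = sender.replace("'", "''")  # XPath escapes ' by doubling it
        query += f"[@my:sender = '{escaped}']"
    return query

print(project_message_path("project-a", date(2009, 10, 7), "msg-0001"))
# /projects/project-a/messages/2009/10/07/msg-0001
print(messages_query(sender="ieb"))
# /jcr:root/projects//element(*, my:message)[@my:sender = 'ieb']
```

The query does not depend on the folder layout, so the date substructure can be changed later without touching the search side.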

> Sounds like if JCR-642 was fixed, none of this would be an issue?

Not really. First of all, it's not just a "fix"; it requires a complete
rewrite of the internal persistence architecture in Jackrabbit.
Something for a 3.0 maybe (there are various ideas for how to do that
and for improving other bottlenecks as well).

But even if Jackrabbit scaled to hundreds of thousands of child nodes
per node, you would still have the problem of an unbalanced tree: it
would be hard, if not impossible, for a human to browse that tree
(you'd need a very advanced paging tree view to be able to go through
it), and it just doesn't "feel" right. Well, at least to me ;-)

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetsc...@day.com
