On 7 Oct 2009, at 18:48, Alexander Klimetschek wrote:
> On Wed, Oct 7, 2009 at 18:10, Ian Boston <i...@tfd.co.uk> wrote:
>> So if the abstraction and isolation are perfect, and the hashed and
>> ugly JCR path is never exposed to a developer or user above the
>> service or API layer, then using such paths in the JCR itself is
>> unfortunate but acceptable?
> I'd say the "layer of service or api" is the JCR API, and that is
> exposed to developers. With Sling, this is certainly the case.
Ahh. Many of the developers we work with only see the URLs: they are
UI developers working in HTML/JavaScript. The Java developers working
in this area go through service APIs that abstract the storage away.
>> It would obviously be desirable not to have to resort to hashing,
>> but it becomes necessary as soon as a system has more than a few
>> tens of thousands of users.
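For readers unfamiliar with the hashing approach, here is a minimal Java sketch of one way to derive a hashed path. It assumes SHA-1 over the UTF-8 bytes of the id and one hex-encoded digest byte per level, so each level has at most 256 children. The class and method names (`HashedPath.forId`) are hypothetical and are not part of Jackrabbit, Sling, or the JCR API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not a Jackrabbit/Sling API): spreads ids across
// nested folders named after the leading bytes of a SHA-1 digest.
final class HashedPath {
    private HashedPath() {}

    // Returns a path shaped like "/xx/yy/zz/ieb": one two-hex-digit segment
    // per level (at most 256 children per level), followed by the id itself.
    static String forId(String id, int levels) {
        if (levels < 1 || levels > 20) {
            throw new IllegalArgumentException("SHA-1 yields 20 bytes; levels must be 1..20");
        }
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-1")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            path.append('/').append(String.format("%02x", digest[i] & 0xff));
        }
        return path.append('/').append(id).toString();
    }

    public static void main(String[] args) {
        System.out.println(forId("ieb", 3));
    }
}
```

Because the path is a pure function of the id, a service layer can recompute it on every lookup and never expose it in URLs, which is the kind of isolation discussed above.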
> Yes, but I am just saying that sensible naming is preferred over
> arbitrary hashes: dates like 2009/12/01, name prefixes like
> "a/ad/admin", or a more domain-specific categorization.
I agree, and I would like to adopt sensible naming, but we keep hitting
situations where, even with the most reasonable domain prefix, we end
up with more than 2K items in a folder. Then update rates go through
the floor, and contention and unmergeable changes cause failures
(usually at the worst possible time, when load is highest).
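The "sensible naming" layouts mentioned in this exchange (name prefixes such as a/ad/admin and date paths such as 2009/12/01) can be sketched in Java as below. This is an illustration only: the class and method names (`ReadablePath.forName`, `ReadablePath.forDate`) are hypothetical, and the rule that level i uses the first i characters of the name is an assumption inferred from the a/ad/admin and i/ie/ieb examples.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical helpers (not a JCR/Sling API) for human-readable layouts.
final class ReadablePath {
    private ReadablePath() {}

    // forName("admin", 2) -> "a/ad/admin": level i uses the first i characters.
    static String forName(String name, int levels) {
        if (name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        StringBuilder path = new StringBuilder();
        for (int i = 1; i <= levels; i++) {
            // Names shorter than the level reuse the whole name as the prefix.
            path.append(name, 0, Math.min(i, name.length())).append('/');
        }
        return path.append(name).toString();
    }

    // forDate(2009-12-01) -> "2009/12/01"; Locale.ROOT keeps ASCII digits.
    static String forDate(LocalDate date) {
        return String.format(Locale.ROOT, "%04d/%02d/%02d",
                date.getYear(), date.getMonthValue(), date.getDayOfMonth());
    }

    public static void main(String[] args) {
        System.out.println(forName("admin", 2));                 // a/ad/admin
        System.out.println(forDate(LocalDate.of(2009, 12, 1)));  // 2009/12/01
    }
}
```

The sketch also makes the limit visible: each prefix level can only fan out to as many children as there are distinct characters in use, so the per-level fanout is a few dozen at most.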
In our case we often run out of things to slice on before we reach a
layout where the store works. For example, "ieb" stored as i/ie/ieb
gives 64 buckets at level 1, which leads to huge numbers of collisions
at level 2, which again has only 64. That makes the maximum scale
somewhere around 4096 * 1024 items, assuming a perfect distribution,
before the bottom-level folders breach 1024 children. For messaging,
for instance, I need a store that handles more than 255^3 buckets
before colliding, i.e. 16M * 1024 items. Am I wrong to choose JCR as a
message store to support this use case?
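The capacity figures above can be checked with a small Java sketch. It assumes an idealized tree with the same fanout at every level and a perfectly even distribution, which real names will not achieve. The 1024-children ceiling is the operating limit described in this thread, not a Jackrabbit constant, and the class name `FolderCapacity` is hypothetical.

```java
// Back-of-envelope capacity of a fixed-depth folder tree: with `fanout`
// children per intermediate level and `levels` levels there are
// fanout^levels leaf folders, each holding up to `maxChildren` items.
final class FolderCapacity {
    private FolderCapacity() {}

    static long capacity(long fanout, int levels, long maxChildren) {
        long leaves = 1;
        for (int i = 0; i < levels; i++) {
            leaves = Math.multiplyExact(leaves, fanout); // throws on overflow
        }
        return Math.multiplyExact(leaves, maxChildren);
    }

    public static void main(String[] args) {
        // i/ie/ieb-style prefixes: 64 per level, 2 levels, 1024 per leaf folder
        System.out.println(capacity(64, 2, 1024));   // 4194304
        // three hashed levels of 255 buckets, 1024 per leaf folder
        System.out.println(capacity(255, 3, 1024));  // 16979328000
    }
}
```

The gap between the two results (roughly 4 million versus 17 billion items) is why a third, wider level matters far more than tuning the prefix scheme.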
>> Longer term, looking at how child nodes are stored relative to
>> their parents in Jackrabbit itself *might* address this. You mention
>> the Persistence Manager. Are there PMs that don't have the problem,
>> or is it above the PM layer?
> I was just using this as an analogy; it affects most persistence
> managers, but especially the optimized bundle PMs. They store nodes
> by UUID as binary bundles in the database, so accessing the database
> directly (for JCR workarounds during migration, large-scale copying
> or whatever) does not really work, because you cannot browse it
> without additional programming help. But every now and then, people
> on the Jackrabbit list who are new to JR ask for exactly that: how
> can I modify the nodes in the DB, etc. That's because they want to
> reuse their experience with databases and all the admin tools
> available.
> Now with JCR, if you have a JCR-level browser and admin tool, you
> don't need that. And the PM is just an implementation detail. So IMO
> this is a good thing, one that gives you the unstructuredness. But
> above that level you don't want to introduce a mapping so complex
> that people have no way to use the repository as fundamental
> infrastructure.
Sounds like if JCR-642 were fixed, none of this would be an issue?
Ian
> Regards,
> Alex
> --
> Alexander Klimetschek
> alexander.klimetsc...@day.com