Janne Jalkanen wrote:
Congratulations on the new site!

Thanks!

   a. documentation on the outer limits of Priha's abilities,
      particularly throughput performance on extremely large
      numbers of very small XML files (potentially a scale of
      100m records with high speed indexing via XML ID).

It would be interesting to see how fast it gets... At the moment the speed isn't that bad, and in theory reading should be O(1) [though creation on the FileProvider is O(N) or worse]. But it depends a lot on the DB you have underneath. There's currently an HSQLDB provider, but tweaking it to run on MySQL should be pretty easy too.

The web service I'm designing will use caching for the high-performance
turnaround we need, so I suppose we don't need the throughput of an
enterprise service for day-to-day operations. It is expected, however,
that batch loads of records (or XSLT-based modifications of existing
records) should run in a reasonable amount of time for very large
numbers of records (1-10 million). Though these kinds of changes are
not expected to be frequent, they will likely (at this point) have
to occur during downtimes. Running on MySQL *might* be a problem for
us since it's not currently an endorsed DB provider (despite the site
being a Sun Microsystems Centre of Excellence and MySQL being owned by
Sun). But we're working on that...

   b. development of a JSR-170 sub-interface so that we have a
      sense of what Priha implements.

Mmm... You could just run the JCR TCK and get it from there.

On the question of developing a sub-interface of JSR-170 as a
"guarantee of service", is that a possibility? While designed
according to JSR-170, Priha is currently non-conformant (so
far as I understand) with the API, so wouldn't it be prudent
to create a sub-interface? Or am I not understanding the
situation? (Possible; I've not had much time to get my head
around this yet.)

This is in addition to my ongoing interest in Priha as a JSPWiki
backend, where I'm still hoping to see support for pluggable
metadata, since installations often have their own metadata
requirements beyond the rudimentary stuff required by the
wiki itself, such as needing to integrate into existing
enterprise architectures.

The way I've currently written it, a plugin has full access to the entire JCR metadata - we just provide accessors for some of the most commonly used metadata (like the content of the page as a String, the author of the page, and the version of the page).

Yes, that's what I'm looking forward to...

I would also take a look at Jackrabbit. It has some known scalability limits (if you put too many children in a single Node), but at least it's well tested (which Priha isn't right now).

The current design could take a couple of approaches. One is a structure
like:

   root node
   |
   _____ session node
         |
         __ document node
         __ document node
         __ document node
         ...

where a large number of nodes are loaded via a session and are meant
to be considered as its children (so that their session metadata can be
found by going up to the session node rather than being stored redundantly
in each node).
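A minimal sketch of this first layout, assuming session-level metadata is stored as properties on the session node and each document resolves it by walking up its ancestors. `TreeNode`, `findInherited` and the property names are hypothetical, and this is plain in-memory Java rather than the `javax.jcr` API (in a JCR repository the walk would go through `Node.getParent()`):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory node; NOT javax.jcr.Node, just an illustration
// of "look up session metadata by going up the tree".
class TreeNode {
    final String name;
    final TreeNode parent;
    final Map<String, String> properties = new HashMap<>();
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, TreeNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this); // register with the parent (e.g. session node)
        }
    }

    TreeNode addChild(String childName) {
        return new TreeNode(childName, this);
    }

    // Check this node first, then each ancestor in turn, so session-level
    // metadata is stored once on the session node instead of per document.
    String findInherited(String key) {
        for (TreeNode n = this; n != null; n = n.parent) {
            String value = n.properties.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

The trade-off is visible in `children`: every document lands in the session node's child list, which is exactly the "too many children in a single Node" situation mentioned for Jackrabbit when a session loads hundreds of thousands of records.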

This would suggest that a large number of document nodes (e.g., 500,000)
might be a problem, i.e., if we didn't store them flat. Another possibility
is a structure like:

   root node
   |
   __ session node
   __ document node
   __ document node
   __ document node
   ...

and then have links from each document node into its corresponding
session node. While this does avoid redundantly storing session-level
metadata at the node level, it (a) means we must store a session link
for each document, and (b) introduces potential synchronisation issues.
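A rough sketch of the second, flat layout, assuming each document stores its session node's ID as a plain string link and that session metadata is resolved by following it. `FlatStore`, `sessionProperty` and `danglingLinks` are hypothetical names; in a JCR repository the link would more likely be a REFERENCE property, but this plain-Java version shows both costs listed above: one extra link per document, and links that can go stale if a session node is removed independently:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical flat store: all nodes hang directly off the root, keyed by ID.
class FlatStore {
    static final class Node {
        final String id;
        final String sessionRef; // null for session nodes; link for document nodes
        final Map<String, String> properties = new HashMap<>();

        Node(String id, String sessionRef) {
            this.id = id;
            this.sessionRef = sessionRef;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    Node addSession(String id) {
        Node n = new Node(id, null);
        nodes.put(id, n);
        return n;
    }

    // Cost (a): every document carries its own session link.
    Node addDocument(String id, String sessionId) {
        Node n = new Node(id, sessionId);
        nodes.put(id, n);
        return n;
    }

    void remove(String id) {
        nodes.remove(id);
    }

    // Follow the document's link to read session-level metadata.
    // Returns null when the link no longer resolves.
    String sessionProperty(String documentId, String key) {
        Node doc = nodes.get(documentId);
        if (doc == null || doc.sessionRef == null) {
            return null;
        }
        Node session = nodes.get(doc.sessionRef);
        return session == null ? null : session.properties.get(key);
    }

    // Cost (b): count documents whose session link has gone stale.
    int danglingLinks() {
        int count = 0;
        for (Node n : nodes.values()) {
            if (n.sessionRef != null && !nodes.containsKey(n.sessionRef)) {
                count++;
            }
        }
        return count;
    }
}
```

Nothing here stops `remove` from deleting a session node that documents still point to; keeping the links consistent is left to whoever performs the batch load or XSLT update.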

One of the things about Jackrabbit is that it is currently a bear to
install. Simplicity is not part of their design goals, apparently.

Thanks,

Murray

...........................................................................
Murray Altheim <murray08 at altheim dot com>                       ===  = =
http://www.altheim.com/murray/                                     = =  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  = =

      Boundless wind and moon - the eye within eyes,
      Inexhaustible heaven and earth - the light beyond light,
      The willow dark, the flower bright - ten thousand houses,
      Knock at any door - there's one who will respond.
                                      -- The Blue Cliff Record
