What are the consistency assumptions a JCR client should be allowed to
make?

An approach where temporary inconsistencies are tolerated (i.e. eventual
consistency) increases availability and throughput. In such a case
do/can/should we tolerate temporary violations of:

- Node type constraints?

So far we seem to have discussed only edge cases where node type
constraints could be violated. I don't think they are very relevant in
a real-life system. I'd be OK with making some compromises in this area.

With the current Microkernel, whether these cases (i.e. write skew) [1]
are edge cases or not depends on the degree of write concurrency we
anticipate. If we fully synchronize all writes, these cases won't occur
at all. If, on the other hand, we aim for highly concurrent writes, we
will possibly see such cases more often than we would like.
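
To make the write skew case concrete, here is a minimal sketch in Java. It is a toy model and not the Microkernel API: the class WriteSkewSketch, its Tx type, and the invariant x + y >= 1 are all made up for illustration. It assumes snapshot reads and a conflict check that only looks at the keys a transaction writes.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy store (hypothetical): snapshot reads, commits rejected only when write sets overlap. */
final class WriteSkewSketch {
    private final Map<String, Integer> head = new HashMap<>();
    private final Map<String, Integer> lastWritten = new HashMap<>();
    private int revision = 0;

    /** A transaction that reads from the snapshot taken when it began. */
    final class Tx {
        private final int baseRevision = revision;
        private final Map<String, Integer> snapshot = new HashMap<>(head);
        private final Map<String, Integer> writes = new HashMap<>();

        int read(String key) {
            return writes.getOrDefault(key, snapshot.getOrDefault(key, 0));
        }

        void write(String key, int value) {
            writes.put(key, value);
        }

        boolean commit() {
            // The conflict check covers written keys only, never keys that were merely read.
            for (String key : writes.keySet()) {
                if (lastWritten.getOrDefault(key, 0) > baseRevision) {
                    return false;
                }
            }
            revision++;
            for (Map.Entry<String, Integer> e : writes.entrySet()) {
                head.put(e.getKey(), e.getValue());
                lastWritten.put(e.getKey(), revision);
            }
            return true;
        }
    }

    Tx begin() {
        return new Tx();
    }

    int headValue(String key) {
        return head.getOrDefault(key, 0);
    }

    /** Returns true if both commits succeed and the invariant x + y >= 1 ends up broken. */
    static boolean demonstrate() {
        WriteSkewSketch store = new WriteSkewSketch();
        Tx setup = store.begin();
        setup.write("x", 1);
        setup.write("y", 1);
        setup.commit();

        // Both transactions start from the same snapshot (x = 1, y = 1).
        Tx t1 = store.begin();
        Tx t2 = store.begin();

        // Each one checks the invariant against its own snapshot, then clears one key.
        if (t1.read("x") + t1.read("y") >= 2) {
            t1.write("x", 0);
        }
        if (t2.read("x") + t2.read("y") >= 2) {
            t2.write("y", 0);
        }

        // The write sets are disjoint, so neither commit is rejected.
        boolean c1 = t1.commit();
        boolean c2 = t2.commit();
        return c1 && c2 && store.headValue("x") + store.headValue("y") < 1;
    }
}
```

Each transaction is correct on its own snapshot, yet the merged result violates the constraint. If all writes were fully synchronized, the second transaction would see x = 0 and leave y alone.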

I think most applications that have highly concurrent writes usually
distribute the writes across many nodes, e.g. you have lots of users
working with the system, but each of them is working with his/her
own dataset.

This is correct as long as we exclude collaborative workspace use cases where users typically work on the same document concurrently.

[...]

To me the example on the wiki page is a reason to drop support
for setPrimaryType() for jr3. The specification says:

Agreed. Note however, that the same problem also occurs for mixins.

[...]

Do we have other examples where we know consistency from a
JCR perspective is at risk?

Referential integrity for mix:referenceable nodes might break in the same way.

The problem occurs wherever parts of the data in a save depend in some way on other parts of that save, for example when two properties of a node need to obey a certain condition. This might also make it hard to implement things like versioning, since the implementation must then encode dependent JCR properties into the same JSON value of the underlying Microkernel in order to circumvent this problem.

As discussed in an earlier thread, the problem is easily fixed for direct clients of the Microkernel API if we add some testAndSet functionality to the Microkernel.
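
A minimal sketch of what such a testAndSet could look like, in Java. Nothing here is an existing Microkernel method: the class TestAndSetSketch, the testAndSet signature, and the min/max properties are assumptions for illustration only. The idea is that a commit names the values it depends on, and the store applies the writes only if those values are still current.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy store (hypothetical) with a conditional commit. */
final class TestAndSetSketch {
    private final Map<String, Integer> head = new HashMap<>();

    synchronized Integer get(String path) {
        return head.get(path);
    }

    /**
     * Applies the writes only if every path in 'expected' still holds the
     * expected value at the head. The check and the update are one atomic step.
     */
    synchronized boolean testAndSet(Map<String, Integer> expected,
                                    Map<String, Integer> writes) {
        for (Map.Entry<String, Integer> e : expected.entrySet()) {
            if (!Objects.equals(head.get(e.getKey()), e.getValue())) {
                return false; // a value this commit depends on has changed
            }
        }
        head.putAll(writes);
        return true;
    }
}
```

Usage, assuming two properties that must satisfy min <= max and start out as min = 0, max = 10: a writer that raises min to 8 passes Map.of("max", 10) as its expectation, and a writer that lowers max to 5 passes Map.of("min", 0). Whichever commits second is rejected, because the value it relied on is no longer current, and it can retry against the new state.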

Michael




[1]
http://wiki.apache.org/jackrabbit/Transactional%20model%20of%20the%20Microkernel%20based%20Jackrabbit%20prototype

- Access control rights?

I don't think any violations are acceptable here.

Me neither. But again we need to be aware of the write skew issue here:
an ACL implementation must be very careful about its consistency
assumptions or it will eventually fail.

- Lock enforcement?

That's definitely a tough one because it depends on repository-wide
state.

This is an area where Apache ZooKeeper might help out.

- Query index consistency?

I think consistency is a prerequisite here, otherwise it's quite
difficult to implement the query functionality. I'd rather
make compromises on availability, e.g. terminate a long query
execution with an exception because the snapshot it was
working on is not available anymore.

I was thinking more of the other direction: would it be tolerable to
have the query index not up to date yet (i.e. after a possibly large
save)? Again, this could result in incomplete query results, an
exception, or the query being deferred until the index is up to date.
Maybe we could even let the client choose through 'query hints'.
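
A rough sketch of how such query hints might look, in Java. This is not a JCR or Jackrabbit API: the class QueryHintSketch, the Hint enum, and its three values are invented here to mirror the three outcomes named above. For simplicity the deferred case is modelled by catching up the index inline before answering, rather than by blocking on a background indexer.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy index (hypothetical) that may lag behind the head revision. */
final class QueryHintSketch {
    /** What the client wants when the index is behind the head revision. */
    enum Hint { ALLOW_INCOMPLETE, FAIL, DEFER }

    private final List<String> indexed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    /** A save becomes visible to queries only once it has been indexed. */
    synchronized void save(String path) {
        pending.add(path);
    }

    /** Stands in for an asynchronous index update. */
    synchronized void updateIndex() {
        indexed.addAll(pending);
        pending.clear();
    }

    /** Returns all indexed paths, handling a stale index according to the hint. */
    synchronized List<String> query(Hint hint) {
        if (!pending.isEmpty()) {
            switch (hint) {
                case FAIL:
                    throw new IllegalStateException("query index is not up to date");
                case DEFER:
                    updateIndex(); // answer only after the index has caught up
                    break;
                case ALLOW_INCOMPLETE:
                    break; // answer from the stale index
            }
        }
        return new ArrayList<>(indexed);
    }
}
```

The default is a design choice in its own right: ALLOW_INCOMPLETE favours availability, while FAIL and DEFER favour consistency at the cost of an error or added latency.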

I like the query hint idea.

Alternatively, we could also deny access to the most recent revision
until the index is updated (possibly asynchronously). This way
reads and writes are fast at the cost of consistency. Reads would
be eventually consistent (once the index is updated).
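
One way to picture this alternative, as a small Java sketch. The class IndexedRevisionGateSketch and its methods are hypothetical and only illustrate the idea: writes create new revisions immediately, but reads are served from the latest revision the index has caught up with.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy store (hypothetical): readers only see revisions that have been indexed. */
final class IndexedRevisionGateSketch {
    private final List<Map<String, String>> revisions = new ArrayList<>();
    private int indexedRevision = 0;

    IndexedRevisionGateSketch() {
        revisions.add(new HashMap<>()); // revision 0 is empty and trivially indexed
    }

    /** Creates a new revision without waiting for the index; returns its number. */
    synchronized int save(String path, String value) {
        Map<String, String> next = new HashMap<>(revisions.get(revisions.size() - 1));
        next.put(path, value);
        revisions.add(next);
        return revisions.size() - 1;
    }

    /** Called by the (possibly asynchronous) indexer once a revision is indexed. */
    synchronized void markIndexed(int revision) {
        int newest = revisions.size() - 1;
        indexedRevision = Math.max(indexedRevision, Math.min(revision, newest));
    }

    /** Reads are served from the latest indexed revision, not from the head. */
    synchronized String read(String path) {
        return revisions.get(indexedRevision).get(path);
    }
}
```

A client that saves and immediately reads back may therefore not see its own write until markIndexed has been called for that revision, which is the consistency cost mentioned above.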

regards
  marcel

Michael


- Atomicity of save operations?

What does a temporary violation of atomic saves look like?
Are you thinking of partially visible changes?

regards
   marcel

- ...?

Should we offer alternatives in some of these cases? That is, give the
client the ability to choose between consistency and availability.

Michael


[1]

http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203
