Observation

Alexander Klimetschek Wed, 06 Nov 2013 17:50:38 -0800

Hi everyone,

having looked at and discussed observation recently (partly on our Adobe 
internal list with some of our application use cases, partly with Toby f2f), 
here are some things that are IMHO worthwile to look at. I hope I can 
contribute a bit, but before I get pulled away again, I wanted at least 
summarize and persist() my ideas:


1) listeners should not get external events by default

OAK-1121 helps, but applications must say they exclude external events, while 
it should be the other way around (only local by default) - this can arguably 
not change because of the jcr observation contract. The current implementation 
can still be optimized a bit, see my comment in the issue.

2) filtering happens too late within listeners because jcr observation API is 
too simple (OAK-1133, "observation listener plus")

The filter logic would be passed declaratively and allow more options as noted 
in OAK-1133:

        addEventListener(listener, Filter.Path("/libs"), 
Filter.PropertyValue("jcr:content/sling:resourceType", "myValue"), ...)

(just a rough sketch; the simplest solution could simply introduce a 
FilterEventListener interface that extends the existing EventListener and 
provides the filter in a getFilter() method; a simple instanceof check within 
oak-jcr could detect that; but a completely separate API might also make sense, 
see point 6 below)

This could then be evaluated when the NodeStateDiff is calculated. What is good 
now already is that the base path (if specified in the jcr observation) is 
evaluated as soon as getting the diff roots. But many real use cases might want 
to have multiple paths (/apps OR /libs) or globbing 
(/content/site/*/jcr:content) - this could be easily and efficiently added here 
in one place (amongst those other filters). Having one or few base paths is in 
the oak case probably the best way to filter things and one should avoid 
listeners that register at root, and only filter on node types for example, 
which requires a lot more tree walking.

In our app (CQ) we have 18 listeners that register for all events, because the 
JCR observation doesn't give them the right filter options, so they have to do 
it themselves. I expect them to be vastly improved with this new approach.

Also, getting events for deleted nodes could be supported here, since there is 
full access to the old NodeState.

The ideal solution could also be built backwards compatible: implement the same 
filtering with normal JCR API usage and automatically wrap the listener in 
another event listener that filters the "manual" way. This allows application 
code that works with both Jackrabbit 2 or Oak with minimal effort (would need 
some helper in jackrabbit-jcr-commons or so). Unless we want to backport this 
to Jackrabbit 2 as well.

3) cluster events

As agreed, external events can be the real scalability issue. It was also 
noted, that many cases do not need it at all (hence 1). But there are still 
cases that need it: in Sling for example, code deployment (bundles, JSP 
scripts, etc.) is based on the JCR repository, meaning when you put a bundle in 
a certain "install" folder in the JCR, all cluster instances need to pick it 
up. However, such deployment events are comparably  rare (except on a developer 
machine :)), so the throughput doesn't have to be high. OTOH large-scale 
content changes will (and should) be handled locally, thus not requiring 
external propagation.

Now the question is if 2) with a filter-as-detailed-and-early-as-possible 
approach solves this already (having the right external/non-external flag on 
all listeners, plus detailed filters for the external listeners), or if we 
could make use of the fairly rare nature of those events and try to filter on 
the _source_ instance already and propagate those few events in some other, 
efficient way to other cluster instances. This might improve performance as the 
ChangeDispatcher#externalChange() check (polling all 100ms) could then be 
removed completely. But if the filtering on the NodeStateDiff is fast (because 
listeners are specific), then maybe it is ok.

4) listener threads

Currently in oak each event listener gets its own thread. In our app CQ we have 
about 150 listeners, so you end up introducing 150 threads. I am not sure if 
this is a good idea.

Jackrabbit 2 had a single thread, which had the issue of one observation 
listener blocking all others.

An obvious solution would be a thread pool (probably configurable number of 
threads). And on top of that, if the pool is full and no thread is free 
anymore, one could simply kill blocked handlers after a certain timeout (and 
optionally blacklist them).

5) filtering in central thread?

There was some discussion about the current filtering evaluation [0]. Each 
listener has its own ChangeProcessor which in turn has its own 
ChangeDispatcher.Listener. This means filtering is not in a single, central 
thread, but happens in each listener's thread (they each have their own 
ChangeSet queue, which indeed might be different because the previous root 
NodeState to compare to might be different for each).

I thought this was not optimal but it seems I was wrong - splitting that up and 
generating events in a central thread and then passing them in a queue to the 
listeners actually turned out to scale worse (in a quick test with many 
parallel listeners). See my patch at [1].

I guess it is that reading asynchronously from the immutable NodeStates is more 
efficient than multiple blocking queues. Which speaks good for the underlying 
oak implementation :)

[0] http://markmail.org/thread/533orsfr44wllvrx
[1] 
https://github.com/alexkli/jackrabbit-oak/commit/aee631ded4996194a3ad0dec1fc7a9917f7123b8

6) long-running session just for observation

One problem with JCR observation is that you need one session open all the time 
"just for the observation". This is done to be able to run observation under 
the permissions of a certain user. But nobody in practice uses the session for 
anything else than reading data upon events (which mostly is done to filter 
only, see 2); when you need to write things in an event, best practice is to 
create a new session on demand an close it again. With Oak's refresh() policy 
this is even more important.

Maybe a new API from 2) could skip the requirement of an open session. 
Registration would be based upon a session to handle the credentials easily, 
but that session could be closed without killing the listener. You would get 
back a special listener object that would need to be kept alive and referenced 
to be able to clean up listeners automatically on a finalize() if they forget 
to unregister.

Or you keep the session, but mark it in a special way so that Oak can 
internally save resources. Or maybe the session in oak is already so 
light-weight that this is actually no problem at all. I am just mentioning this 
because of the many warnings I get from oak in the logs about sessions that are 
open very long (and don't call refresh() IIRC).

7) move code to oak-jcr (minor)

Quite a bit of the code in oak.plugins.observation is currently JCR eventing 
specific (EventGenerator for example), afaics this is better suited in oak-jcr. 
Although with 2) it might change a bit anyway.

Cheers,
Alex

Observation

Reply via email to