[ https://issues.apache.org/jira/browse/OAK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15506572#comment-15506572 ]
Stefan Egli edited comment on OAK-4796 at 9/20/16 1:42 PM:
-----------------------------------------------------------

[~chetanm], I see, your approach is completely different. Main differences I see:
# ObserverValidatorProvider approach:
#* 100% filtering of local and external events (whereas the external part is not yet implemented, actually, but would be similar)
#* for external events the diffing is done as today, so no performance improvements there. But we can also filter entire external events for the individual listeners as for local ones, just at a different location (in the backgroundRead somewhere)
# Extracted-Data approach:
#* not 100% filtering, but perhaps close
#* makes diffing for other instances in the cluster cheaper

so... let's decide which one to go for.

> filter events before adding to ChangeProcessor's queue
> ------------------------------------------------------
>
>                 Key: OAK-4796
>                 URL: https://issues.apache.org/jira/browse/OAK-4796
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: jcr
>    Affects Versions: 1.5.9
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>              Labels: observation
>             Fix For: 1.6
>
>         Attachments: OAK-4796.patch
>
>
> Currently the [ChangeProcessor.contentChanged|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L335] is in charge of doing the event diffing and filtering, and does so in a pooled thread, i.e. asynchronously, at a later stage independent of the commit. This has the advantage that the commit is fast, but it has the following potentially negative effects:
> # events (in the form of ContentChange objects) occupy a slot of the queue even if the listener is not interested in them: any commit lands on any listener's queue. This reduces the capacity of the queue for 'actual' events to be delivered. It therefore increases the risk that the queue fills up, and a full queue has various consequences such as losing the CommitInfo etc.
> # each event==ContentChange must later be evaluated, and for that a diff must be calculated. Depending on runtime behavior, that diff might be expensive if it is no longer in the cache (documentMk specifically).
> As an improvement, this diffing+filtering could be done at an earlier stage already, nearer to the commit. In case the filter would ignore the event, it would not have to be put into the queue at all, thus avoiding occupying a slot and a later, potentially slower, diff.
> The suggestion is to implement this via the following algorithm:
> * During the commit, the listeners' filters are evaluated in a {{Validator}}, in an as-efficient-as-possible manner. (The reason for doing it in a Validator is that this doesn't add overhead, as Oak already goes through all changes for other Validators.) As a result, a _list of potentially affected observers_ is added to the {{CommitInfo}} (false positives are fine).
> ** Note that the above adds cost to the commit and must therefore be done carefully and measured.
> ** One potential measure could be to only do the filtering when a listener's queue is larger than a certain threshold (e.g. 10).
> * The ChangeProcessor, in {{contentChanged}} (in the one created in [createObserver|https://github.com/apache/jackrabbit-oak/blob/f4f4e01dd8f708801883260481d37fdcd5868deb/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/observation/ChangeProcessor.java#L224]), then checks the new commitInfo's _potentially affected observers_ list and, if it is not in the list, adds a {{NOOP}} token at the end of the queue. If there's already a NOOP there, the two are collapsed (this way, when a filter is not affected, it would have a NOOP at the end of the queue). If later on a non-NOOP item is added, the NOOP's {{root}} is used as the {{previousRoot}} for the newly added {{ContentChange}} object.
> ** To achieve that, the ContentChange object is extended to have not only the "to" {{root}} pointer but also the "from" {{previousRoot}} pointer, which currently is maintained implicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
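
The queue handling proposed in the issue description (NOOP tokens that collapse at the tail of the queue, plus an explicit {{previousRoot}}) could look roughly like the sketch below. This is not Oak code: the class {{NoopCollapsingQueue}}, its {{Entry}} type and all method names are hypothetical, roots are represented as plain strings instead of NodeState objects, and the _potentially affected observers_ list is assumed to arrive as a set of observer ids.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

// Hypothetical stand-in for a listener's queue; not part of the Oak API.
class NoopCollapsingQueue {

    // One queue entry covering the span previousRoot -> root.
    // noop == true means the listener's filter did not match that span.
    static final class Entry {
        final String previousRoot;
        final String root;
        final boolean noop;

        Entry(String previousRoot, String root, boolean noop) {
            this.previousRoot = previousRoot;
            this.root = root;
            this.noop = noop;
        }
    }

    private final String observerId;
    private final Deque<Entry> queue = new ArrayDeque<>();
    private String lastRoot; // root of the most recently seen commit

    NoopCollapsingQueue(String observerId, String initialRoot) {
        this.observerId = observerId;
        this.lastRoot = initialRoot;
    }

    // Called once per commit, with the (assumed) set of potentially affected observers.
    void contentChanged(String root, Set<String> potentiallyAffected) {
        Entry tail = queue.peekLast();
        if (potentiallyAffected.contains(observerId)) {
            // Real change: previousRoot is the last seen root, which is the
            // NOOP's root whenever a NOOP sits at the tail.
            queue.addLast(new Entry(lastRoot, root, false));
        } else if (tail != null && tail.noop) {
            // Collapse two consecutive NOOPs into one that spans both commits.
            queue.pollLast();
            queue.addLast(new Entry(tail.previousRoot, root, true));
        } else {
            queue.addLast(new Entry(lastRoot, root, true));
        }
        lastRoot = root;
    }

    int size() {
        return queue.size();
    }

    Entry last() {
        return queue.peekLast();
    }
}
```

With this shape, any number of consecutive unaffected commits occupies a single slot, and the next real entry still knows where its diff has to start. Whether the NOOP should stay in the queue once a real entry follows it is left open in the description; the sketch simply keeps it.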