[ https://issues.apache.org/jira/browse/OAK-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535812#comment-16535812 ]
Vikas Saurabh commented on OAK-7495:
------------------------------------

I think I know what's going on (at least in one case). The issue seems to originate from the fact that sometimes, between "observer adding a document to queue" and "observer call stack updating and refreshing index view", the queue processor gets called and writes the document to the index (and also marks it as processed, so the observer no longer processes the new doc). But the queue processor then gets unscheduled (before marking "index requires refresh"), and the consumer hits the query without the doc being visible in the reader yet.

I've added a few attachments:
* [^OAK-7495-add-logs.patch] - adds a lot of logs which show how the calls happen
* [^OAK-7495-test.patch] - this is almost the same as the test in [^OAK-7495.demo.patch], with the deadlock fixed, sleeps removed, and an additional reference editor provider (to avoid continuous logs for the reference index)
* [^OAK-7495-potential-fix.patch] - a trivial fix which simply forces sync docs not to be pushed to the queue but instead always to be updated in sync with the save call stack (I think it can have an impact on save performance... but, as a counter-point, I think a "sync" index should do that anyway)

All this said, I don't feel comfortable trying to maintain quite a complex sync implementation where hybridV2 does a repository-level sync view for specific properties (so force sync for only some properties and not all properties being handled by the index). I'd rather we deprecate the "async='sync'" type (of course, that's my personal view).

[~chetanm], would love to hear your thoughts on this.

[~egli], while I'm still not completely sure how we might want to handle this issue, maybe you can try [^OAK-7495-potential-fix.patch] to see if the expectation of your use case works better.
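To make the suspected interleaving easier to follow, here is a minimal, single-threaded sketch that replays the steps in the problematic order. It is not Oak code: {{ToyIndex}}, {{refreshRequired}}, {{racyInterleaving}} and {{syncInSaveCallStack}} are hypothetical names invented for illustration, and the "reader view" is only a stand-in for an index reader that sees writes after a refresh. The second method mimics the idea behind the potential fix (write and mark for refresh directly in the save call stack) and makes no claim about how the patch is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Toy model only; none of these types exist in Oak.
class SyncIndexRaceSketch {

    /** Writes become visible to queries only after the reader view is refreshed. */
    static class ToyIndex {
        private final Set<String> written = new HashSet<>();
        private Set<String> readerView = new HashSet<>();
        boolean refreshRequired = false;

        void write(String doc) {
            written.add(doc);
        }

        boolean query(String doc) {
            if (refreshRequired) {
                // Reopen the reader so that earlier writes become visible.
                readerView = new HashSet<>(written);
                refreshRequired = false;
            }
            return readerView.contains(doc);
        }
    }

    /** Replays the suspected interleaving step by step. */
    static boolean racyInterleaving() {
        ToyIndex index = new ToyIndex();
        Queue<String> queue = new ArrayDeque<>();
        Set<String> processed = new HashSet<>();
        String doc = "job-1";

        // 1. The observer adds the document to the queue.
        queue.add(doc);

        // 2. The queue processor runs before the observer continues:
        //    it writes the document and marks it as processed ...
        String taken = queue.poll();
        index.write(taken);
        processed.add(taken);
        //    ... and is unscheduled here, before it would have set
        //    index.refreshRequired = true.

        // 3. The observer resumes, sees the document as processed and skips it,
        //    so it does not request a refresh either.
        if (!processed.contains(doc)) {
            index.write(doc);
            index.refreshRequired = true;
        }

        // 4. The consumer queries while the reader view is still stale.
        return index.query(doc);
    }

    /** Idea behind the potential fix: no queue, update within the save call stack. */
    static boolean syncInSaveCallStack() {
        ToyIndex index = new ToyIndex();
        index.write("job-1");
        index.refreshRequired = true;
        return index.query("job-1");
    }

    public static void main(String[] args) {
        System.out.println("racy interleaving sees doc: " + racyInterleaving());
        System.out.println("sync in save call stack sees doc: " + syncInSaveCallStack());
    }
}
```

In this toy model the first call reports that the document is not visible and the second reports that it is, which matches the behaviour described above. In the real system the steps run on different threads, so the stale read would only show up intermittently.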
> async,sync index not synchronous
> --------------------------------
>
>                 Key: OAK-7495
>                 URL: https://issues.apache.org/jira/browse/OAK-7495
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.6.1
>            Reporter: Stefan Egli
>            Assignee: Vikas Saurabh
>            Priority: Major
>         Attachments: GetJobVerifier.java, OAK-7495-add-logs.patch, OAK-7495-potential-fix.patch, OAK-7495-test.patch, OAK-7495.demo.patch, slingeventJob.-1.tidy.json, unit-tests.log
>
>
> On an oak 1.6.1 (AEM 6.3) a suspicious behaviour was detected, where in Sling an
> [addJob|https://github.com/apache/sling-old-svn-mirror/blob/org.apache.sling.event-4.2.0/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L286]
> followed by a
> [getJobById|https://github.com/apache/sling-old-svn-mirror/blob/org.apache.sling.event-4.2.0/src/main/java/org/apache/sling/event/impl/jobs/JobManagerImpl.java#L294]
> (in a different thread though, but perhaps would also fail in same thread) was not seeing the job that was just created.
> To give a bit more background, in Sling getJobById results in a query. That query uses an index which is built using {{"async, sync"}}. So the assumption is that the index is actually synchronous. But a test reproducing initially mentioned scenario showed the opposite.
> Attached:
> * [^GetJobVerifier.java] a Sling job test case that has 2 threads: a thread that does addJob, adds the resulting jobId to a list (synchronized). and a second thread that reads the jobId off that list and does a getJobById. That getJobById should find the job, as it was just created (how else could you figure out the jobId) - but sometimes it FAILs (see system out FAIL)
> * [^slingeventJob.-1.tidy.json] the index definition showing it is indeed "async, sync"
> PS: Example query that is executed:
> {{/jcr:root/var/eventing/jobs//element(*,slingevent:Job)[@slingevent:eventId = '2018/5/11/2/12/bca505d9-3044-4de9-9732-056ab1b6c513_5569']}}
> /cc [~catholicon]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)