[ https://issues.apache.org/jira/browse/OAK-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771882#comment-16771882 ]
Vikas Saurabh commented on OAK-8046: ------------------------------------ Thanks for the review [~tmueller]. bq. LOG.debug was changed to LOG.info, was that intentional? LOG.info("Change in index version detected..." Yes, I intentionally did that for 2 reasons: * seeing logs with wrapping counter under same query would've been confusing otherwise (I think)... such line would likely make it clearer why the counter wrapped around * I don't expect such cases to be too common so this likely won't cause lot of logging traffic and would be useful in cases where it shows up bq. hasRewound() needs to be called at the right moment, always after next() Umm... I guess it needs to be called between 2 next() calls. After or before would probably not matter, right? bq. I think there is a risk that hasRewound() isn't always called at the right moment I don't think I understand. Afaict, as long as FulltextPathCursor#next is the right place to log index traversal log then checking for hasRewound() in next() should be ok too, right? bq. A future change might be: add a skip method (to speed up queries with offset). I'm not saying we need to add this. Hmm... I've multiple thoughts on this - current case is ok wrt to local lucene I think. Otoh, remote index can't have such a concept of rewound - we really *cant* give the guarantees in remote index that we give in local lucene. Next thought is where do we mean to give "skip" method - from lucene pov each new batch of result is already given due to skipping. From query pov, skip only makes sense for the first batch of resutls. bq. Maybe it would be simpler if the LuceneResultRowIterator would maintain the read count itself, and so instead of boolean hasRewound add a method getReadCount? So rename IteratorRewoundStateProvider to IteratorReadSizeProvider I thought about it - but then each provider (hopefully ES somewhere in the future) would also need to do the same thing OR FulltextPathCursor would've to do check if underlying cursor provides size or not and support its own counter accordingly. In the end, I felt that simple flag to reset counter should be ok. That said, I'm ok to change it if you feel rather strongly about it (I don't have strong opinion one way or other... I just preferred current one slightly). Btw, do note, resetted counter is technically incorrect in context of "nrt"/"sync" results. Without index re-open, the counter would first account for results from "nrt"/"sync" and then lucene index based results would be counted. When the index gets reopened, the first counted result is from lucene index. So, "10000" after re-open would technically be "10000+x" (x results were given due to "nrt"/"sync"). I initially thought about accounting for this - but I don't expect "nrt"/"sync" results to be huge in context where these numbers become relevant - so, the error in accountin, imo was tolerable (when compared to implementation complication that accurate accounting would've brought in). > Result items are not always correctly counted against the configured read > limit if a query uses a lucene index > --------------------------------------------------------------------------------------------------------------- > > Key: OAK-8046 > URL: https://issues.apache.org/jira/browse/OAK-8046 > Project: Jackrabbit Oak > Issue Type: Bug > Components: lucene > Affects Versions: 1.8.7 > Reporter: Georg Henzler > Assignee: Vikas Saurabh > Priority: Major > Attachments: OAK-8046.patch > > > There are cases where an index is re-opened during query execution. In that > case, already returned entries are read again and skipped, so basically > counted twice. This should be fixed to only count entries once (see also [1]) > The issue most likely exists since the read limit was introduced with OAK-6875 > [1] > https://lists.apache.org/thread.html/dddf9834fee0bccb6e48f61ba2a01430e34fc0b464b12809f7dfe2eb@%3Coak-dev.jackrabbit.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)