[ 
https://issues.apache.org/jira/browse/OAK-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joerg Hoh updated OAK-12116:
----------------------------
    Description: 
I came across a situation, where session.save() was very slow and blocking 
other requests. The threaddump looked like this:

{noformat}
        at 
org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.loadSynonyms(FSTSynonymFilterFactory.java:142)
        at 
org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.inform(FSTSynonymFilterFactory.java:112)
        at 
org.apache.lucene.analysis.synonym.SynonymFilterFactory.inform(SynonymFilterFactory.java:90)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.init(NodeStateAnalyzerFactory.java:190)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.loadTokenFilterFactories(NodeStateAnalyzerFactory.java:118)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.composeAnalyzer(NodeStateAnalyzerFactory.java:104)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createInstance(NodeStateAnalyzerFactory.java:98)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.collectAnalyzers(LuceneIndexDefinition.java:164)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:75)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:67)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.getIndexDefinition(IndexTracker.java:302)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider.getIndexEditor(LuceneIndexEditorProvider.java:204)
        at 
org.apache.jackrabbit.oak.plugins.index.CompositeIndexEditorProvider.getIndexEditor(CompositeIndexEditorProvider.java:76)
        at 
org.apache.jackrabbit.oak.plugins.index.WhiteboardIndexEditorProvider.getIndexEditor(WhiteboardIndexEditorProvider.java:49)
        at 
org.apache.jackrabbit.oak.plugins.index.IndexUpdate.collectIndexEditors(IndexUpdate.java:316)
        at 
org.apache.jackrabbit.oak.plugins.index.IndexUpdate.enter(IndexUpdate.java:171)
        at 
org.apache.jackrabbit.oak.spi.commit.VisibleEditor.enter(VisibleEditor.java:53)
        at 
org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:48)
        at 
org.apache.jackrabbit.oak.spi.commit.EditorHook.processCommit(EditorHook.java:54)
        at 
org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
        at 
org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
        at 
org.apache.jackrabbit.oak.composite.CommitHookEnhancer.processCommit(CommitHookEnhancer.java:55)
        at 
org.apache.jackrabbit.oak.segment.scheduler.Commit.apply(Commit.java:105)
        at 
org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.execute(LockBasedScheduler.java:299)
        at 
org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:270)
        at 
org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge(SegmentNodeStore.java:212)
        at 
org.apache.jackrabbit.oak.composite.CompositeNodeStore.merge(CompositeNodeStore.java:149)
        at 
org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:261)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:402)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:557)
        at 
org.apache.jackrabbit.oak.jcr.session.SessionImpl$9.performVoid(SessionImpl.java:460)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:306)
        at 
org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:457)
{noformat}

This was a consistent pattern visible across multiple threaddumps taken across 
a larger period of time (30+ minutes).

We have identified that one Lucene index (marked with async,nrt) was configured 
with a synonym.txt with 600kb size. That means, parsing that indeed takes time.

The expected behavior is that this file is parsed once in the context of the 
index creation and then cached in the IndexTracker; but given that we have seen 
this behavior across multiple requests over a larger period of time, this 
obviously did not happen, and the index object was constructed on every commit.

This is an unexpected behaviour I have never seen before (even with stopwords 
and/or synonym lists of similar and or even higher size), so there must be a 
reason, why this index is not cached at the IndexTracker.



  was:
I came across a situation, where session.save() was very slow and blocking 
other requests. The threaddump looked like this:

{noformat}
        at 
org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.loadSynonyms(FSTSynonymFilterFactory.java:142)
        at 
org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.inform(FSTSynonymFilterFactory.java:112)
        at 
org.apache.lucene.analysis.synonym.SynonymFilterFactory.inform(SynonymFilterFactory.java:90)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.init(NodeStateAnalyzerFactory.java:190)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.loadTokenFilterFactories(NodeStateAnalyzerFactory.java:118)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.composeAnalyzer(NodeStateAnalyzerFactory.java:104)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createInstance(NodeStateAnalyzerFactory.java:98)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.collectAnalyzers(LuceneIndexDefinition.java:164)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:75)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:67)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.getIndexDefinition(IndexTracker.java:302)
        at 
org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider.getIndexEditor(LuceneIndexEditorProvider.java:204)
        at 
org.apache.jackrabbit.oak.plugins.index.CompositeIndexEditorProvider.getIndexEditor(CompositeIndexEditorProvider.java:76)
        at 
org.apache.jackrabbit.oak.plugins.index.WhiteboardIndexEditorProvider.getIndexEditor(WhiteboardIndexEditorProvider.java:49)
        at 
org.apache.jackrabbit.oak.plugins.index.IndexUpdate.collectIndexEditors(IndexUpdate.java:316)
        at 
org.apache.jackrabbit.oak.plugins.index.IndexUpdate.enter(IndexUpdate.java:171)
        at 
org.apache.jackrabbit.oak.spi.commit.VisibleEditor.enter(VisibleEditor.java:53)
        at 
org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:48)
        at 
org.apache.jackrabbit.oak.spi.commit.EditorHook.processCommit(EditorHook.java:54)
        at 
org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
        at 
org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
        at 
org.apache.jackrabbit.oak.composite.CommitHookEnhancer.processCommit(CommitHookEnhancer.java:55)
        at 
org.apache.jackrabbit.oak.segment.scheduler.Commit.apply(Commit.java:105)
        at 
org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.execute(LockBasedScheduler.java:299)
        at 
org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:270)
        at 
org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge(SegmentNodeStore.java:212)
        at 
org.apache.jackrabbit.oak.composite.CompositeNodeStore.merge(CompositeNodeStore.java:149)
        at 
org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:261)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:402)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:557)
        at 
org.apache.jackrabbit.oak.jcr.session.SessionImpl$9.performVoid(SessionImpl.java:460)
        at 
org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:306)
        at 
org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:457)
{noformat}

This was a consistent pattern visible across multiple threaddumps taken across 
a larger period of time (30+ minutes).

We have identified that one Lucene index (marked with async,nrt) was configured 
with a synonym.txt with 600kb size. That means, parsing that indeed takes time.

The expected behavior is that this file is parsed once in the context of the 
index creation and then cached in the IndexTracker; but given that we have seen 
this behavior across multiple requests over a larger period of time, this 
obviously did not happen, and the index object was constructed on every commit.





> Large synonym list delays commits to the repository
> ---------------------------------------------------
>
>                 Key: OAK-12116
>                 URL: https://issues.apache.org/jira/browse/OAK-12116
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing
>    Affects Versions: 1.88.0
>            Reporter: Joerg Hoh
>            Priority: Major
>
> I came across a situation, where session.save() was very slow and blocking 
> other requests. The threaddump looked like this:
> {noformat}
>         at 
> org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.loadSynonyms(FSTSynonymFilterFactory.java:142)
>         at 
> org.apache.lucene.analysis.synonym.FSTSynonymFilterFactory.inform(FSTSynonymFilterFactory.java:112)
>         at 
> org.apache.lucene.analysis.synonym.SynonymFilterFactory.inform(SynonymFilterFactory.java:90)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.init(NodeStateAnalyzerFactory.java:190)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.loadTokenFilterFactories(NodeStateAnalyzerFactory.java:118)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.composeAnalyzer(NodeStateAnalyzerFactory.java:104)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.NodeStateAnalyzerFactory.createInstance(NodeStateAnalyzerFactory.java:98)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.collectAnalyzers(LuceneIndexDefinition.java:164)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:75)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexDefinition.<init>(LuceneIndexDefinition.java:67)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker.getIndexDefinition(IndexTracker.java:302)
>         at 
> org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider.getIndexEditor(LuceneIndexEditorProvider.java:204)
>         at 
> org.apache.jackrabbit.oak.plugins.index.CompositeIndexEditorProvider.getIndexEditor(CompositeIndexEditorProvider.java:76)
>         at 
> org.apache.jackrabbit.oak.plugins.index.WhiteboardIndexEditorProvider.getIndexEditor(WhiteboardIndexEditorProvider.java:49)
>         at 
> org.apache.jackrabbit.oak.plugins.index.IndexUpdate.collectIndexEditors(IndexUpdate.java:316)
>         at 
> org.apache.jackrabbit.oak.plugins.index.IndexUpdate.enter(IndexUpdate.java:171)
>         at 
> org.apache.jackrabbit.oak.spi.commit.VisibleEditor.enter(VisibleEditor.java:53)
>         at 
> org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:48)
>         at 
> org.apache.jackrabbit.oak.spi.commit.EditorHook.processCommit(EditorHook.java:54)
>         at 
> org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
>         at 
> org.apache.jackrabbit.oak.spi.commit.CompositeHook.processCommit(CompositeHook.java:60)
>         at 
> org.apache.jackrabbit.oak.composite.CommitHookEnhancer.processCommit(CommitHookEnhancer.java:55)
>         at 
> org.apache.jackrabbit.oak.segment.scheduler.Commit.apply(Commit.java:105)
>         at 
> org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.execute(LockBasedScheduler.java:299)
>         at 
> org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:270)
>         at 
> org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge(SegmentNodeStore.java:212)
>         at 
> org.apache.jackrabbit.oak.composite.CompositeNodeStore.merge(CompositeNodeStore.java:149)
>         at 
> org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:261)
>         at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:402)
>         at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:557)
>         at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl$9.performVoid(SessionImpl.java:460)
>         at 
> org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:306)
>         at 
> org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:457)
> {noformat}
> This was a consistent pattern visible across multiple threaddumps taken 
> across a larger period of time (30+ minutes).
> We have identified that one Lucene index (marked with async,nrt) was 
> configured with a synonym.txt with 600kb size. That means, parsing that 
> indeed takes time.
> The expected behavior is that this file is parsed once in the context of the 
> index creation and then cached in the IndexTracker; but given that we have 
> seen this behavior across multiple requests over a larger period of time, 
> this obviously did not happen, and the index object was constructed on every 
> commit.
> This is an unexpected behaviour I have never seen before (even with stopwords 
> and/or synonym lists of similar and or even higher size), so there must be a 
> reason, why this index is not cached at the IndexTracker.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to