[ https://issues.apache.org/jira/browse/LUCENE-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532371#comment-16532371 ]
Ruslan Torobaev commented on LUCENE-8380:
-----------------------------------------

Thanks, Dawid! :)

> UTF8TaxonomyWriterCache inconsistency
> -------------------------------------
>
>                 Key: LUCENE-8380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8380
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 7.1
>            Reporter: Ruslan Torobaev
>            Assignee: Dawid Weiss
>            Priority: Minor
>             Fix For: 7.5
>
>         Attachments: LUCENE-8380.patch, lucene-taxonomy-cache-report.tar.gz,
>                      taxonomy-cache.json.gz, taxonomy.tar.gz
>
>
> I’m facing a problem with taxonomy writer cache inconsistency. At some point
> in time, UTF8TaxonomyWriterCache starts to return the wrong ord for some facet
> labels. As a result, wrong ords are written to the doc facet fields, and wrong
> counts (undercounts) are returned during search. The bug shows up on different
> servers with different index contents (we have several separate indexes with
> unique data).
> Unfortunately, I can’t reproduce this behaviour in tests.
> I've dumped a "broken" UTF8TaxonomyWriterCache instance and created an app to
> load it and compare it with the real taxonomy. The dumps and the app are attached.
> To run the demo, extract the archives and run:
> {code}
> mvn compile
> mvn exec:java -Dexec.mainClass="me.torobaev.lucene.taxonomy.cache.TaxonomyCacheCheck" -DtaxonomyDir=../taxonomy/ -DcacheDump=../taxonomy-cache.json
> {code}
> As you can see, labels [frametype, 7] and [modification_id, 682] have the same
> ord in the cache.
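
For readers without the attachments, the symptom described above (two distinct labels sharing one ord) can be illustrated with a minimal, Lucene-free sketch. This is not the attached TaxonomyCacheCheck app and not the fix in LUCENE-8380.patch. The class name DuplicateOrdinalCheck, the "dimension/value" label spelling, the extra label, and every ordinal value below are assumptions made up for illustration; the issue does not state the actual ord.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: not part of Lucene and not the attached demo app.
final class DuplicateOrdinalCheck {

    /** Returns every ordinal assigned to more than one label, with those labels. */
    static Map<Integer, List<String>> findCollisions(Map<String, Integer> labelToOrd) {
        // Invert the mapping: ordinal -> all labels that claim it.
        Map<Integer, List<String>> byOrd = new TreeMap<>();
        for (Map.Entry<String, Integer> e : labelToOrd.entrySet()) {
            byOrd.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // In a consistent cache each ordinal belongs to exactly one label,
        // so only ordinals with two or more labels are kept.
        byOrd.values().removeIf(labels -> labels.size() < 2);
        return byOrd;
    }

    public static void main(String[] args) {
        // Stand-in for a label -> ord view of a dumped cache (all values made up).
        Map<String, Integer> cache = new LinkedHashMap<>();
        cache.put("frametype/7", 1001);         // made-up ordinal
        cache.put("modification_id/682", 1001); // same made-up ordinal: a collision
        cache.put("color/red", 1002);           // made-up label and ordinal
        System.out.println(findCollisions(cache));
    }
}
```

With the made-up data in main, the only collision reported is ordinal 1001, shared by both labels. Comparing each cached ord against the ord stored in the real taxonomy, as the attached app is described as doing, would additionally show which of the colliding entries is the wrong one.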