[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms
[ https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460999#comment-16460999 ] ASF subversion and git services commented on LUCENE-8279: - Commit e00c4cede26690a82cf553a22b53a47c675cc01d in lucene-solr's branch refs/heads/master from [~jpountz] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e00c4ce ] LUCENE-8279: CheckIndex now cross-checks terms with norms. > Improve CheckIndex on norms > --- > > Key: LUCENE-8279 > URL: https://issues.apache.org/jira/browse/LUCENE-8279 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-8279.patch, LUCENE-8279.patch > > > We should improve CheckIndex to make sure that terms and norms agree on which > documents have a value on an indexed field. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms
[ https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456261#comment-16456261 ] Robert Muir commented on LUCENE-8279: - +1
[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms
[ https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456253#comment-16456253 ] Adrien Grand commented on LUCENE-8279: -- Here is an updated patch with Robert's suggestion.
[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms
[ https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454244#comment-16454244 ] Robert Muir commented on LUCENE-8279: - Or, even better, maybe just move this check into the postings check so that it happens for each field without creating problematic memory usage. The postings check already cross-checks some stuff with fieldinfos...
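A rough sketch of the per-field idea, for illustration only. It is not the code from the patch and does not use Lucene's classes: `PerFieldPostingsCheck`, `FieldData` and `check` are hypothetical names, and the postings, `docCount` statistic and norms are modeled with plain `java.util` types. It assumes the postings check builds one bitset of visited documents per field, uses it to verify `docCount`, compares it with the documents that have a norm, and then drops it before moving to the next field. Masking by live documents is left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Toy model of a per-field postings check; all names here are hypothetical. */
final class PerFieldPostingsCheck {

  /** Stand-in for one field: doc IDs per term, the recorded docCount, and docs with a norm. */
  static final class FieldData {
    final List<int[]> postings;
    final int docCount;
    final BitSet docsWithNorms; // null when the field has no norms

    FieldData(List<int[]> postings, int docCount, BitSet docsWithNorms) {
      this.postings = postings;
      this.docCount = docCount;
      this.docsWithNorms = docsWithNorms;
    }
  }

  /** Checks each field in turn, so only one bitset of size maxDoc is alive at a time. */
  static void check(Map<String, FieldData> fields, int maxDoc) {
    for (Map.Entry<String, FieldData> entry : fields.entrySet()) {
      FieldData field = entry.getValue();

      // Collect every document that appears in any postings list of this field.
      BitSet visited = new BitSet(maxDoc);
      for (int[] docs : field.postings) {
        for (int doc : docs) {
          visited.set(doc);
        }
      }

      // The bitset is needed anyway to verify the docCount statistic...
      if (visited.cardinality() != field.docCount) {
        throw new IllegalStateException("field " + entry.getKey() + ": docCount mismatch");
      }
      // ...so it can be reused to cross-check norms for the same field.
      if (field.docsWithNorms != null && !visited.equals(field.docsWithNorms)) {
        throw new IllegalStateException("field " + entry.getKey() + ": terms and norms disagree");
      }
    }
  }

  public static void main(String[] args) {
    BitSet norms = new BitSet();
    norms.set(0);
    norms.set(2);
    FieldData body = new FieldData(Arrays.asList(new int[] {0, 2}, new int[] {2}), 2, norms);
    check(Collections.singletonMap("body", body), 3);
    System.out.println("body: terms and norms agree");
  }
}
```

Because the bitset is scoped to one loop iteration, peak memory stays at one bitset of `maxDoc` bits regardless of the number of fields, which is the memory concern raised in the comment.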
[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms
[ https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454241#comment-16454241 ] Robert Muir commented on LUCENE-8279: - I am thinking of this one: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L1328 Maybe it could be moved to the TermIndexStatus or whatever so that the norms check could be moved to after the postings check and re-use it.
[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms
[ https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454222#comment-16454222 ] Robert Muir commented on LUCENE-8279: - The check is implemented as a "slow" check, but don't we already construct the same bitset to verify some postings list statistics such as docCount?
[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms
[ https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453981#comment-16453981 ] Adrien Grand commented on LUCENE-8279: -- Here is a patch. There is one case where terms and norms may go out of sync: when a document fails indexing, e.g. because consuming the token stream triggers an exception. In such a case you could end up with terms for this document but no norm. Since IndexWriter immediately marks the document as deleted in such a case, the new check only verifies that terms and norms agree on live documents.
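The live-documents restriction described above can be illustrated with a small sketch. It uses plain `java.util.BitSet` values as stand-ins for the documents that have terms, the documents that have a norm, and the live documents. The class and method names are hypothetical, and this is not the code from the attached patch.

```java
import java.util.BitSet;

/** Toy model of a terms/norms cross-check limited to live docs; names are hypothetical. */
final class LiveDocsNormsCheck {

  /**
   * Returns the first live doc ID on which terms and norms disagree,
   * or -1 if they agree on every live document.
   */
  static int firstMismatch(BitSet docsWithTerms, BitSet docsWithNorms, BitSet liveDocs) {
    BitSet mismatch = (BitSet) docsWithTerms.clone();
    mismatch.xor(docsWithNorms); // docs that have terms or a norm, but not both
    mismatch.and(liveDocs);      // deleted docs are allowed to disagree
    return mismatch.nextSetBit(0);
  }

  public static void main(String[] args) {
    int maxDoc = 4;

    BitSet terms = new BitSet(maxDoc);
    terms.set(0);
    terms.set(1); // doc 1 got terms before its token stream threw
    terms.set(2);

    BitSet norms = new BitSet(maxDoc);
    norms.set(0);
    norms.set(2); // no norm was written for doc 1

    BitSet live = new BitSet(maxDoc);
    live.set(0, maxDoc);
    live.clear(1); // the failed doc was marked as deleted

    System.out.println(firstMismatch(terms, norms, live)); // prints -1
  }
}
```

If doc 1 were still live, the same call would return 1, which is the kind of inconsistency the check is meant to report.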