[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms

2018-05-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460999#comment-16460999
 ] 

ASF subversion and git services commented on LUCENE-8279:
-

Commit e00c4cede26690a82cf553a22b53a47c675cc01d in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e00c4ce ]

LUCENE-8279: CheckIndex now cross-checks terms with norms.


> Improve CheckIndex on norms
> ---
>
> Key: LUCENE-8279
> URL: https://issues.apache.org/jira/browse/LUCENE-8279
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8279.patch, LUCENE-8279.patch
>
>
> We should improve CheckIndex to make sure that terms and norms agree on which 
> documents have a value on an indexed field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms

2018-04-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456261#comment-16456261
 ] 

Robert Muir commented on LUCENE-8279:
-

+1




[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms

2018-04-27 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456253#comment-16456253
 ] 

Adrien Grand commented on LUCENE-8279:
--

Here is an updated patch with Robert's suggestion.




[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms

2018-04-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454244#comment-16454244
 ] 

Robert Muir commented on LUCENE-8279:
-

Or, even better, maybe just move this check into the postings check so that it 
happens for each field without creating problematic memory usage. The postings 
check already cross-checks some things against FieldInfos...
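
To illustrate the memory argument, here is a minimal sketch of a per-field check 
that reuses a single bitset. It is a simplified model built on java.util.BitSet, 
not Lucene's CheckIndex code: the class PerFieldCheck, the FieldData holder and 
the method firstInconsistentField are hypothetical names, and live docs are 
ignored to keep it short.

```java
import java.util.BitSet;
import java.util.Map;

/** Simplified model only; PerFieldCheck and FieldData are hypothetical, not Lucene classes. */
final class PerFieldCheck {

  /** Hypothetical per-field input: doc IDs seen in postings, and the docs that have a norm. */
  static final class FieldData {
    final int[] postingsDocs;
    final BitSet docsWithNorms;

    FieldData(int[] postingsDocs, BitSet docsWithNorms) {
      this.postingsDocs = postingsDocs;
      this.docsWithNorms = docsWithNorms;
    }
  }

  /** Returns the name of the first field whose terms and norms disagree, or null if none. */
  static String firstInconsistentField(Map<String, FieldData> fields, int maxDoc) {
    // One bitset for the whole run, so memory does not grow with the number of fields.
    BitSet visited = new BitSet(maxDoc);
    for (Map.Entry<String, FieldData> entry : fields.entrySet()) {
      visited.clear();
      for (int doc : entry.getValue().postingsDocs) {
        visited.set(doc);
      }
      // Compare while the field's bitset is still at hand, then move on to the next field.
      if (!visited.equals(entry.getValue().docsWithNorms)) {
        return entry.getKey();
      }
    }
    return null;
  }
}
```

Checking each field as soon as its postings have been walked means the bitset 
never has to be retained for a later norms pass.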




[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms

2018-04-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454241#comment-16454241
 ] 

Robert Muir commented on LUCENE-8279:
-

I am thinking of this one: 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L1328

Maybe it could be moved into TermIndexStatus (or similar) so that the norms 
check can run after the postings check and reuse it.




[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms

2018-04-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454222#comment-16454222
 ] 

Robert Muir commented on LUCENE-8279:
-

The check is implemented as a "slow" check, but don't we already construct the 
same bitset to verify some postings list statistics such as docCount?
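
For readers unfamiliar with that bitset, the idea can be sketched as follows. 
This is a simplified model using java.util.BitSet and plain int arrays, not the 
actual CheckIndex code; the class VisitedDocs and its methods are hypothetical 
names, and docCount is assumed to mean the number of documents that have at 
least one term for the field.

```java
import java.util.BitSet;
import java.util.List;

/** Simplified model only; VisitedDocs is a hypothetical name, not a Lucene class. */
final class VisitedDocs {

  /** Marks every document that appears in at least one postings list of the field. */
  static BitSet collect(List<int[]> postingsPerTerm, int maxDoc) {
    BitSet visited = new BitSet(maxDoc);
    for (int[] postings : postingsPerTerm) {
      for (int doc : postings) {
        visited.set(doc);
      }
    }
    return visited;
  }

  /** The number of visited documents should equal the docCount the field reports. */
  static boolean docCountMatches(BitSet visited, int reportedDocCount) {
    return visited.cardinality() == reportedDocCount;
  }
}
```

Once it has been built for the docCount verification, the same bitset already 
records which documents have terms, which is the information a norms 
cross-check needs.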




[jira] [Commented] (LUCENE-8279) Improve CheckIndex on norms

2018-04-26 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453981#comment-16453981
 ] 

Adrien Grand commented on LUCENE-8279:
--

Here is a patch. There is one case where terms and norms may go out of sync: 
when a document fails indexing, e.g. because consuming the token stream throws 
an exception. In that case you could end up with terms for this document but 
no norm. Since IndexWriter immediately marks the document as deleted when this 
happens, the new check only verifies that terms and norms agree on live 
documents.
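
The rule described above can be modelled roughly as follows. This is a sketch 
under stated assumptions, not the code from the patch: it uses java.util.BitSet 
in place of Lucene's own data structures, and the class NormsCrossCheck and the 
method firstMismatch are hypothetical names.

```java
import java.util.BitSet;

/** Simplified model of the cross-check; NormsCrossCheck is a hypothetical name. */
final class NormsCrossCheck {

  /**
   * Returns the first live document on which terms and norms disagree,
   * or -1 if they agree on every live document.
   */
  static int firstMismatch(BitSet docsWithTerms, BitSet docsWithNorms, BitSet liveDocs) {
    // XOR keeps exactly the documents that have terms or a norm, but not both.
    BitSet mismatches = (BitSet) docsWithTerms.clone();
    mismatches.xor(docsWithNorms);
    // Deleted documents are skipped: a document that failed indexing may have terms but no norm.
    mismatches.and(liveDocs);
    return mismatches.nextSetBit(0);
  }
}
```

A document with terms but no norm is reported only if it is still live; the 
same document is ignored once it has been marked as deleted.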
