[ 
https://issues.apache.org/jira/browse/LUCENE-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464105
 ] 

Doron Cohen commented on LUCENE-741:
------------------------------------

I was looking at what it would take to make this work with the .nrm file as well. 
I expected that some existing test would currently fail, but none does.
So I looked into the tests and the implementation, and I have a few questions:

(1) under contrib, FieldNormModifier and LengthNormModifier seem quite similar, 
right? 
The first one sets with:
 - reader.setNorm(d, fieldName, 
 - sim.encodeNorm(sim.lengthNorm(fieldName, termCounts[d])));
The latter with:
 - byte norm = sim.encodeNorm(sim.lengthNorm(fieldName, termCounts[d]));
 - reader.setNorm(d, fieldName, norm);
Do we need to keep both?

(2) TestFieldNormModifier.testFieldWithNoNorm() calls resetNorms() for a field 
that does not exist. The modifier does some work to collect the term 
frequencies and then calls reader.setNorm, which does nothing because 
there are no norms. And indeed the test verifies that there are still no norms 
for this field. I find this confusing: for some reason I assumed that calling 
resetNorms() for a field that has none would implicitly set omitNorms to false 
for that field and compute the norms, the inverse of killNorms(). Since this is 
not the case, perhaps resetNorms() should throw an exception in this case?
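
To make the suggestion in (2) concrete, here is a minimal sketch of the 
fail-fast behavior. It runs against a toy in-memory model, not the real 
IndexReader: NormStore, addField, hasNorms and this resetNorms signature are 
hypothetical names made up for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index: field name -> one norm byte per document.
// A field without norms simply has no entry in the map.
final class NormStore {
    private final Map<String, byte[]> normsByField = new HashMap<>();

    void addField(String field, byte[] norms) {
        normsByField.put(field, norms.clone());
    }

    boolean hasNorms(String field) {
        return normsByField.containsKey(field);
    }

    // Returns a copy of the norms, or null if the field has none.
    byte[] norms(String field) {
        byte[] norms = normsByField.get(field);
        return norms == null ? null : norms.clone();
    }

    // Fails fast instead of silently doing nothing when the field has no norms.
    void resetNorms(String field, byte newNorm) {
        byte[] norms = normsByField.get(field);
        if (norms == null) {
            throw new IllegalArgumentException("field has no norms: " + field);
        }
        Arrays.fill(norms, newNorm);
    }
}
```

With this shape, a test like testFieldWithNoNorm() would assert that the call 
throws, rather than assert that nothing changed.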

(3) I would feel safer about this feature if the test were stricter, 
something like TestNorms: have several fields, modify some, each in a unique 
way, remove some others, then at the end verify that all the norm values of 
each field are exactly as expected. 

(4) For killNorms to work, you can first revert the index to not use .nrm, and 
then "kill" as before. The code knows how to read .fN files, both for backwards 
compatibility and for reading segments created by DocumentWriter. The 
following steps will do this:
- read the norms using reader.norms(field)
- write them into .fN files
- remove the .nrm file
- modify segmentInfo to know that it has no .nrm file.
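
The four steps in (4) can be sketched roughly as follows. This is only a toy 
model under stated assumptions: the segment's files are a map from file name 
to bytes, ToySegment and its hasSingleNormFile flag stand in for the real 
segment info, and a field's number N is taken to be its position in the list 
passed in. None of these names come from the Lucene code base.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NrmRevertSketch {

    // Hypothetical segment: a name, its files (name -> contents), and a flag
    // saying whether norms live in a single .nrm file.
    static final class ToySegment {
        final String name;
        final Map<String, byte[]> files = new LinkedHashMap<>();
        boolean hasSingleNormFile = true;

        ToySegment(String name) {
            this.name = name;
        }
    }

    // normsByField plays the role of reading the norms through the reader;
    // a field without norms has no entry.
    static void revertToPerFieldNorms(ToySegment seg,
                                      List<String> fields,
                                      Map<String, byte[]> normsByField) {
        for (int n = 0; n < fields.size(); n++) {
            // Step 1: read the norms of the field.
            byte[] norms = normsByField.get(fields.get(n));
            if (norms == null) {
                continue; // no norms for this field, so no .fN file either
            }
            // Step 2: write them into a per-field .fN file.
            seg.files.put(seg.name + ".f" + n, norms.clone());
        }
        // Step 3: remove the single .nrm file.
        seg.files.remove(seg.name + ".nrm");
        // Step 4: record that the segment no longer has a .nrm file.
        seg.hasSingleNormFile = false;
    }
}
```

After this runs, the segment looks like one written before .nrm existed, so 
the existing "kill" logic could operate on the per-field files as before.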

(5) It would be more efficient to optimize (and remove the .nrm file) only 
once in the application, so perhaps the public API should take an array of 
fields and operate on all of them?
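
One possible shape for such an API, as a sketch only: the costly one-time step 
runs once per call, and the single-field method delegates to the array 
version. BatchKillSketch, convertOnce and the counters are hypothetical and 
exist just to show the call pattern; no real index is touched.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchKillSketch {
    // How many times the costly one-time step has run.
    int conversions = 0;
    // Fields whose norms were "killed", in call order.
    final List<String> killed = new ArrayList<>();

    // Hypothetical expensive step, e.g. optimizing and removing the .nrm file.
    private void convertOnce() {
        conversions++;
    }

    // Single-field convenience method; pays the one-time cost on every call.
    void killNorms(String field) {
        killNorms(new String[] { field });
    }

    // Batch method: the one-time cost is paid once for all fields.
    void killNorms(String[] fields) {
        if (fields.length == 0) {
            return; // nothing to do, so skip the expensive step
        }
        convertOnce();
        for (String field : fields) {
            killed.add(field);
        }
    }
}
```

Calling the array version with three fields pays the one-time cost once, 
whereas three single-field calls pay it three times.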


> Field norm modifier (CLI tool)
> ------------------------------
>
>                 Key: LUCENE-741
>                 URL: https://issues.apache.org/jira/browse/LUCENE-741
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-741.patch, LUCENE-741.patch
>
>
> I took Chris' LengthNormModifier (contrib/misc) and modified it slightly, to 
> allow us to set fake norms on existing fields, effectively making them 
> equivalent to Field.Index.NO_NORMS.
> This is related to LUCENE-448 (NO_NORMS patch) and LUCENE-496 
> (LengthNormModifier contrib from Chris).


        
