[ https://issues.apache.org/jira/browse/LUCENE-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler closed LUCENE-505. -------------------------------- Resolution: Fixed Fix Version/s: 2.9 Since Lucene 2.9 we search on each segment separately, so MultiReader's norms cache would never be used, exept in custom code that calls norms() on the MultiReader/DirectoryReader. Since Lucene 4.0 this is also not allowed anymore, non-atomic readers don't support norms. If you still need to get global norms, you can use MultiNorms but that is discouraged. See also: LUCENE-2771 > MultiReader.norm() takes up too much memory: norms byte[] should be made into > an Object > --------------------------------------------------------------------------------------- > > Key: LUCENE-505 > URL: https://issues.apache.org/jira/browse/LUCENE-505 > Project: Lucene - Java > Issue Type: Improvement > Components: Index > Affects Versions: 2.0.0 > Environment: Patch is against Lucene 1.9 trunk (as of Mar 1 06) > Reporter: Steven Tamm > Priority: Minor > Fix For: 2.9 > > Attachments: LazyNorms.patch, NormFactors.patch, NormFactors.patch, > NormFactors20.patch > > > MultiReader.norms() is very inefficient: it has to construct a byte array > that's as long as all the documents in every segment. This doubles the > memory requirement for scoring MultiReaders vs. Segment Readers. Although > this is cached, it's still a baseline of memory that is unnecessary. > The problem is that the Normalization Factors are passed around as a byte[]. > If it were instead replaced with an Object, you could perform a whole host of > optimizations > a. When reading, you wouldn't have to construct a "fakeNorms" array of all > 1.0fs. You could instead return a singleton object that would just return > 1.0f. > b. MultiReader could use an object that could delegate to NormFactors of the > subreaders > c. You could write an implementation that could use mmap to access the norm > factors. Or if the index isn't long lived, you could use an implementation > that reads directly from the disk. > The patch provided here replaces the use of byte[] with a new abstract class > called NormFactors. > NormFactors has two methods on it > public abstract byte getByte(int doc) throws IOException; // Returns the > byte[doc] > public float getFactor(int doc) throws IOException; // Calls > Similarity.decodeNorm(getByte(doc)) > There are four implementations of this abstract class > 1. NormFactors.EmptyNormFactors - This replaces the fakeNorms with a > singleton that only returns 1.0 > 2. NormFactors.ByteNormFactors - Converts a byte[] to a NormFactors for > backwards compatibility in constructors. > 3. MultiNormFactors - Multiplexes the NormFactors in MultiReader to prevent > the need to construct the gigantic norms array. > 4. SegmentReader.Norm - Same class, but now extends NormFactors to provide > the same access. > In addition, Many of the Query and Scorer classes were changes to pass around > NormFactors instead of byte[], and to call getFactor() instead of using the > byte[]. I have kept around IndexReader.norms(String) for backwards > compatibiltiy, but marked it as deprecated. I believe that the use of > ByteNormFactors in IndexReader.getNormFactors() will keep backward > compatibility with other IndexReader implementations, but I don't know how to > test that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org