[ https://issues.apache.org/jira/browse/LUCENE-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-1520:
----------------------------------

    Attachment: LUCENE-1520.patch

This is a patch for Mike's suggestion: it only changes CheckIndex so that it no longer uses norms(fieldname), which caches, and uses the uncached 3-arg variant instead. TestCheckIndex passes, and the many-field index no longer causes an OOM error.

> OOM errors with CheckIndex with indexes containing a lot of fields with norms
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-1520
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1520
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>         Attachments: LUCENE-1520.patch
>
>
> All index readers have a cache of the last used norms (SegmentReader,
> MultiReader, MultiSegmentReader, ...). This cache is never cleaned up, so if
> you access the norms of a field, the norm's byte[maxDoc()] array is not freed
> until you close/reopen the index.
> You can see this problem if you create an index with many fields with norms
> (I tested with about 4,000 fields) and many documents (half a million). If
> you then call CheckIndex, which calls norms() for each (!) field in the
> segment, each of these calls creates a new cache entry and you get
> OutOfMemoryErrors after a short time (I tested with the above index: I was
> not able to do a CheckIndex even with "-Xmx16g" on 64-bit Java).
> CheckIndex opens and then tests each segment of an index with a separate
> SegmentReader. The big index with the OutOfMemory problem was optimized, so
> it consisted of one segment with about half a million docs and about 4,000
> fields. Each byte[] array takes about half a MiB for this index. The
> CheckIndex function created the norms for 4,000 fields and the SegmentReader
> cached them, which is about 2 GiB of RAM. So OOMs are not unusual.
> In my opinion, the best fix would be to use a WeakReference or, better, a
> SoftReference, so that norms.bytes becomes a
> java.lang.ref.SoftReference<byte[]> that is used for caching.
> With proper synchronization (which is already done on the norms cache in
> SegmentReader), SoftReference is the best choice, as this kind of reference
> is garbage collected only when an OOM may happen. If the byte[] array is freed
> (and it is only freed if no other references exist), a later call to
> getNorms() creates a new array. When code holds a hard reference to the norms
> array, it will not be freed, so that is no problem. The same could be done
> for the other IndexReaders.
> Fields without norms do not have this problem, as all these fields share a
> one-time allocated dummy norm array. So the same index without norms enabled
> for most of the fields checked without problems.
> I will prepare a patch tomorrow.
> Mike proposed another quick fix for CheckIndex:
> bq. we could do something first specifically for CheckIndex (eg it could
> simply use the 3-arg non-caching bytes method instead) to prevent OOM errors
> when using it.
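
The SoftReference idea from the issue description can be sketched roughly as follows. This is a minimal, self-contained illustration and not Lucene's SegmentReader code: the class name SoftNormsCache, the loader callback, and the use of modern Java (generics, lambdas) are all assumptions made for the example. It only shows the pattern of a synchronized per-field cache whose entries the garbage collector may reclaim under memory pressure and which are reloaded on the next access.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-field norms cache (illustration only, not Lucene code). */
final class SoftNormsCache {
    // Field name -> softly referenced norms array.
    private final Map<String, SoftReference<byte[]>> cache = new HashMap<>();
    // Assumed callback that reads the norms of one field from the index.
    private final Function<String, byte[]> loader;

    SoftNormsCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    /** Returns the cached array, reloading it if the GC has cleared the entry. */
    synchronized byte[] getNorms(String field) {
        SoftReference<byte[]> ref = cache.get(field);
        byte[] bytes = (ref == null) ? null : ref.get();
        if (bytes == null) {
            // Never loaded, or reclaimed by the garbage collector: load again.
            bytes = loader.apply(field);
            cache.put(field, new SoftReference<>(bytes));
        }
        // The caller now holds a hard reference, so the array stays alive
        // for as long as the caller keeps using it.
        return bytes;
    }
}
```

As long as a caller keeps the returned array, repeated getNorms() calls for the same field return that same array. Once no hard reference remains, the JVM is free to clear the entry, and it guarantees that softly reachable objects are cleared before it throws an OutOfMemoryError.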