[ https://issues.apache.org/jira/browse/LUCENE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837793#action_12837793 ]
Tim Smith commented on LUCENE-2283:
-----------------------------------

I came across this issue while investigating a reported memory leak during indexing. A YourKit snapshot showed that the PerDoc instances for an IndexWriter were using ~40 MB of memory, at which point I came across this potentially unbounded memory use in StoredFieldsWriter. The snapshot appears to capture a more or less stable point (memory grows but then returns to a "normal" level); however, I have reports that eventually memory is completely exhausted, resulting in out-of-memory errors. So far I have not found any other major culprit in the Lucene indexing code. This index receives a routine mix of very large and very small documents, which would explain this situation. The VM and the system have more than ample memory given the buffer size and what should be normal indexing RAM requirements.

Also, a major difference between the leak not occurring and it showing up is that previously the IndexWriter was closed when performing commits; now the IndexWriter remains open (we just call IndexWriter.commit()). So if any memory leaks during indexing, it is no longer being reclaimed at commit time. As a side note, closing the IndexWriter at commit time would sometimes fail, causing subsequent updates to fail because the index was locked and couldn't be reopened until the old IndexWriter was garbage collected, so I don't want to go back to that approach for commits.

It's possible there is a leak somewhere else (I currently do not have a snapshot taken right before the out-of-memory errors occur, so at the moment the only thing that stands out is the PerDoc memory use).

As far as a fix goes, wouldn't it be better to have the RAMFiles used for stored fields pull and return byte buffers from the byte block pool on the DocumentsWriter?
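To make the suggestion concrete, here is a rough sketch of the pull-and-return pattern being proposed. The names (BlockPool, PooledBuffer, maxPooledBlocks) are illustrative only, not Lucene's actual ByteBlockPool/RAMFile API; the point is that per-document buffers borrow fixed-size blocks from a shared pool and hand them back after flush, so retained memory is bounded by a configured pool size rather than by the largest document ever seen:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative shared pool of fixed-size byte blocks; not Lucene's ByteBlockPool.
class BlockPool {
    static final int BLOCK_SIZE = 32 * 1024;
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int maxPooledBlocks;

    BlockPool(int maxPooledBlocks) {
        this.maxPooledBlocks = maxPooledBlocks;
    }

    byte[] borrow() {
        byte[] block = free.poll();
        return block != null ? block : new byte[BLOCK_SIZE];
    }

    void release(byte[] block) {
        // Cap how many blocks stay pooled, so pooled memory is bounded
        // by configuration rather than by the largest document indexed.
        if (free.size() < maxPooledBlocks) {
            free.push(block);
        }
    }

    int pooledBlocks() {
        return free.size();
    }
}

// Stand-in for a per-document buffer (the role RAMFile plays in PerDoc):
// it borrows blocks as it grows and returns all of them on reset().
class PooledBuffer {
    private final BlockPool pool;
    private final List<byte[]> blocks = new ArrayList<>();
    private int posInBlock = BlockPool.BLOCK_SIZE; // forces a borrow on first write

    PooledBuffer(BlockPool pool) {
        this.pool = pool;
    }

    void writeByte(byte b) {
        if (posInBlock == BlockPool.BLOCK_SIZE) {
            blocks.add(pool.borrow());
            posInBlock = 0;
        }
        blocks.get(blocks.size() - 1)[posInBlock++] = b;
    }

    // Called once the document has been flushed: give every block back.
    void reset() {
        for (byte[] block : blocks) {
            pool.release(block);
        }
        blocks.clear();
        posInBlock = BlockPool.BLOCK_SIZE;
    }
}
```

With this shape, a rare huge document temporarily borrows many blocks but returns them on flush, instead of leaving a permanently grown RAMFile cached inside a pooled PerDoc.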
This would allow the memory to be reclaimed based on the IndexWriter's buffer size (otherwise there is no configurable way to tune this memory use).

> Possible Memory Leak in StoredFieldsWriter
> ------------------------------------------
>
>                 Key: LUCENE-2283
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2283
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>
> StoredFieldsWriter creates a pool of PerDoc instances.
> This pool will grow but will never be reclaimed by any mechanism.
> Furthermore, each PerDoc instance contains a RAMFile. This RAMFile will
> also never be truncated and will only ever grow (as far as I can tell).
> When feeding documents with a large number of stored fields (or one large,
> dominating stored field), this can result in memory being consumed in the
> RAMFile but never reclaimed. Eventually, each pooled PerDoc could grow very
> large, even if large documents are rare.
> It seems like there should be some attempt to reclaim memory from the
> PerDoc[] instance pool (or to otherwise limit the size of the cached RAMFiles).