[ 
https://issues.apache.org/jira/browse/LUCENE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837793#action_12837793
 ] 

Tim Smith commented on LUCENE-2283:
-----------------------------------

I came across this issue while looking into a reported memory leak during 
indexing.

A YourKit snapshot showed that the PerDocs for an IndexWriter were using ~40M 
of memory (at which point I came across this potentially unbounded memory use 
in StoredFieldsWriter).
This snapshot seems to have been taken at a more or less stable point (memory 
grows but then returns to a "normal" state). However, I have reports that 
memory is eventually exhausted completely, resulting in out of memory errors.

So far I have not found any other major culprit in the Lucene indexing code.

This index receives a routine mix of very large and very small documents (which 
would explain this situation).
The VM and system have more than enough memory for the configured buffer size 
and what should be normal indexing RAM requirements.
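
To illustrate why a mix of very large and very small documents matters, here is 
a minimal sketch. It is a simplified model, not Lucene's actual code: 
PooledDocBuffer, HighWaterMarkDemo, the round-robin reuse, and the pool size of 
4 are all assumptions made for the example. Each pooled buffer grows to fit the 
largest document it has ever seen and never shrinks, so a few rare large 
documents are enough to pin every buffer at its high-water mark.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a pooled per-document buffer; not Lucene's PerDoc. */
class PooledDocBuffer {
    private byte[] buf = new byte[0];

    /** Grows the backing array if needed, but never shrinks it on reuse. */
    void write(int docSizeBytes) {
        if (docSizeBytes > buf.length) {
            buf = new byte[docSizeBytes];
        }
    }

    int retainedBytes() {
        return buf.length;
    }
}

class HighWaterMarkDemo {
    /** Feeds documents round-robin through a fixed pool; returns bytes still held. */
    static long retainedAfter(int poolSize, int[] docSizes) {
        List<PooledDocBuffer> pool = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) {
            pool.add(new PooledDocBuffer());
        }
        for (int i = 0; i < docSizes.length; i++) {
            pool.get(i % poolSize).write(docSizes[i]);
        }
        long total = 0;
        for (PooledDocBuffer b : pool) {
            total += b.retainedBytes();
        }
        return total;
    }

    public static void main(String[] args) {
        // Mostly 1 KB documents, with an occasional 10 MB document.
        int[] docSizes = new int[100];
        for (int i = 0; i < docSizes.length; i++) {
            docSizes[i] = (i % 33 == 0) ? 10 * 1024 * 1024 : 1024;
        }
        System.out.println("retained bytes: " + retainedAfter(4, docSizes));
    }
}
```

In this model only 4 of the 100 documents are large, yet each of the 4 pooled 
buffers ends up holding a 10 MB array after all documents have been processed.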

Also, a major difference between this leak not occurring and it showing up is 
that previously the IndexWriter was closed when performing commits; now the 
IndexWriter remains open (just calling IndexWriter.commit()). So, if any memory 
is leaking during indexing, it is no longer being reclaimed at commit time. As 
a side note, closing the index writer at commit time would sometimes fail, 
causing some subsequent updates to fail because the index writer was locked 
and couldn't be reopened until the old index writer was garbage collected, so I 
don't want to go back to this for commits.

It's possible there is a leak somewhere else (I currently do not have a 
snapshot from right before the out of memory errors occur, so for now the only 
thing that stands out is the PerDoc memory use).

As far as a fix goes, wouldn't it be better to have the RAMFiles used for 
stored fields pull and return byte buffers from the byte block pool on the 
DocumentsWriter? This would allow the memory to be reclaimed based on the 
IndexWriter's buffer size (otherwise there is no configurable way to tune this 
memory use).
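
A rough sketch of the idea, to make the proposal concrete. Everything here is 
hypothetical and self-contained: BlockPool stands in for the byte block pool on 
the DocumentsWriter, PooledStoredFieldsBuffer stands in for a RAMFile that 
borrows blocks, and maxFreeBlocks stands in for a limit derived from the 
IndexWriter's buffer size. None of these names or signatures are Lucene APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical shared pool of fixed-size blocks with a cap on cached free blocks. */
class BlockPool {
    static final int BLOCK_SIZE = 1024; // illustrative block size
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int maxFreeBlocks;

    BlockPool(int maxFreeBlocks) {
        this.maxFreeBlocks = maxFreeBlocks;
    }

    /** Reuses a cached block if one is available, otherwise allocates a new one. */
    byte[] allocate() {
        byte[] b = free.poll();
        return b != null ? b : new byte[BLOCK_SIZE];
    }

    /** Caches the block up to the cap; beyond it the block is left for the GC. */
    void release(byte[] block) {
        if (free.size() < maxFreeBlocks) {
            free.push(block);
        }
    }

    int freeBlocks() {
        return free.size();
    }
}

/** Hypothetical per-document buffer that borrows blocks instead of owning them. */
class PooledStoredFieldsBuffer {
    private final BlockPool pool;
    private final List<byte[]> blocks = new ArrayList<>();
    private int length;

    PooledStoredFieldsBuffer(BlockPool pool) {
        this.pool = pool;
    }

    void write(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            int posInBlock = length % BlockPool.BLOCK_SIZE;
            if (posInBlock == 0) {
                // Either nothing has been written yet or the last block is full.
                blocks.add(pool.allocate());
            }
            byte[] current = blocks.get(blocks.size() - 1);
            int n = Math.min(data.length - offset, BlockPool.BLOCK_SIZE - posInBlock);
            System.arraycopy(data, offset, current, posInBlock, n);
            offset += n;
            length += n;
        }
    }

    /** Returns every borrowed block to the pool once the document is flushed. */
    void reset() {
        for (byte[] b : blocks) {
            pool.release(b);
        }
        blocks.clear();
        length = 0;
    }

    int blocksHeld() {
        return blocks.size();
    }
}

class BlockPoolDemo {
    public static void main(String[] args) {
        BlockPool pool = new BlockPool(2);
        PooledStoredFieldsBuffer doc = new PooledStoredFieldsBuffer(pool);
        doc.write(new byte[5000]); // a large document borrows several blocks
        doc.reset();               // only up to maxFreeBlocks stay cached
        System.out.println("free blocks cached: " + pool.freeBlocks());
    }
}
```

With this shape, a large document borrows as many blocks as it needs, but after 
reset() the memory retained is bounded by the pool's cap rather than by the 
largest document ever seen.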



> Possible Memory Leak in StoredFieldsWriter
> ------------------------------------------
>
>                 Key: LUCENE-2283
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2283
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: Tim Smith
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>
> StoredFieldsWriter creates a pool of PerDoc instances.
> This pool will grow but never be reclaimed by any mechanism.
> Furthermore, each PerDoc instance contains a RAMFile.
> This RAMFile will also never be truncated (and will only ever grow), as far 
> as I can tell.
> When feeding documents with a large number of stored fields (or one large 
> dominating stored field), this can result in memory being consumed in the 
> RAMFile but never reclaimed. Eventually, each pooled PerDoc could grow very 
> large, even if large documents are rare.
> It seems there should be some attempt to reclaim memory from the PerDoc[] 
> instance pool (or otherwise limit the size of the RAMFiles that are cached).
