[ https://issues.apache.org/jira/browse/LUCENE-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457214#comment-13457214 ]
Tim Smith commented on LUCENE-4398:
-----------------------------------

Found an easy "fix" for this: commenting out the "bytesUsed(postingsHashSize * RamUsageEstimator.NUM_BYTES_INT)" line in TermsHashPerField's constructor does the trick. This results in not accounting for 16 bytes per field per thread, the same 16 bytes that were not being reclaimed by trimFields().

A more robust fix would be to add a "destroy()" method to the PerField interfaces that releases this memory when a per-field instance is discarded (however, that would be a rather large patch); a sketch of that idea follows the quoted description below.

Also found a relatively easy way to reproduce this:

- Feed N documents with fields A-M
- Force a flush
- Feed N documents with fields N-Z
- Force a flush
- Repeat

It will take a long time to actually consume all the memory (using more fields in the test should accelerate things). A minimal harness for this loop is also sketched below.

> "Memory Leak" in TermsHashPerField memory tracking
> --------------------------------------------------
>
>                 Key: LUCENE-4398
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4398
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 3.4
>            Reporter: Tim Smith
>
> I am witnessing an apparent leak in the memory tracking used to determine when a flush is necessary.
> Over time, this will result in every single document being flushed into its own segment, as the memUsage will remain above the configured buffer size, causing a flush to be triggered after every add/update.
> Best I can figure, this is being caused by TermsHashPerField's tracking of memory usage for postingsHash and/or postingsArray, combined with multi-threaded feeding.
> I suspect that TermsHashPerField's postingsHash is growing in one thread; then, when a segment is flushed, a single, different thread will merge all TermsHashPerFields in FreqProxTermsWriter and then call shrinkHash(). I suspect this call of shrinkHash() is seeing an old postingsHash array and subsequently not releasing all the memory that was allocated.
> If this is the case, I am also concerned that FreqProxTermsWriter will not write the correct terms into the index, although I have not confirmed that any indexing problem occurs as of yet.
> NOTE: I am witnessing this growth in a test by subtracting the amount of memory allocated (but in a "free" state) by perDocAllocator/byteBlockAllocator/charBlocks/intBlocks from DocumentsWriter.memUsage.get() in IndexWriter.doAfterFlush().
> I will see this stay at a stable point for a while, then on some flushes I will see it grow by a couple of bytes, and all subsequent flushes will never go back down to the previous state.
> I will continue to investigate and post any additional findings.
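A hypothetical sketch of the destroy() idea mentioned above (no such method exists in the Lucene source; only the bytesUsed()/postingsHash/RamUsageEstimator names quoted in the comment are real, the rest is illustrative):

    // Hypothetical TermsHashPerField.destroy(), sketched for illustration.
    // The constructor charges the initial hash up front via
    //   bytesUsed(postingsHashSize * RamUsageEstimator.NUM_BYTES_INT);
    // so a symmetric release when a per-thread, per-field instance is
    // discarded (e.g. from trimFields()) would keep the tally balanced:
    void destroy() {
      if (postingsHash != null) {
        // a negative delta gives the bytes back to the DocumentsWriter
        // tally, the same way shrinkHash() accounts for a smaller array
        bytesUsed(-postingsHash.length * RamUsageEstimator.NUM_BYTES_INT);
        postingsHash = null;
      }
    }

Each PerField implementation in the indexing chain would need an analogous hook, which is why this would be a rather large patch.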
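And a minimal, single-threaded harness for the reproduction loop above, against the public Lucene 3.4 API (field names, document counts, round count, and the RAM buffer size are illustrative; commit() is used here to force the flush):

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class ReproduceLucene4398 {
      public static void main(String[] args) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(
            Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
        config.setRAMBufferSizeMB(16.0); // flush by RAM usage, so the stale accounting matters
        IndexWriter writer = new IndexWriter(new RAMDirectory(), config);

        for (int round = 0; round < 1000; round++) {
          addDocs(writer, fieldNames('a', 'm'), 100); // fields A-M
          writer.commit();                            // force flush
          addDocs(writer, fieldNames('n', 'z'), 100); // fields N-Z
          writer.commit();                            // force flush
        }
        writer.close();
      }

      // one field name per letter in [from, to], e.g. field_a .. field_m
      private static String[] fieldNames(char from, char to) {
        String[] names = new String[to - from + 1];
        for (char c = from; c <= to; c++) {
          names[c - from] = "field_" + c;
        }
        return names;
      }

      private static void addDocs(IndexWriter writer, String[] fields, int count)
          throws IOException {
        for (int i = 0; i < count; i++) {
          Document doc = new Document();
          for (String name : fields) {
            doc.add(new Field(name, "value " + i, Field.Store.NO, Field.Index.ANALYZED));
          }
          writer.addDocument(doc);
        }
      }
    }

Watching the tracked RAM across rounds, as described in the NOTE above, should show the few-byte growth on some flushes; a multi-threaded feeder should make the drift appear faster, per the report.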