[ https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748109#action_12748109 ]
Marvin Humphrey commented on LUCENE-1859:
-----------------------------------------

> I don't believe there is ever any valid argument against adding
> documentation.

The more that documentation grows, the harder it is to absorb. The more
bells and whistles on an API, the harder it is to grok and to use
effectively. The more a code base bloats, the harder it is to maintain or
to evolve.

> keeping average memory usage down prevents those wonderful OutOfMemory
> Exceptions

No, it won't. If someone is emitting large tokens regularly, it is likely
that several threads will require large RAM footprints simultaneously, and
an OOM will occur. That would be the common case.

If someone is emitting large tokens only periodically, this doesn't prevent
the OOM, it just makes it less likely. That's not worthless, but it's not
something anybody should count on when assessing required RAM usage.

Keeping average memory usage down is good for the system at large. If this
is implemented, that should be the justification.

> TermAttributeImpl's buffer will never "shrink" if it grows too big
> ------------------------------------------------------------------
>
>                 Key: LUCENE-1859
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1859
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>            Priority: Minor
>
> This was previously an issue with Token as well.
>
> If a TermAttributeImpl is populated with a very long buffer, it will never
> be able to reclaim this memory.
>
> Obviously, it can be argued that Tokenizers should never emit "large"
> tokens; however, it seems that TermAttributeImpl should have a reasonable
> static "MAX_BUFFER_SIZE" such that if the term buffer grows bigger than
> this, it will shrink back down to this size once the next token smaller
> than MAX_BUFFER_SIZE is set.
>
> I don't think I have actually encountered issues with this yet; however,
> with multiple indexing threads, you could end up with a
> char[Integer.MAX_VALUE] per thread (in the very worst case scenario).
>
> Perhaps growTermBuffer should have the logic to shrink if the buffer is
> currently larger than MAX_BUFFER_SIZE and it needs less than
> MAX_BUFFER_SIZE.
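For illustration, here is a minimal sketch of the shrink-on-set behavior the
report proposes. The class, field names, and the MAX_BUFFER_SIZE value are
assumptions made up for the example, not existing Lucene code; the real
TermAttributeImpl has no such cap. (For scale: a char[Integer.MAX_VALUE] is
about 4 GB per thread, since a Java char is two bytes.)

    // Hypothetical sketch only: MAX_BUFFER_SIZE, MIN_BUFFER_SIZE, and the
    // shrink check are assumptions, not part of the actual Lucene API.
    public class ShrinkingTermBuffer {
      private static final int MIN_BUFFER_SIZE = 16;
      private static final int MAX_BUFFER_SIZE = 16 * 1024; // assumed cap

      private char[] termBuffer = new char[MIN_BUFFER_SIZE];
      private int termLength;

      public void setTermBuffer(char[] buffer, int offset, int length) {
        if (termBuffer.length > MAX_BUFFER_SIZE && length <= MAX_BUFFER_SIZE) {
          // A previous token grew the buffer past the cap; discard it and
          // reallocate at the smaller size so the large array can be GC'd.
          termBuffer = new char[Math.max(MIN_BUFFER_SIZE, length)];
        } else if (termBuffer.length < length) {
          // Normal grow path: the new token doesn't fit.
          termBuffer = new char[length];
        }
        System.arraycopy(buffer, offset, termBuffer, 0, length);
        termLength = length;
      }

      public int getTermLength() {
        return termLength;
      }
    }

Note that, per the comment above, this only lowers average footprint: a
steady stream of large tokens still allocates a large array on every
indexing thread, so it is no guarantee against an OOM.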