[ 
https://issues.apache.org/jira/browse/LUCENE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748091#action_12748091
 ] 

Tim Smith edited comment on LUCENE-1859 at 8/26/09 12:18 PM:
-------------------------------------------------------------

I fail to see the complexity of adding one method to TermAttribute:
{code}
public void shrinkBuffer(int maxSize) {
  // Shrink only when the current term still fits in maxSize chars
  // and the backing array has grown past maxSize.
  if ((maxSize >= termLength) && (termBuffer.length > maxSize)) {
    termBuffer = java.util.Arrays.copyOf(termBuffer, maxSize);
  }
}
{code}

Not having this is fine as long as it's well documented that emitting large 
tokens can and will result in uncontrolled memory growth (especially when 
using many indexing threads)
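For illustration, here is a standalone sketch of the proposed semantics. The class and its grow-only setTermBuffer are stand-ins (TermAttributeImpl's real internals differ); only the shrinkBuffer logic mirrors the snippet above:

```java
import java.util.Arrays;

// Hypothetical stand-in for TermAttributeImpl's buffer handling,
// illustrating the proposed shrinkBuffer(int) semantics.
class TermBufferSketch {
    private char[] termBuffer = new char[16];
    private int termLength = 0;

    // Grow-only resize, mirroring current behavior: the backing array
    // never gets smaller once a large token has passed through.
    void setTermBuffer(char[] src, int offset, int length) {
        if (termBuffer.length < length) {
            termBuffer = new char[length];
        }
        System.arraycopy(src, offset, termBuffer, 0, length);
        termLength = length;
    }

    // Proposed addition: give back memory when the backing array is
    // larger than maxSize and the current term still fits in maxSize chars.
    void shrinkBuffer(int maxSize) {
        if (maxSize >= termLength && termBuffer.length > maxSize) {
            termBuffer = Arrays.copyOf(termBuffer, maxSize);
        }
    }

    int bufferCapacity() { return termBuffer.length; }
    String term() { return new String(termBuffer, 0, termLength); }
}
```

After a 1 MB token passes through, a later shrinkBuffer(256) drops the backing array back to 256 chars while preserving the current (short) term.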

> TermAttributeImpl's buffer will never "shrink" if it grows too big
> ------------------------------------------------------------------
>
>                 Key: LUCENE-1859
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1859
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>            Priority: Minor
>
> This was previously an issue with Token as well.
> If a TermAttributeImpl is populated with a very long buffer, it will never 
> be able to reclaim this memory.
> Obviously, it can be argued that Tokenizers should never emit "large" 
> tokens; however, it seems that TermAttributeImpl should have a reasonable 
> static "MAX_BUFFER_SIZE" such that if the term buffer grows bigger than 
> this, it will shrink back down to this size once the next token smaller 
> than MAX_BUFFER_SIZE is set.
> I don't think I have actually encountered issues with this yet; however, 
> with multiple indexing threads, you could end up with a 
> char[Integer.MAX_VALUE] per thread (in the very worst case scenario).
> Perhaps growTermBuffer should have the logic to shrink if the buffer is 
> currently larger than MAX_BUFFER_SIZE and it needs less than MAX_BUFFER_SIZE.
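The shrink-on-grow idea in the last paragraph of the description could look roughly like this. This is a sketch only: MAX_BUFFER_SIZE, the field names, and the exact-size allocation are assumptions (Lucene's real grow path oversizes via ArrayUtil rather than allocating exactly newSize):

```java
// Hypothetical sketch of a growTermBuffer that also shrinks once the
// requested size drops back under MAX_BUFFER_SIZE.
class GrowShrinkSketch {
    // Illustrative threshold; the real value would need tuning.
    static final int MAX_BUFFER_SIZE = 1 << 14; // 16K chars

    private char[] termBuffer = new char[16];

    char[] growTermBuffer(int newSize) {
        if (newSize > termBuffer.length) {
            // Usual grow path (real code would oversize the allocation).
            termBuffer = new char[newSize];
        } else if (termBuffer.length > MAX_BUFFER_SIZE && newSize <= MAX_BUFFER_SIZE) {
            // Buffer ballooned past the cap earlier; a small request
            // is the chance to shrink back down and reclaim memory.
            termBuffer = new char[MAX_BUFFER_SIZE];
        }
        return termBuffer;
    }

    int capacity() { return termBuffer.length; }
}
```

With this shape, one oversized token still costs a large allocation, but the per-thread buffer no longer stays at its high-water mark forever.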

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
