[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854901#action_12854901 ]

Ruben Laguna commented on LUCENE-2384:
--------------------------------------

The mailing list discussion that originated this is [1].

[1] http://lucene.markmail.org/thread/ndmcgffg2mnwjo47

Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
-------------------------------------------------------------

Key: LUCENE-2384
URL: https://issues.apache.org/jira/browse/LUCENE-2384
Project: Lucene - Java
Issue Type: Sub-task
Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 3.1

When indexing large documents, the lexer buffer may stay large forever. This sub-issue resets the lexer buffer back to the default on reset(Reader). This is done on the enclosing issue.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruben Laguna updated LUCENE-2384:
--------------------------------

Attachment: reset.diff

Patch to reset the zzBuffer when the input is reset. The code is actually taken from https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com so I can't really grant a license to use it, but I think the guy released it as public domain by posting it to the mailing list.
[jira] Issue Comment Edited: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854905#action_12854905 ]

Ruben Laguna edited comment on LUCENE-2384 at 4/8/10 11:24 AM:
--------------------------------------------------------------

Patch to reset the zzBuffer when the input is reset. The code is actually taken from https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com so I can't really grant a license to use it, but I think the guy released it as public domain by posting it to the mailing list.

I tested it and it seems to work for me. I'm just including it here in case somebody wants to apply the patch directly to 3.0.1 (although it's better to wait for 3.1).

was (Author: ecerulm):
Patch to reset the zzBuffer when the input is reset. The code is actually taken from https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com so I can't really grant a license to use it, but I think the guy released it as public domain by posting it to the mailing list.
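For readers who have not seen the attached patch, the idea behind the issue can be sketched roughly as follows. This is only an illustration, not the contents of reset.diff and not the real StandardTokenizerImpl: the class ResettableScanner, its fields, and the tiny DEFAULT_BUFFER_SIZE are all hypothetical names chosen for the example. It shows a buffer that grows while a large input is scanned and is swapped back to the default size when the scanner is reset with a new Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical stand-in for a lexer with a growable buffer; not Lucene code. */
public class ResettableScanner {
    // Assumed default size, kept tiny so the growth is easy to observe.
    static final int DEFAULT_BUFFER_SIZE = 16;

    private Reader in;
    private char[] buffer = new char[DEFAULT_BUFFER_SIZE];
    private int length; // number of valid chars currently in the buffer

    public ResettableScanner(Reader in) {
        this.in = in;
    }

    /** Reads all remaining input, doubling the buffer whenever it fills up. */
    public void fill() {
        try {
            while (true) {
                if (length == buffer.length) {
                    buffer = Arrays.copyOf(buffer, buffer.length * 2);
                }
                int n = in.read(buffer, length, buffer.length - length);
                if (n < 0) {
                    break; // end of input
                }
                length += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Switches to a new input and drops an oversized buffer so it can be collected. */
    public void reset(Reader newInput) {
        in = newInput;
        length = 0;
        if (buffer.length > DEFAULT_BUFFER_SIZE) {
            buffer = new char[DEFAULT_BUFFER_SIZE];
        }
    }

    public int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        char[] big = new char[1000];
        Arrays.fill(big, 'x');
        ResettableScanner scanner = new ResettableScanner(new StringReader(new String(big)));
        scanner.fill();
        System.out.println("capacity after large document: " + scanner.capacity());
        scanner.reset(new StringReader("small"));
        System.out.println("capacity after reset: " + scanner.capacity());
    }
}
```

Without the size check in reset(), a scanner that is reused across documents would keep the largest buffer it ever needed, which is the behaviour the issue describes.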
[jira] Created: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)
IndexWriter retains references to Readers used in Fields (memory leak)
----------------------------------------------------------------------

Key: LUCENE-2387
URL: https://issues.apache.org/jira/browse/LUCENE-2387
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 3.0.1
Reporter: Ruben Laguna

As described in [1], IndexWriter retains references to the Readers used in Fields, and that can lead to big memory leaks when using Tika's ParsingReaders (as those can take 1 MB per ParsingReader).

[2] shows a screenshot of the reference chain from the IndexWriter to the Reader, taken with Eclipse MAT (Memory Analysis Tool). The chain is the following:

IndexWriter -> DocumentsWriter -> DocumentsWriterThreadState -> DocFieldProcessorPerThread -> DocFieldProcessorPerField -> Fieldable -> Field (fieldsData)

[1] http://markmail.org/thread/ndmcgffg2mnwjo47
[2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer
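The retention pattern in the chain above can be illustrated with a small, self-contained sketch. None of this is Lucene code: FieldLike, PerFieldState, and finishDocument() are hypothetical names that only model a long-lived per-field object that keeps pointing at the last field it processed, and therefore at that field's Reader, until the reference is cleared.

```java
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical model of a long-lived object retaining a Reader; not Lucene code. */
public class RetainedReaderSketch {

    /** Stand-in for a field whose value (fieldsData) is a Reader. */
    static final class FieldLike {
        final Object fieldsData;

        FieldLike(Reader reader) {
            this.fieldsData = reader;
        }
    }

    /** Stand-in for per-field indexing state that outlives individual documents. */
    static final class PerFieldState {
        private FieldLike lastField;

        /** Remembers the field being processed; the reference survives the call. */
        void process(FieldLike field) {
            lastField = field;
        }

        /** Assumed cleanup step: dropping the reference makes the Reader collectable. */
        void finishDocument() {
            lastField = null;
        }

        boolean holdsReader() {
            return lastField != null && lastField.fieldsData instanceof Reader;
        }
    }

    public static void main(String[] args) {
        PerFieldState state = new PerFieldState();
        state.process(new FieldLike(new StringReader("document body")));
        System.out.println("reader reachable after indexing: " + state.holdsReader());
        state.finishDocument();
        System.out.println("reader reachable after cleanup: " + state.holdsReader());
    }
}
```

As long as the long-lived object is reachable and the reference is not cleared, the garbage collector cannot reclaim the Reader or anything the Reader itself holds on to.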