[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Ruben Laguna (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854901#action_12854901
 ] 

Ruben Laguna commented on LUCENE-2384:
--

This issue originated in the mailing list discussion at [1].


[1] http://lucene.markmail.org/thread/ndmcgffg2mnwjo47



 Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
 -

 Key: LUCENE-2384
 URL: https://issues.apache.org/jira/browse/LUCENE-2384
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


 After a large document has been indexed, the lexer buffer may stay at its
 enlarged size indefinitely. This sub-issue resets the lexer buffer back to the
 default size on reset(Reader). The change itself is made on the enclosing issue.
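
 As a rough illustration of the behavior described above, here is a minimal
 Java sketch. It is not the attached patch and not Lucene's generated scanner:
 the class LexerBufferSketch, its methods, and the 16 KB default size are all
 hypothetical names and values chosen for the example. For simplicity the
 sketch buffers the entire input; a real scanner would presumably grow its
 buffer only as far as it needs to.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical stand-in for a scanner with a growable buffer; not Lucene code. */
final class LexerBufferSketch {
    /** Assumed default buffer size for this sketch only. */
    static final int DEFAULT_BUFFER_SIZE = 16 * 1024;

    private Reader in;
    private char[] buffer = new char[DEFAULT_BUFFER_SIZE];
    private int length; // number of valid chars currently in the buffer

    LexerBufferSketch(Reader in) {
        this.in = in;
    }

    /** Reads all input, doubling the buffer whenever it is full. */
    void consumeAll() {
        try {
            while (true) {
                if (length == buffer.length) {
                    buffer = Arrays.copyOf(buffer, buffer.length * 2);
                }
                int n = in.read(buffer, length, buffer.length - length);
                if (n < 0) {
                    break; // end of input
                }
                length += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Switches to a new input and shrinks an oversized buffer back to the default. */
    void reset(Reader newInput) {
        in = newInput;
        length = 0;
        if (buffer.length > DEFAULT_BUFFER_SIZE) {
            buffer = new char[DEFAULT_BUFFER_SIZE];
        }
    }

    int bufferCapacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        char[] big = new char[100_000];
        Arrays.fill(big, 'x');
        LexerBufferSketch lexer = new LexerBufferSketch(new StringReader(new String(big)));
        lexer.consumeAll();
        System.out.println("capacity after large document: " + lexer.bufferCapacity());
        lexer.reset(new StringReader("small"));
        System.out.println("capacity after reset: " + lexer.bufferCapacity());
    }
}
```

 Without the size check in reset, the enlarged array would stay referenced for
 as long as the scanner instance is reused, which is the situation the issue
 description warns about.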

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Ruben Laguna (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruben Laguna updated LUCENE-2384:
-

Attachment: reset.diff

Patch to reset the zzBuffer when the input is reset. The code is taken from 
https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com
so I can't really grant a license to use it, but I think the guy released it 
into the public domain by posting it to the mailing list.




[jira] Issue Comment Edited: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.

2010-04-08 Thread Ruben Laguna (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854905#action_12854905
 ] 

Ruben Laguna edited comment on LUCENE-2384 at 4/8/10 11:24 AM:
---

Patch to reset the zzBuffer when the input is reset. The code is taken from 
https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com
so I can't really grant a license to use it, but I think the guy released it 
into the public domain by posting it to the mailing list. 

I tested it and it seems to work for me. I am including it here in case 
somebody wants to apply the patch directly to 3.0.1 (although it's better to 
wait for 3.1).

  



[jira] Created: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)

2010-04-08 Thread Ruben Laguna (JIRA)
IndexWriter retains references to Readers used in Fields (memory leak)
--

 Key: LUCENE-2387
 URL: https://issues.apache.org/jira/browse/LUCENE-2387
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0.1
Reporter: Ruben Laguna


As described in [1], IndexWriter retains references to the Readers used in 
Fields, and that can lead to big memory leaks when using Tika's ParsingReaders 
(as those can take 1MB per ParsingReader). 

[2] shows a screenshot of the reference chain from the IndexWriter to the 
Reader, taken with Eclipse MAT (Memory Analyzer Tool). The chain is the 
following:

IndexWriter -> DocumentsWriter -> DocumentsWriterThreadState -> 
DocFieldProcessorPerThread -> DocFieldProcessorPerField -> Fieldable -> Field 
(fieldsData) 


-
[1] http://markmail.org/thread/ndmcgffg2mnwjo47
[2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer
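
The retention pattern reported above can be illustrated with a small Java 
sketch. This is not Lucene's code and not a proposed fix for this issue: the 
class PerFieldStateSketch and its methods are hypothetical. It only shows how 
a long-lived object that remembers the last Reader it processed keeps that 
Reader reachable, and how clearing the reference after use avoids that.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical long-lived per-field state; not a Lucene class. */
final class PerFieldStateSketch {
    // Stays set between documents unless explicitly cleared, so the Reader
    // (and anything it holds) remains reachable from this object.
    private Reader lastReader;

    /** Consumes the reader; optionally drops the reference when done. */
    void process(Reader reader, boolean clearAfterUse) {
        lastReader = reader;
        try {
            while (reader.read() != -1) {
                // A real consumer would tokenize here; the sketch just drains the input.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (clearAfterUse) {
                lastReader = null; // release the Reader for garbage collection
            }
        }
    }

    boolean retainsReader() {
        return lastReader != null;
    }

    public static void main(String[] args) {
        PerFieldStateSketch leaky = new PerFieldStateSketch();
        leaky.process(new StringReader("document body"), false);
        System.out.println("retained without clearing: " + leaky.retainsReader());

        PerFieldStateSketch cleared = new PerFieldStateSketch();
        cleared.process(new StringReader("document body"), true);
        System.out.println("retained with clearing: " + cleared.retainsReader());
    }
}
```

If each retained Reader holds about 1MB, as reported for ParsingReader, one 
such lingering reference per long-lived state object adds up quickly.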


