[ 
https://issues.apache.org/jira/browse/SOLR-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539674#comment-15539674
 ] 

Alexandre Rafalovitch commented on SOLR-2886:
---------------------------------------------

Does this happen with the latest version of Solr/Tika? If not, or if it cannot be 
reproduced, I suggest closing the case.

> Out of Memory Error with DIH and TikaEntityProcessor
> ----------------------------------------------------
>
>                 Key: SOLR-2886
>                 URL: https://issues.apache.org/jira/browse/SOLR-2886
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 4.0-ALPHA
>            Reporter: Tricia Jenkins
>
> I've recently upgraded from apache-solr-4.0-2011-06-14_08-33-23.war to 
> apache-solr-4.0-2011-10-14_08-56-59.war and then 
> apache-solr-4.0-2011-10-30_09-00-00.war to index ~5300 pdfs, of various 
> sizes, using the TikaEntityProcessor.  My indexing would run to completion 
> and was completely successful under the June build.  The only problem was the 
> readability of the full text in highlighting.  This was fixed in Tika 0.10 
> (TIKA-611).  I chose to use the October 14 build of Solr because Tika 0.10 
> had recently been included (SOLR-2372).  
> On the same machine, without changing any memory settings, my initial problem 
> is a PermGen error.  Fine, I increase the PermGen space.
> I've set the "onError" parameter to "skip" for the TikaEntityProcessor.  Now 
> I get several (6)
> SEVERE: Exception thrown while getting data
> java.net.SocketTimeoutException: Read timed out
> SEVERE: Exception in entity : 
> tika:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url <url removed> # 2975
> pairs.  And after ~3881 documents, with auto commit set unreasonably 
> frequently, I consistently get an Out of Memory Error:
> SEVERE: Exception while processing: f document : 
> null:org.apache.solr.handler.dataimport.DataImportHandlerException: 
> java.lang.OutOfMemoryError: Java heap space
> The stack trace points to 
> org.apache.pdfbox.io.RandomAccessBuffer.expandBuffer(RandomAccessBuffer.java:151)
>  and 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:718).
> The October 30 build performs identically.
> Funny thing is that monitoring via JConsole doesn't reveal any memory issues.
> Because the Out of Memory Error did not occur under the June build, I 
> believe that a bug has been introduced into the code since then.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
