OutOfMemoryError when re-indexing the repository

    [ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12431955 ]

Marcel Reutegger commented on JCR-550:
--------------------------------------

To reproduce this issue I tried to re-index a repository with 100,000 nodes. I 
was able to re-index it with a heap size as small as 32 MB. My profiler did 
not show any exceptional memory usage in the search index; memory usage was 
actually quite low.

Can you please try to re-index your repository without the text filters? Maybe 
there is a memory leak in one of the filters when an exception is thrown on an 
invalid or corrupt document.
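
For reference, a cut-down configuration for that test might look like the 
sketch below. It assumes that keeping only TextPlainTextFilter in 
textFilterClasses is enough to take the binary document filters (Word, Excel, 
PDF and so on) out of the picture. All other parameters are copied unchanged 
from the configuration quoted in the issue description.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <!-- Assumption: only the plain-text filter is kept for this test run;
         the MS Office, PDF, HTML, XML, RTF and OpenOffice filters are dropped. -->
    <param name="textFilterClasses"
           value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter"/>
    <!-- Everything below is unchanged from the reported configuration. -->
    <param name="useCompoundFile" value="true"/>
    <param name="minMergeDocs" value="100"/>
    <param name="volatileIdleTime" value="3"/>
    <param name="maxMergeDocs" value="100000"/>
    <param name="mergeFactor" value="10"/>
    <param name="bufferSize" value="10"/>
    <param name="cacheSize" value="1000"/>
    <param name="forceConsistencyCheck" value="false"/>
    <param name="autoRepair" value="true"/>
    <param name="respectDocumentOrder" value="false"/>
    <param name="analyzer"
           value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</SearchIndex>
```

If the OutOfMemoryError no longer shows up with a configuration like this, 
adding the filters back one at a time should help narrow down which one is 
responsible.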

Having a heap dump for analysis would also be helpful. Can you please run the 
re-indexing process with the following JVM option:

    -Xrunhprof:heap=sites,doe=n

This will allow you to create a heap dump with Ctrl-Break (on Windows) or 
kill -QUIT (on Unix) on the JVM process.

Thanks a lot.

> ObservationManagerFactory) - OutOfMemoryError when re-indexing the repository
> ------------------------------------------------------------------------------
>
>                 Key: JCR-550
>                 URL: http://issues.apache.org/jira/browse/JCR-550
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.0.1
>         Environment: tomcat 5.0 [256 up to 512 mb of ram] 
> jackrabbit 1.0.1 
> jdk 1.4.2_12 
> Intel Xeon 3.2GHz with 2Gb of memory
> ----
> poi-3.0-alpha2-20060616.jar
> poi-contrib-3.0-alpha2-20060616.jar
> poi-scratchpad-3.0-alpha2-20060616.jar
> jackrabbit-core-1.0.1.jar
> jackrabbit-index-filters-1.0.1.jar
> jackrabbit-jcr-commons-1.0.1.jar
> jcr-1.0.jar
> tm-extractors-0.4.jar
> lucene-1.4.3.jar
>            Reporter: Christian Zanata
>         Assigned To: Marcel Reutegger
>         Attachments: log_files.zip
>
>
> [ERROR] 20060825 17:06:40
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception. java.lang.OutOfMemoryError
> when we try to re-index a repository. The repository is quite big (more than 
> 4 GB of disk usage) and sometimes stores documents as large as 40 MB.
> I have attached all the latest logs we recorded, with the full stack traces.
> Related to this, we also have errors with Lucene:
> [DEBUG] 20060803 08:24:01 (org.apache.jackrabbit.core.query.LazyReader)
> - Dump: 
> java.io.IOException: Invalid header signature; read 8656037701166316554,
> expected -2226271756974174256
>         at org.apache.jackrabbit.core.query.MsWordTextFilter
> and then these:
> [DEBUG] 20060803 08:37:17 (org.apache.jackrabbit.core.ItemManager) -
> removing item 8637bf5f-4689-4e75-888f-b7b89bef40c8 from cache
> [ WARN] 20060803 08:40:13 (org.apache.jackrabbit.core.RepositoryImpl) -
> Existing lock file at C:\Wave\Repository\.lock deteteced. Repository was
> not shut down properly.
> [ERROR] 20060803 09:33:14
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception.
> java.lang.NullPointerException: null values not allowed
> This is our repository.xml configuration for indexing:
> <SearchIndex
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>         <param name="path" value="${wsp.home}/index"/>
>         <param name="textFilterClasses"
> value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,
> org.apache.jackrabbit.core.query.MsExcelTextFilter,
> org.apache.jackrabbit.core.query.MsPowerPointTextFilter, 
> org.apache.jackrabbit.core.query.MsWordTextFilter,
> org.apache.jackrabbit.core.query.PdfTextFilter,
> org.apache.jackrabbit.core.query.HTMLTextFilter,
> org.apache.jackrabbit.core.query.XMLTextFilter,
> org.apache.jackrabbit.core.query.RTFTextFilter,
> org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
>         <param name="useCompoundFile" value="true"/>
>         <param name="minMergeDocs" value="100"/>
>         <param name="volatileIdleTime" value="3"/>
>         <param name="maxMergeDocs" value="100000"/>
>         <param name="mergeFactor" value="10"/>
>         <param name="bufferSize" value="10"/>
>         <param name="cacheSize" value="1000"/>
>         <param name="forceConsistencyCheck" value="false"/>
>         <param name="autoRepair" value="true"/>
>         <param name="respectDocumentOrder" value="false"/>
>         <param name="analyzer"
> value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> </SearchIndex>

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
