OutOfMemoryError when re-indexing the repository

    [ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12446854 ]
Jukka Zitting commented on JCR-550:
-----------------------------------

I would assume that the OutOfMemoryError is triggered by the parsing of 
some large Word document, especially since you reported that the problem does 
not occur if you disable the Word document filter.

Thus, if we catch the OutOfMemoryError caused by a single document, it 
should not interrupt the whole indexing process. The memory allocated while 
parsing that document should then be garbage collected automatically, unless 
the document parser keeps references to it in static fields.
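
To illustrate the idea, here is a minimal sketch of catching the error per 
document. It is not Jackrabbit code: the IndexingSketch class, the 
TextExtractor interface and the Result holder are hypothetical names made up 
for this example, and the OutOfMemoryError is simulated rather than caused by 
a real Word document. It also assumes that the JVM is still in a usable state 
after the error, which is not guaranteed in general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexingSketch {

    /** Hypothetical stand-in for a text filter such as the Word document filter. */
    interface TextExtractor {
        String extract(String documentId);
    }

    /** Ids of documents that were indexed and of those skipped after an OutOfMemoryError. */
    static final class Result {
        final List<String> indexed = new ArrayList<String>();
        final List<String> skipped = new ArrayList<String>();
    }

    static Result indexAll(List<String> documentIds, TextExtractor extractor) {
        Result result = new Result();
        for (String id : documentIds) {
            try {
                // A real indexer would pass the extracted text on to Lucene here.
                extractor.extract(id);
                result.indexed.add(id);
            } catch (OutOfMemoryError e) {
                // Give up on this one document only. Once the extractor's stack
                // frame is gone, its buffers are unreachable and can be collected,
                // unless the parser keeps them in static fields.
                result.skipped.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TextExtractor extractor = new TextExtractor() {
            public String extract(String documentId) {
                if (documentId.endsWith(".doc")) {
                    // Simulated failure; no memory is actually exhausted.
                    throw new OutOfMemoryError("simulated: " + documentId);
                }
                return "text of " + documentId;
            }
        };
        Result result = indexAll(Arrays.asList("a.txt", "huge.doc", "b.txt"), extractor);
        System.out.println("indexed: " + result.indexed);
        System.out.println("skipped: " + result.skipped);
    }
}
```

In this sketch the loop keeps going after the failing document, so a.txt and 
b.txt end up in the indexed list and huge.doc in the skipped list. The catch 
block sits inside the loop on purpose: a catch around the whole loop would 
still abort the re-indexing at the first oversized document.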

> ObservationManagerFactory) -
> OutOfMemoryError when re-indexing the repository
> ------------------------------------------------------------------------------
>
>                 Key: JCR-550
>                 URL: http://issues.apache.org/jira/browse/JCR-550
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: indexing
>    Affects Versions: 1.0.1
>         Environment: tomcat 5.0 [256 up to 512 MB of RAM] 
> jackrabbit 1.0.1 
> jdk 1.4.2_12 
> Intel Xeon 3.2GHz with 2 GB of memory
> ----
> poi-3.0-alpha2-20060616.jar
> poi-contrib-3.0-alpha2-20060616.jar
> poi-scratchpad-3.0-alpha2-20060616.jar
> jackrabbit-core-1.0.1.jar
> jackrabbit-index-filters-1.0.1.jar
> jackrabbit-jcr-commons-1.0.1.jar
> jcr-1.0.jar
> tm-extractors-0.4.jar
> lucene-1.4.3.jar
>            Reporter: Christian Zanata
>         Assigned To: Marcel Reutegger
>         Attachments: log_files.zip
>
>
> [ERROR] 20060825 17:06:40
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception. java.lang.OutOfMemoryError
> when we try to re-index a repository. The repository is quite big (more than 
> 4 GB of disk usage) and sometimes it stores documents as large as 40 MB.
> I have attached the latest logs we recorded, with the full stack traces.
> Related to this, we also have errors with Lucene:
> [DEBUG] 20060803 08:24:01 (org.apache.jackrabbit.core.query.LazyReader)
> - Dump: 
> java.io.IOException: Invalid header signature; read 8656037701166316554,
> expected -2226271756974174256
>         at org.apache.jackrabbit.core.query.MsWordTextFilter
> and then these:
> [DEBUG] 20060803 08:37:17 (org.apache.jackrabbit.core.ItemManager) -
> removing item 8637bf5f-4689-4e75-888f-b7b89bef40c8 from cache
> [ WARN] 20060803 08:40:13 (org.apache.jackrabbit.core.RepositoryImpl) -
> Existing lock file at C:\Wave\Repository\.lock deteteced. Repository was
> not shut down properly.
> [ERROR] 20060803 09:33:14
> (org.apache.jackrabbit.core.observation.ObservationManagerFactory) -
> Synchronous EventConsumer threw exception.
> java.lang.NullPointerException: null values not allowed
> This is our repository.xml configuration for indexing:
> <SearchIndex
> class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>         <param name="path" value="${wsp.home}/index"/>
>         <param name="textFilterClasses"
> value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,
> org.apache.jackrabbit.core.query.MsExcelTextFilter,
> org.apache.jackrabbit.core.query.MsPowerPointTextFilter, 
> org.apache.jackrabbit.core.query.MsWordTextFilter,
> org.apache.jackrabbit.core.query.PdfTextFilter,
> org.apache.jackrabbit.core.query.HTMLTextFilter,
> org.apache.jackrabbit.core.query.XMLTextFilter,
> org.apache.jackrabbit.core.query.RTFTextFilter,
>                         
> org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
>         <param name="useCompoundFile" value="true"/>
>         <param name="minMergeDocs" value="100"/>
>         <param name="volatileIdleTime" value="3"/>
>         <param name="maxMergeDocs" value="100000"/>
>         <param name="mergeFactor" value="10"/>
>         <param name="bufferSize" value="10"/>
>         <param name="cacheSize" value="1000"/>
>         <param name="forceConsistencyCheck" value="false"/>
>         <param name="autoRepair" value="true"/>
>         <param name="respectDocumentOrder" value="false"/>
>         <param name="analyzer"
> value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> </SearchIndex>

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
