[jira] Commented: (JCR-550) ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12446852 ]

Claus Köll commented on JCR-550:

Hi Jukka,

JCR-574 is a very different issue. The problem there was that the LazyReader caught only Exceptions, not RuntimeExceptions.

The problem here is that I get an OutOfMemoryError while re-indexing a huge repository.
This is a very big problem for me, because I cannot run Jackrabbit in a production environment: we have about 4-5 million documents (doc, xls, pdf), and if I ever have to re-index the repository, I cannot do it.

I will try the VM argument that Marcel posted.

claus

ObservationManagerFactory) - OutOfMemoryError when re-indexing the repository

Key: JCR-550
URL: http://issues.apache.org/jira/browse/JCR-550
Project: Jackrabbit
Issue Type: Bug
Components: indexing
Affects Versions: 1.0.1
Environment: tomcat 5.0 [256 up to 512 mb of ram]
    jackrabbit 1.0.1
    jdk 1.4.2_12
    Intel Xeon 3.2GHz with 2Gb of memory
    poi-3.0-alpha2-20060616.jar
    poi-contrib-3.0-alpha2-20060616.jar
    poi-scratchpad-3.0-alpha2-20060616.jar
    jackrabbit-core-1.0.1.jar
    jackrabbit-index-filters-1.0.1.jar
    jackrabbit-jcr-commons-1.0.1.jar
    jcr-1.0.jar
    tm-extractors-0.4.jar
    lucene-1.4.3.jar
Reporter: Christian Zanata
Assigned To: Marcel Reutegger
Attachments: log_files.zip

[ERROR] 20060825 17:06:40 (org.apache.jackrabbit.core.observation.ObservationManagerFactory) - Synchronous EventConsumer threw exception.
java.lang.OutOfMemoryError

We get this error when we try to re-index a repository. The repository is quite big (more than 4 GB of disk usage) and sometimes stores documents of 40 MB.
I have attached the latest logs we recorded, with the full stack traces.

Related to this, we also have errors with Lucene:

[DEBUG] 20060803 08:24:01 (org.apache.jackrabbit.core.query.LazyReader) - Dump:
java.io.IOException: Invalid header signature; read 8656037701166316554, expected -2226271756974174256
    at org.apache.jackrabbit.core.query.MsWordTextFilter

and then these:

[DEBUG] 20060803 08:37:17 (org.apache.jackrabbit.core.ItemManager) - removing item 8637bf5f-4689-4e75-888f-b7b89bef40c8 from cache
[ WARN] 20060803 08:40:13 (org.apache.jackrabbit.core.RepositoryImpl) - Existing lock file at C:\Wave\Repository\.lock deteteced. Repository was not shut down properly.
[ERROR] 20060803 09:33:14 (org.apache.jackrabbit.core.observation.ObservationManagerFactory) - Synchronous EventConsumer threw exception.
java.lang.NullPointerException: null values not allowed

This is our repository.xml configuration for indexing:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter,
        org.apache.jackrabbit.core.query.MsExcelTextFilter,
        org.apache.jackrabbit.core.query.MsPowerPointTextFilter,
        org.apache.jackrabbit.core.query.MsWordTextFilter,
        org.apache.jackrabbit.core.query.PdfTextFilter,
        org.apache.jackrabbit.core.query.HTMLTextFilter,
        org.apache.jackrabbit.core.query.XMLTextFilter,
        org.apache.jackrabbit.core.query.RTFTextFilter,
        org.apache.jackrabbit.core.query.OpenOfficeTextFilter"/>
    <param name="useCompoundFile" value="true"/>
    <param name="minMergeDocs" value="100"/>
    <param name="volatileIdleTime" value="3"/>
    <param name="maxMergeDocs" value="10"/>
    <param name="mergeFactor" value="10"/>
    <param name="bufferSize" value="10"/>
    <param name="cacheSize" value="1000"/>
    <param name="forceConsistencyCheck" value="false"/>
    <param name="autoRepair" value="true"/>
    <param name="respectDocumentOrder" value="false"/>
    <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
</SearchIndex>

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira
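The distinction Claus draws between JCR-574 and this issue maps onto Java's exception hierarchy. The sketch below is an illustration only, not Jackrabbit code: `Extraction`, `checkedOnly`, and `withRuntime` are hypothetical names, and it assumes the handler before JCR-574 caught only checked exceptions such as IOException. It shows that adding a RuntimeException handler still leaves an OutOfMemoryError uncaught, because that is an Error.

```java
import java.io.IOException;

class CatchScopeDemo {
    /** Hypothetical stand-in for a text filter's extraction step. */
    interface Extraction {
        String extract() throws IOException;
    }

    /** Handles checked I/O failures only; RuntimeException and Error propagate. */
    static String checkedOnly(Extraction x) {
        try {
            return x.extract();
        } catch (IOException e) {
            return ""; // index the node without full text
        }
    }

    /** Also handles RuntimeException; an Error such as OutOfMemoryError still propagates. */
    static String withRuntime(Extraction x) {
        try {
            return x.extract();
        } catch (IOException e) {
            return "";
        } catch (RuntimeException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        Extraction corrupt = () -> { throw new IllegalStateException("corrupt document"); };
        System.out.println("withRuntime returned: '" + withRuntime(corrupt) + "'");
        try {
            checkedOnly(corrupt);
        } catch (IllegalStateException e) {
            System.out.println("checkedOnly let it through: " + e.getMessage());
        }
    }
}
```

Catching the Error as well would not help here: once the heap is exhausted, the useful fix is to find what holds the memory, which is why the thread turns to heap dumps.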
[jira] Commented: (JCR-550) ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12446831 ]

Jukka Zitting commented on JCR-550:

Does this issue still occur now that RuntimeExceptions are being caught per the JCR-574 fix?
[jira] Commented: (JCR-550) ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12446843 ]

Marcel Reutegger commented on JCR-550:

Claus wrote:
> is there another way to get a dump file ?

Actually there is. jdk 1.4.2_12 supports the option

-XX:+HeapDumpOnOutOfMemoryError

With this option the JVM will create a heap dump when it runs out of memory.
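When the option is passed through a container such as Tomcat, it is easy to set it in a place the JVM never sees. The sketch below is one hedged way to confirm that the running JVM actually received the flag. It relies on `java.lang.management`, which exists only from Java 5 onward, so it would not run on the jdk 1.4.2_12 from this report; the class and method names are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapDumpFlagCheck {
    static final String FLAG = "-XX:+HeapDumpOnOutOfMemoryError";

    /** Returns true if the given JVM argument list contains the heap dump flag. */
    static boolean hasHeapDumpFlag(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(FLAG + " active: " + hasHeapDumpFlag(jvmArgs));
    }
}
```

Running this inside the same JVM as the repository (for example from a small servlet or a startup hook) shows whether the option reached the process before a long re-index is started.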
[jira] Commented: (JCR-550) ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12433037 ]

Claus Köll commented on JCR-550:

hi @ all

In my case most of the files in my repository are Word documents.
If I remove the org.apache.jackrabbit.core.query.MsWordTextFilter class, the re-index process works fine,
but if I enable the filter the process ends with an OutOfMemoryError.

I think we must look for a memory leak ...

claus
[jira] Commented: (JCR-550) ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12432776 ]

Claus Köll commented on JCR-550:

hi marcel

The VM argument -Xrunhprof:heap=sites,doe=n
does not work in my case. The re-index process stops after about 1-2 minutes with an OutOfMemoryError.

Is there another way to get a dump file?

claus
[jira] Commented: (JCR-550) ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12432434 ]

Claus Köll commented on JCR-550:

I tried to re-index my repository without the text filters and it works fine.
So the bug is in one of the text filters ...

These are the text filters I used before:

org.apache.jackrabbit.core.query.lucene.TextPlainTextFilter
org.apache.jackrabbit.core.query.MsExcelTextFilter
org.apache.jackrabbit.core.query.MsPowerPointTextFilter
org.apache.jackrabbit.core.query.MsWordTextFilter
org.apache.jackrabbit.core.query.PdfTextFilter
org.apache.jackrabbit.core.query.HTMLTextFilter
org.apache.jackrabbit.core.query.XMLTextFilter
org.apache.jackrabbit.core.query.RTFTextFilter
org.apache.jackrabbit.core.query.OpenOfficeTextFilter

So I can test re-indexing the repository without some of the filters ... Please give me a hint which ones I should use.
[jira] Commented: (JCR-550) ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12432089 ]

Christian Zanata commented on JCR-550:

Hi Marcel,
we think the problem is in PdfTextFilter or in the PDFBox libraries.
We are not sure about that and are still investigating in that direction.
It seems that after an exception something does not free its resources.
[jira] Commented: (JCR-550) ObservationManagerFactory) -
OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12432119 ]

Ian Boston commented on JCR-550:

Two things with PDFBox:

1. If you don't religiously close the streams, it causes OOM problems, and the GC won't get to the finalizers fast enough to avoid the OOM.

2. PDFBox has to build the entire document, including all the graphics images, before it can render the text. If you have a refactored PDF you can get thousands of graphics line segments, which causes PDFBox to use lots of CPU and heap while converting to a text stream.

I am using PDFBox in a different search engine in the same way, and it randomly causes lots of problems with refactored PDF files.

HTH
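Ian's first point, combined with Christian's observation that resources seem to stay allocated after an exception, comes down to closing the input in a `finally` block so that a failed parse releases it too. The following is a minimal sketch of that pattern under stated assumptions: `parse` is a hypothetical stand-in for a real extractor, `TrackedStream` exists only to make the close observable, and none of this is PDFBox or Jackrabbit API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class CloseOnFailureDemo {
    /** Test helper: a stream that records whether close() was called. */
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical parser: rejects any input that does not start with '%'. */
    static String parse(InputStream in) throws IOException {
        if (in.read() != '%') {
            throw new IOException("Invalid header");
        }
        return "extracted text";
    }

    /** Extracts text and closes the stream whether or not parsing succeeds. */
    static String extract(InputStream in) {
        try {
            return parse(in);
        } catch (IOException e) {
            return ""; // unreadable document: index without full text
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // nothing useful to do if close itself fails
            }
        }
    }

    public static void main(String[] args) {
        TrackedStream corrupt = new TrackedStream("not a pdf".getBytes());
        System.out.println("text='" + extract(corrupt) + "' closed=" + corrupt.closed);
    }
}
```

The explicit try/finally form is used instead of try-with-resources because it also compiles on the older JDKs discussed in this thread.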
[jira] Commented: (JCR-550) ObservationManagerFactory) - OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12431955 ]

Marcel Reutegger commented on JCR-550:

To reproduce this issue I tried to re-index a repository with 100'000 nodes. I was able to re-index the repository with as little as 32 MB heap size. My profiler did not show any exceptional memory usage in the search index. The memory usage was actually quite low.

Can you please try to re-index your repository without the text filters? Maybe there is a memory leak in one of the filters when an exception is thrown on an invalid or corrupt document.

Having a heap dump for analysis would also be helpful. Can you please run the re-indexing process with the following JVM option:

    -Xrunhprof:heap=sites,doe=n

This will allow you to create a heap dump on a Ctrl-Break (on Windows) or kill -QUIT (on Unix) on the JVM process.

Thanks a lot.
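
The leak hypothesis in the comment above can be illustrated with a minimal Java sketch. It assumes a hypothetical `TextExtractor` interface and `SafeExtraction` helper, neither of which is the Jackrabbit `TextFilter` API. It only shows the general pattern that keeps a failed extraction from holding on to resources: the stream is closed in a `finally` block, and unchecked exceptions are handled alongside checked ones.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a text filter; NOT the Jackrabbit TextFilter API.
interface TextExtractor {
    String extract(InputStream in) throws IOException;
}

// Hypothetical helper showing cleanup when extraction fails on a bad document.
class SafeExtraction {
    // Returns the extracted text, or "" if the document cannot be parsed.
    // The stream is always closed, even when the extractor throws.
    static String extractOrEmpty(TextExtractor extractor, InputStream in) {
        try {
            return extractor.extract(in);
        } catch (IOException e) {
            return ""; // corrupt document: continue without its full text
        } catch (RuntimeException e) {
            return ""; // parser failures can also surface as unchecked exceptions
        } finally {
            try {
                in.close();
            } catch (IOException ignored) {
                // nothing useful to do if close itself fails
            }
        }
    }

    public static void main(String[] args) {
        // Simulates a filter that fails the way the MsWordTextFilter log entry does.
        TextExtractor broken = new TextExtractor() {
            public String extract(InputStream in) throws IOException {
                throw new IOException("Invalid header signature");
            }
        };
        InputStream doc = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println("[" + extractOrEmpty(broken, doc) + "]");
    }
}
```

Note that this pattern does not catch `OutOfMemoryError`, which is an `Error` rather than an `Exception`. It only addresses resources that stay open after a failed extraction.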
[jira] Commented: (JCR-550) ObservationManagerFactory) - OutOfMemoryError when re-indexing the repository

[ http://issues.apache.org/jira/browse/JCR-550?page=comments#action_12431236 ]

Marcel Reutegger commented on JCR-550:

Your log files seem to indicate that some of your content is corrupt:

Caused by: java.lang.IllegalArgumentException: invalid QName literal
    at org.apache.jackrabbit.name.QName.valueOf(QName.java:618)
    at org.apache.jackrabbit.core.state.util.Serializer.deserialize(Serializer.java:124)
    at org.apache.jackrabbit.core.state.obj.ObjectPersistenceManager.load(ObjectPersistenceManager.java:206)
    ... 61 more

Please note that using the ObjectPersistenceManager on a production system is not recommended because it is not transactional. You should consider using DerbyPersistenceManager as your version storage.
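
For readers unfamiliar with the error in the stack trace above, the following minimal Java sketch shows the kind of check that could reject a damaged name during deserialization. It assumes the `{namespaceURI}localName` string form for qualified names. `QNameLiteral` and its `parse` method are hypothetical and written for illustration only; they are not the Jackrabbit `QName.valueOf` source.

```java
// Hypothetical illustration only; this is NOT the Jackrabbit QName.valueOf source.
// Assumes the "{namespaceURI}localName" string form for a qualified name.
class QNameLiteral {
    final String namespaceURI;
    final String localName;

    private QNameLiteral(String namespaceURI, String localName) {
        this.namespaceURI = namespaceURI;
        this.localName = localName;
    }

    // Parses "{uri}local" and rejects any other input with an exception.
    static QNameLiteral parse(String s) {
        if (s == null || s.length() < 3 || s.charAt(0) != '{') {
            throw new IllegalArgumentException("invalid QName literal");
        }
        int close = s.lastIndexOf('}');
        if (close < 0 || close == s.length() - 1) {
            throw new IllegalArgumentException("invalid QName literal");
        }
        return new QNameLiteral(s.substring(1, close), s.substring(close + 1));
    }

    public static void main(String[] args) {
        QNameLiteral ok = parse("{http://www.jcp.org/jcr/1.0}primaryType");
        System.out.println(ok.namespaceURI + " / " + ok.localName);
        try {
            parse("not-a-qname"); // stands in for a damaged serialized name
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this assumption, bytes damaged by an interrupted or partial write would rarely form a well-shaped literal, which is consistent with the exception being read as a sign of corrupt content.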