Karthik, I think the core problem in your case is the use of compound files; it would be best to switch that off, or alternatively to issue an optimize() as soon as indexing is over.
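Roughly, both suggestions look like this against the Lucene 1.4 API; treat it as a minimal sketch, since the index path and analyzer below are placeholders rather than anything from your setup:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class NoCompoundIndexing {
  public static void main(String[] args) throws Exception {
    // Placeholder path; substitute your own index location.
    IndexWriter writer = new IndexWriter("/path/to/index",
        new StandardAnalyzer(), true);

    // Option 1: stop packing segments into .cfs compound files, which
    // skips the extra copy pass CompoundFileWriter makes on every merge.
    writer.setUseCompoundFile(false);

    // ... addDocument() calls go here ...

    // Option 2: collapse all segments into one right after indexing,
    // so searchers opened afterwards keep far fewer files open.
    writer.optimize();
    writer.close();
  }
}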
I am copying the file contents between <file> tags. The patch is to be applied to TermInfosReader.java; it was done to help with out-of-memory exceptions during indexing.

<file>
Index: src/java/org/apache/lucene/index/TermInfosReader.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
retrieving revision 1.9
diff -u -r1.9 TermInfosReader.java
--- src/java/org/apache/lucene/index/TermInfosReader.java	6 Aug 2004 20:50:29 -0000	1.9
+++ src/java/org/apache/lucene/index/TermInfosReader.java	10 Sep 2004 17:46:47 -0000
@@ -45,6 +45,11 @@
     readIndex();
   }
 
+  protected final void finalize() {
+    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
+    enumerators.set(null);
+  }
+
   public int getSkipInterval() {
     return origEnum.skipInterval;
   }
</file>

However, Tomcat does react in strange ways to too many open files, so try to restrict the number of IndexReader or Searchable objects that you create while doing searches. I usually keep one object to handle all my user requests:

public static Searcher fetchCitationSearcher(HttpServletRequest request)
    throws Exception {
  Searcher rval = (Searcher) request.getSession().getServletContext()
      .getAttribute("luceneSearchable");
  if (rval == null) {
    rval = new IndexSearcher(fetchCitationReader(request));
    request.getSession().getServletContext()
        .setAttribute("luceneSearchable", rval);
  }
  return rval;
}

>-----Original Message-----
>From: Karthik N S [mailto:[EMAIL PROTECTED]]
>Sent: 10 November 2004 11:41
>To: Lucene Users List
>Subject: RE: Lucene1.4.1 + OutOf Memory
>
>Hi Rupinder Singh Mazara,
>
>Apologies...
>
>Can you paste the code into the mail instead of an attachment?
>[Because I am not able to get the attachment on the company's mail.]
>
>Thanks in advance,
>Karthik
>
>-----Original Message-----
>From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, November 10, 2004 3:10 PM
>To: Lucene Users List
>Subject: RE: Lucene1.4.1 + OutOf Memory
>
>Hi all,
>
>I had a similar problem with JDK 1.4.1. Doug had sent me a patch, which I am
>attaching; the following is the mail from Doug:
>
>It sounds like the ThreadLocal in TermInfosReader is not getting
>correctly garbage collected when the TermInfosReader is collected.
>Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess is
>that you're running in an older JVM. Is that right?
>
>I've attached a patch which should fix this. Please tell me if it works
>for you.
>
>Doug
>
>Daniel Taurat wrote:
>> Okay, that (1.4rc3) worked fine, too!
>> Got only 257 SegmentTermEnums for 1900 objects.
>>
>> Now I will go for the final test on the production server with the
>> 1.4rc3 version and about 40,000 objects.
>>
>> Daniel
>>
>> Daniel Taurat wrote:
>>
>>> Hi all,
>>> here is some update for you:
>>> I switched back to Lucene 1.3-final, and now the number of
>>> SegmentTermEnum objects is controlled by gc again:
>>> it goes up to about 1000 and then is down again to 254 after
>>> indexing my 1900 test objects.
>>> Stay tuned, I will try 1.4RC3 now, the last version before FieldCache
>>> was introduced...
>>>
>>> Daniel
>>>
>>> Rupinder Singh Mazara wrote:
>>>
>>>> Hi all,
>>>> I had a similar problem: I have a database of documents with 24
>>>> fields and an average content of 7K, with 16M+ records.
>>>>
>>>> I had to split the job into slabs of 1M each and merge the
>>>> resulting indexes (see the sketch at the end of this thread);
>>>> submissions to our job queue looked like:
>>>>
>>>> java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>>
>>>> And I still had OutOfMemory exceptions. The solution I came up with
>>>> was, after every 200K documents, to create a temp directory and merge
>>>> them together. This was done for the first production run; updates
>>>> are now being handled incrementally.
>>>>
>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>   at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>>>>   at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>>>>   at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>>>>   at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>>>>   at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>>>>   at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>>>>   at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>>>>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>>>>   at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>>>>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>>   at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>>   at lucene.Indexer.main(CDBIndexer.java:168)
>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel Taurat [mailto:[EMAIL PROTECTED]]
>>>>> Sent: 10 September 2004 14:42
>>>>> To: Lucene Users List
>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>>>> number of documents
>>>>>
>>>>> Hi Pete,
>>>>> good hint, but we actually do have 4 GB of physical memory on the
>>>>> system. But then: we have also experienced that the gc of the IBM
>>>>> JDK 1.3.1 that we use sometimes behaves strangely with too large a
>>>>> heap space anyway (the limit seems to be 1.2 GB).
>>>>> I can say that gc is not collecting these objects, since I forced gc
>>>>> runs every now and then while indexing (when parsing PDF-type
>>>>> objects, that is): no effect.
>>>>>
>>>>> regards,
>>>>>
>>>>> Daniel
>>>>>
>>>>> Pete Lewis wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Reading the thread with interest, there is another way I've come
>>>>>> across out of memory errors when indexing large batches of documents.
>>>>>>
>>>>>> If you have your heap space settings too high, then you get
>>>>>> swapping (which impacts performance), plus you never reach the
>>>>>> trigger for garbage collection, hence you don't garbage collect and
>>>>>> hence you run out of memory.
>>>>>>
>>>>>> Can you check whether or not your garbage collection is being
>>>>>> triggered?
>>>>>>
>>>>>> Anomalously, therefore, if this is the case, by reducing the heap
>>>>>> space you can improve performance and get rid of the out of memory
>>>>>> errors.
>>>>>>
>>>>>> Cheers,
>>>>>> Pete Lewis
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Daniel Taurat" <[EMAIL PROTECTED]>
>>>>>> To: "Lucene Users List" <[EMAIL PROTECTED]>
>>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>>>>> number of documents
>>>>>>
>>>>>>> Daniel Aber wrote:
>>>>>>>
>>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>>
>>>>>>>>> I am facing an out of memory problem using Lucene 1.4.1.
>>>>>>>>
>>>>>>>> Could you try with a recent CVS version? There has been a fix
>>>>>>>> about files not being deleted after 1.4.1. Not sure if that could
>>>>>>>> cause the problems you're experiencing.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Daniel
>>>>>>>
>>>>>>> Well, it seems not to be files; it looks more like those
>>>>>>> SegmentTermEnum objects accumulating in memory.
>>>>>>> I've seen some discussion of these objects in the developer
>>>>>>> newsgroup that took place some time ago.
>>>>>>> I am afraid this is some kind of runaway caching I have to deal with.
>>>>>>> Maybe not correctly addressed in this newsgroup, after all...
>>>>>>>
>>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Daniel
>
>>-----Original Message-----
>>From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
>>Sent: 10 November 2004 09:35
>>To: Lucene Users List
>>Subject: Re: Lucene1.4.1 + OutOf Memory
>>
>>On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
>>>
>>> Hi Guys,
>>>
>>> Apologies...
>>
>>No need to apologize for asking questions.
>>
>>> History
>>>
>>> 1st type: 40000 subindexes + MultiSearcher + Search on Content Field
>>
>>You've got 40,000 indexes aggregated under a MultiSearcher and you're
>>wondering why you're running out of memory?! :O
>>
>>> Exception [ Too many Files Open ]
>>
>>Are you using the compound file format?
>>
>> Erik
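As a footnote to Rupinder's slab-and-merge approach above, here is a minimal sketch of it against the Lucene 1.4 API. The slab count, directory paths, and the fetchSlab() helper are hypothetical placeholders, not taken from this thread:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SlabIndexer {

  public static void main(String[] args) throws Exception {
    int slabCount = 3; // placeholder: one slab per batch of records
    Directory[] slabDirs = new Directory[slabCount];

    // Index each slab into its own temporary directory, so no single
    // IndexWriter ever has to merge the whole corpus at once.
    for (int i = 0; i < slabCount; i++) {
      slabDirs[i] = FSDirectory.getDirectory("/tmp/slab" + i, true);
      IndexWriter writer = new IndexWriter(slabDirs[i],
          new StandardAnalyzer(), true);
      // Avoid the CompoundFileWriter copy step seen in the stack trace above.
      writer.setUseCompoundFile(false);
      Document[] docs = fetchSlab(i);
      for (int j = 0; j < docs.length; j++) {
        writer.addDocument(docs[j]);
      }
      writer.close();
    }

    // Merge the finished slabs into the final index; addIndexes()
    // leaves the target index optimized when it returns.
    IndexWriter merged = new IndexWriter("/path/to/final-index",
        new StandardAnalyzer(), true);
    merged.addIndexes(slabDirs);
    merged.close();
  }

  // Hypothetical stand-in for the real database fetch; returns an empty
  // batch here just to keep the sketch self-contained.
  private static Document[] fetchSlab(int slab) {
    return new Document[0];
  }
}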