Not cool at all. One way around it is to have a boolean attribute called index in the XML file, which folks set on the elements they wish to index.
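A minimal sketch of that workaround, assuming a hypothetical index="true" attribute and using only the JDK's SAX parser. The class name IndexFlagHandler and the collect helper are illustrative, not from the sample code. Matched text is gathered into a plain list of name/value pairs where a real indexer would add fields to the Lucene Document, and the sketch assumes flagged elements are leaf elements with no children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Keeps text only from elements carrying index="true" (hypothetical attribute). */
class IndexFlagHandler extends DefaultHandler {

    /** Collected {elementName, text} pairs; a stand-in for Lucene Document fields. */
    final List<String[]> fields = new ArrayList<>();

    /** Text of the element currently being read. */
    private final StringBuilder buffer = new StringBuilder();

    /** True while inside an element that was flagged for indexing. */
    private boolean indexCurrent;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0);
        // Assumption: the flag is a plain attribute named "index" with value "true".
        indexCurrent = "true".equals(atts.getValue("index"));
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // Unflagged text is never buffered, so it costs no extra memory.
        if (indexCurrent) {
            buffer.append(text, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (indexCurrent) {
            fields.add(new String[] {qName, buffer.toString().trim()});
            indexCurrent = false;
        }
    }

    /** Parses an XML string and returns the pairs from flagged elements. */
    static List<String[]> collect(String xml) throws Exception {
        IndexFlagHandler handler = new IndexFlagHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }
}
```

With a document where only the title element carries index="true", collect should return just the title pair. Unflagged elements never reach the field list, which keeps the number of fields per document down.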
If anyone else has a better solution, let me know.

Thanks,

Rob

-----Original Message-----
From: Aaron Galea [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 14, 2003 9:28 AM
To: Lucene Users List; Lucene Users List
Subject: RE: OutOfMemoryException while Indexing an XML file

I had this problem when using xerces to parse xml documents. The problem I think lies in the Java garbage collector. The way I solved it was to create a shell script that invokes a java program for each xml file that adds it to the index.

Hope this helps...

Aaron

---------- Original Message ----------------------------------
From: "Rob Outar" <[EMAIL PROTECTED]>
Reply-To: "Lucene Users List" <[EMAIL PROTECTED]>
Date: Fri, 14 Feb 2003 08:43:34 -0500

>Forgot to mention I am indexing 1000's of XML files. I ran a little test to
>see if that file was the problem, but it was able to be indexed after some
>time and memory usage was huge. I think maybe because I index these files
>one after the other something is not getting cleaned up leading to the
>exception.
>
>Thanks,
>
>Rob
>
>
>-----Original Message-----
>From: Rob Outar [mailto:[EMAIL PROTECTED]]
>Sent: Friday, February 14, 2003 8:25 AM
>To: Lucene Users List
>Subject: RE: OutOfMemoryException while Indexing an XML file
>
>
>So to the best of your knowledge the Lucene Document Object should not cause
>the exception even though the XML file is huge and 1000's of fields are
>being added to the Lucene Document Object?
>
>Thanks,
>
>Rob
>
>
>-----Original Message-----
>From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
>Sent: Friday, February 14, 2003 8:21 AM
>To: Lucene Users List
>Subject: Re: OutOfMemoryException while Indexing an XML file
>
>
>Nothing in the code snippet you sent would cause that exception.
>If I were you I'd run it under a profiler to quickly see where the leak
>is. You can even use something free like JMP.
>
>Otis
>
>--- Rob Outar <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> I was using the sample code provided I believe by Doug Cutting to index an
>> XML file, the XML file was 2 megs (kinda large) but while adding fields to
>> the Document object I got an OutOfMemoryException. I work with
>> XML files a lot, I can easily parse that 2 meg file into a DOM tree, I can't
>> imagine a Lucene document being larger than a DOM Tree, pasted below is the
>> SAX handler.
>>
>> public class XMLDocumentBuilder extends DefaultHandler {
>>
>>     /** A buffer for each XML element */
>>     private StringBuffer elementBuffer = new StringBuffer();
>>
>>     private Document mDocument;
>>
>>     public void buildDocument(Document doc, String xmlFile)
>>             throws IOException, SAXException {
>>
>>         this.mDocument = doc;
>>         SAXReader.parse(xmlFile, this);
>>     }
>>
>>     public void startElement(String uri, String localName, String qName,
>>             Attributes atts) {
>>
>>         elementBuffer.setLength(0);
>>
>>         if (atts != null) {
>>
>>             for (int i = 0; i < atts.getLength(); i++) {
>>
>>                 String attname = atts.getLocalName(i);
>>                 mDocument.add(new Field(attname, atts.getValue(i),
>>                         true, true, true));
>>             }
>>         }
>>     }
>>
>>     // called when character data is found
>>     public void characters(char[] text, int start, int length) {
>>         elementBuffer.append(text, start, length);
>>     }
>>
>>     public void endElement(String uri, String localName, String qName) {
>>         mDocument.add(Field.Text(localName, elementBuffer.toString()));
>>     }
>>
>>     public Document getDocument() {
>>         return mDocument;
>>     }
>> }
>>
>> Any help would be appreciated.
>>
>> Thanks,
>>
>> Rob
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
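Rob's suspicion in the replies above, that something is not being cleaned up between files, can be checked roughly before reaching for a profiler. The sketch below is one way to do that and is not anything posted in this thread: it creates a fresh parser and handler for every document so no per-file state is carried over, and prints the used heap after each one. The names PerFileIndexing, CountingHandler, indexAll and usedHeapBytes are hypothetical, the handler only counts elements as a stand-in for filling a Lucene Document, and documents are passed as strings to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PerFileIndexing {

    /** Counts elements; a stand-in for a handler that fills a Lucene Document. */
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Heap currently in use, in bytes. Noisy, since garbage may not be collected yet. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Parses each document with its own parser and handler, logging heap use. */
    static List<Integer> indexAll(List<String> xmlDocs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        List<Integer> counts = new ArrayList<>();
        for (String xml : xmlDocs) {
            // Fresh handler per document: nothing from the previous file stays reachable.
            CountingHandler handler = new CountingHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            counts.add(handler.elements);
            System.out.println("elements=" + handler.elements
                    + " usedHeapBytes=" + usedHeapBytes());
        }
        return counts;
    }
}
```

If the printed heap figure keeps climbing across many files even though each handler goes out of scope, that points at something else holding references, which is where a profiler would help narrow it down.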