I am having similar problem but indexing pdf documents using pdfbox parser (available at www.pdfbox.com). I get an exception saying "Exception in thread "main" java.lang.OutOfMemoryError" Any body who has implemented the above code? Any help appreciated??? Thanks! PI Rob Outar <[EMAIL PROTECTED]> wrote:We are aware of DOM limitations/memory problems, but I am using SAX to parse the file and index elements and attributes in my content handler.
Thanks, Rob -----Original Message----- From: Tatu Saloranta [mailto:[EMAIL PROTECTED]] Sent: Friday, February 14, 2003 8:18 PM To: Lucene Users List Subject: Re: OutOfMemoryException while Indexing an XML file On Friday 14 February 2003 07:27, Aaron Galea wrote: > I had this problem when using xerces to parse xml documents. The problem I > think lies in the Java garbage collector. The way I solved it was to create It's unlikely that GC is the culprit. Current ones are good at purging objects that are unreachable, and only throw OutOfMem exception when they really have no other choice. Usually it's the app that has some dangling references to objects that prevent GC from collecting objects not useful any more. However, it's good to note that Xerces (and DOM parsers in general) generally use more memory than the input XML files they process; this because they usually have to keep the whole document struct in memory, and there is overhead on top of text segments. So it's likely to be at least 2 * input file size (files usually use UTF-8 which most of the time uses 1 byte per char; in memory 16-bit unicode-2 chars are used for performance), plus some additional overhead for storing element structure information and all that. And since default max. java heap size is 64 megs, big XML files can cause problems. More likely however is that references to already processed DOM trees are not nulled in a loop or something like that? Especially if doing one JVM process for item solves the problem. > a shell script that invokes a java program for each xml file that adds it > to the index. -+ Tatu +- --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------- Do you Yahoo!? Yahoo! Shopping - Send Flowers for Valentine's Day