Hi,

Thank you for your suggestions. I found the reason: PDFBox seems to have problems parsing large documents (20 MB). I have a few of them among those 2000 docs, and those are the ones throwing OutOfMemoryError. The app does exit, and the JVM dies. I am running on a 32-bit machine.
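One way to act on this, sketched under assumptions: check each file's size before handing it to PDFBox and set the large ones aside, so a single oversized PDF cannot take down the whole indexing run. The class name `PdfSizeFilter`, the method `selectParsable` and the 10 MB cut-off are hypothetical choices for illustration, not PDFBox limits or APIs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: splits input PDFs into files small enough to parse
 * in this JVM and files to defer. Uses only the JDK; PDFBox is not called here.
 */
public class PdfSizeFilter {
    // Assumed threshold; tune it to the heap actually available (-Xmx).
    static final long MAX_BYTES = 10L * 1024 * 1024;

    /** Returns files at or below maxBytes; larger ones are appended to skipped. */
    static List<File> selectParsable(List<File> files, long maxBytes, List<File> skipped) {
        List<File> ok = new ArrayList<>();
        for (File f : files) {
            if (f.length() > maxBytes) {
                skipped.add(f);   // defer instead of risking an OutOfMemoryError mid-run
            } else {
                ok.add(f);
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        // Two-argument lambda resolves to FilenameFilter.
        File[] pdfs = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".pdf"));
        List<File> all = new ArrayList<>();
        if (pdfs != null) {
            for (File f : pdfs) all.add(f);
        }
        List<File> skipped = new ArrayList<>();
        List<File> toIndex = selectParsable(all, MAX_BYTES, skipped);
        System.out.println(toIndex.size() + " to index, " + skipped.size() + " deferred");
        for (File f : skipped) {
            System.out.println("deferred: " + f.getName() + " (" + f.length() + " bytes)");
        }
    }
}
```

The deferred files could then be handled separately, for example in a run on a 64-bit JVM started with a larger `-Xmx`.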
-- Ching

On Tue, Oct 12, 2010 at 9:42 PM, Anshum <ansh...@gmail.com> wrote:
> Hi Ching,
> Does the app exit, or does it hang and stay there? As in, does the JVM stay
> alive and idle?
> Also, can you make sure that it's not PDFBox? As in, try commenting out the
> IndexWriter part and just reading the PDFs; does that work fine?
> Can you also post info on your environment?
> Index size? Lucene version? Machine and JVM (32/64-bit)?
> This most probably seems like a code-level issue rather than Lucene, but I
> may be wrong.
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
> On Wed, Oct 13, 2010 at 8:08 AM, Ching <zchin...@gmail.com> wrote:
>
> > Hi All,
> >
> > Can anyone help with this issue? I have about 2000 PDF files; I use
> > PDFBox to extract their text, then index them in a for loop. The indexing
> > stopped after the .fdt file reached 7,061 KB in size. There is no error;
> > the indexing simply stopped. Thanks in advance for any help.
> >
> > Ching
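Anshum's isolation test (read the PDFs without the IndexWriter step) can be sketched as a loop that logs each file name before touching it. If the JVM then dies with an OutOfMemoryError, the last name printed points at the offending PDF. `TextExtractor` and `DocIndexer` are hypothetical wrappers, not PDFBox or Lucene types; in a real program they would call PDFBox's `PDFTextStripper` and Lucene's `IndexWriter.addDocument`. Passing `null` as the indexer gives the extraction-only dry run.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ExtractionDryRun {
    /** Hypothetical wrapper around the PDFBox text-extraction call. */
    interface TextExtractor { String extract(File pdf) throws Exception; }

    /** Hypothetical wrapper around the Lucene indexing call. */
    interface DocIndexer { void index(File pdf, String text) throws Exception; }

    /**
     * Extracts text from every file and indexes it only when indexer is non-null.
     * Returns one message per file whose extraction or indexing threw an Exception.
     * OutOfMemoryError is an Error, not an Exception, so it is not caught here:
     * the "processing" line printed just before the crash names the culprit.
     */
    static List<String> run(List<File> pdfs, TextExtractor extractor, DocIndexer indexer) {
        List<String> failures = new ArrayList<>();
        for (File pdf : pdfs) {
            System.out.println("processing " + pdf.getName() + " (" + pdf.length() + " bytes)");
            try {
                String text = extractor.extract(pdf);
                if (indexer != null) {
                    indexer.index(pdf, text);
                }
            } catch (Exception e) {
                failures.add(pdf.getName() + ": " + e);
            }
        }
        return failures;
    }
}
```

If the dry run completes but the full run does not, the problem is more likely on the indexing side; if the dry run also dies, PDFBox (or the heap size) is the place to look.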