Hi,

Thank you for your suggestions. I found the reason which is that PDFBox
seems having problem parsing large document (20MB), I have a few of them
within those 2000 docs, those are the ones throwing OutOfMemory errors. The
app does exit, and JVM died. I am running on 32bit machine.

-- Ching

On Tue, Oct 12, 2010 at 9:42 PM, Anshum <ansh...@gmail.com> wrote:

> Hi Ching,
> Does the app exit or hang and stay there? as in does the JVM stay alive and
> idle?
> Also, can you make sure that its not the pdfbox? as in, try commenting the
> indexwriter part and just read the pdfs, does that work fine.
> Can you also post info on your environment?
> Index Size? Lucene Version? Machine  and JVM (32/64 bit)?
> This most probably seems like a code level issue rather than lucene, but I
> may be wrong.
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Wed, Oct 13, 2010 at 8:08 AM, Ching <zchin...@gmail.com> wrote:
>
> > Hi All,
> >
> > Can anyone help with this issue? I have about 2000 pdf files that I use
> > PDFBox to extract its text, then index them using for loop. The indexing
> > stopped after the fdt file reaches at 7,061 KB in size. There is no
> error,
> > the indexing simply stopped.  Thanks in advance for any help.
> >
> > Ching
> >
>

Reply via email to