The worst-case RAM usage for Lucene is a single doc with many unique terms. Lucene allocates ~60 bytes per unique term, plus space to hold that term's characters (2 bytes per char). And Lucene cannot flush within one document -- it can only flush after the doc has been fully indexed.
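As a rough back-of-the-envelope illustration of those figures (the ~60 bytes/term and 2 bytes/char come from the paragraph above; the term count and average term length below are made-up numbers), the worst-case RAM for one such doc could be estimated like this:

    public class TermRamEstimate {
        // Approximate per-term figures quoted above; real overhead varies by
        // Lucene version and JVM, so treat this as a rough sketch only.
        static final long BYTES_PER_UNIQUE_TERM = 60; // bookkeeping per unique term
        static final long BYTES_PER_CHAR = 2;         // storage for the term's characters

        public static void main(String[] args) {
            long uniqueTerms = 5000000L; // hypothetical: one pathological doc
            long avgTermLength = 8;      // hypothetical average term length in chars

            long estimatedBytes =
                    uniqueTerms * (BYTES_PER_UNIQUE_TERM + avgTermLength * BYTES_PER_CHAR);

            // Lucene cannot flush mid-document, so all of this must fit in the heap
            // before the doc finishes indexing.
            System.out.printf("~%.1f MB for %d unique terms%n",
                    estimatedBytes / (1024.0 * 1024.0), uniqueTerms);
        }
    }

At those rates, five million unique terms in a single doc would need on the order of 360 MB of heap before the doc could flush.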
This past thread (also from Paul) delves into some of the details:

http://lucene.markmail.org/thread/pbeidtepentm6mdn

But it's not clear whether that is the issue affecting Ajay -- I think more details about the docs, or some code fragments, could help shed light.

Mike

On Tue, Mar 2, 2010 at 8:47 AM, Murdoch, Paul <paul.b.murd...@saic.com> wrote:
> Ajay,
>
> Here is another thread I started on the same issue.
>
> http://stackoverflow.com/questions/1362460/why-does-lucene-cause-oom-when-indexing-large-files
>
> Paul
>
>
> -----Original Message-----
> From: java-user-return-45254-paul.b.murdoch=saic....@lucene.apache.org
> [mailto:java-user-return-45254-paul.b.murdoch=saic....@lucene.apache.org]
> On Behalf Of ajay_gupta
> Sent: Tuesday, March 02, 2010 8:28 AM
> To: java-user@lucene.apache.org
> Subject: Lucene Indexing out of memory
>
>
> Hi,
> It might be a general question, but I couldn't find the answer yet. I
> have around 90k documents totaling around 350 MB. Each document contains
> a record with some text content. For each word in this text I want to
> store the context for that word and index it, so I read each document
> and, for each word in that document, append a fixed number of
> surrounding words. To do that, I first search the existing index to see
> if the word already exists; if it does, I get the content, append the
> new context, and update the document. If no context exists yet, I create
> a document with fields "word" and "context" and add those two fields
> with the word and its context as values.
>
> I tried this in RAM, but after a certain number of docs it gave an out
> of memory error, so I switched to the FSDirectory method; surprisingly,
> after 70k documents it also gave an OOM error. I have enough disk space,
> but I am still getting this error, and I am not sure why disk-based
> indexing gives it. I expected disk-based indexing to be slower but at
> least scalable.
> Could someone suggest what the issue could be?
>
> Thanks
> Ajay
> --
> View this message in context:
> http://old.nabble.com/Lucene-Indexing-out-of-memory-tp27755872p27755872.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
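For reference, here is a minimal sketch of the per-word flow Ajay describes. The field names "word" and "context" come from his message; the class name, index path, RAM buffer size, and Lucene 3.0-era API calls are illustrative assumptions, not his actual code:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class WordContextIndexer {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("word-context-index")); // hypothetical path
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMBufferSizeMB(64.0); // cap in-RAM buffering before segments flush to disk
            writer.commit();                 // make sure a segments file exists for the searcher

            String word = "example";                       // one word from the current record
            String newContext = "some surrounding words";  // its fixed-size context window

            // Look up any previously stored context for this word.
            // Note: the searcher only sees committed documents.
            IndexSearcher searcher = new IndexSearcher(dir, true);
            TopDocs hits = searcher.search(new TermQuery(new Term("word", word)), 1);
            String context = newContext;
            if (hits.totalHits > 0) {
                String existing = searcher.doc(hits.scoreDocs[0].doc).get("context");
                context = existing + " " + newContext; // append the new context
            }
            searcher.close();

            // Replace (or create) the single document for this word.
            Document doc = new Document();
            doc.add(new Field("word", word, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("context", context, Field.Store.YES, Field.Index.ANALYZED));
            writer.updateDocument(new Term("word", word), doc);

            writer.close();
        }
    }

In a real run this lookup-append-update cycle would happen once per word, and the searcher would have to be reopened (after commits) for earlier updates to become visible.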