Generally you shouldn't hit OOM, but it depends on how you use the index.
For example, if you have millions of documents spread across the 100 GB and
you sort on various fields, that will consume a lot of RAM. Likewise, running
hundreds of queries in parallel, each with a dozen terms, will consume a
considerable amount of RAM.
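To give a concrete (and simplified) picture of where the sorting cost comes
from, here's a minimal sketch of a sorted search. The index path and the
"body"/"timestamp" field names are placeholders, not anything from your
setup, and it assumes a recent Lucene release where the sort field was
indexed with doc values:

    import java.nio.file.Paths;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SortedSearchSketch {
      public static void main(String[] args) throws Exception {
        // "/path/to/index" and the field names below are placeholders.
        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
          IndexSearcher searcher = new IndexSearcher(reader);
          Query query = new TermQuery(new Term("body", "lucene"));
          // Sorting needs per-document values for the sort field. In older
          // Lucene releases these were loaded into the heap-based FieldCache
          // (the RAM cost described above); with doc values fields most of
          // that moves off-heap, but sorting on many different fields over
          // millions of documents still adds up.
          Sort sort = new Sort(new SortField("timestamp", SortField.Type.LONG, true));
          TopDocs hits = searcher.search(query, 10, sort);
          System.out.println("total hits: " + hits.totalHits);
        }
      }
    }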

But if you don't do anything extreme with it, and you can allocate a large
enough heap, then you should be OK.

The way I make such decisions is to design a test that mimics the typical
scenario I expect to face, run it on a machine like the one that will be used
in production (or as close to it as I can get), and analyze the results.
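As a rough sketch of such a test (again, the path, field name, thread count
and query volume are made-up placeholders, and a real test would replay your
actual queries), something like this fires a batch of concurrent searches
against one IndexSearcher and reports elapsed time and heap usage:

    import java.nio.file.Paths;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.FSDirectory;

    public class QueryLoadTest {
      public static void main(String[] args) throws Exception {
        final int threads = 50;            // made-up concurrency level; vary it
        final int queriesPerThread = 1000; // made-up volume; vary it
        try (DirectoryReader reader =
                 DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index")))) {
          final IndexSearcher searcher = new IndexSearcher(reader);
          ExecutorService pool = Executors.newFixedThreadPool(threads);
          long start = System.nanoTime();
          for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
              for (int i = 0; i < queriesPerThread; i++) {
                try {
                  // A single-term query; swap in whatever your real queries look like.
                  searcher.search(new TermQuery(new Term("body", "lucene")), 10);
                } catch (Exception e) {
                  e.printStackTrace();
                }
              }
            });
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS);
          long elapsedMs = (System.nanoTime() - start) / 1_000_000;
          long heapUsedMb =
              (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory())
                  / (1024 * 1024);
          System.out.println((threads * queriesPerThread) + " queries in " + elapsedMs
              + " ms, heap in use: " + heapUsedMb + " MB");
        }
      }
    }

Run it with the same -Xmx you plan to give the production JVM, and watch how
the numbers move as you raise the concurrency and add sort fields.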

If you choose to do that and you're not satisfied with the results, you're
welcome to post back with the machine statistics and the exact use case; I
believe there are plenty of folks here who'd be willing to help you optimize
your app's use of Lucene. Or at least then we'll be able to tell you: "for
this usage and this machine, you cannot run a 100 GB index".

Shai

On Thu, Jul 23, 2009 at 10:42 AM, m.harig <m.ha...@gmail.com> wrote:

>
> Thanks all,
>
> Very thankful to all. I'm tired of Hadoop settings; is it good to read
> such a large index with Lucene alone, or will it go OOM? Anyone, please
> suggest.
