Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces

Lebiram Thu, 02 Apr 2009 02:04:44 -0700

Hi Erick, 

I did a search just as JVM started... so I'm thinking that the JVM is busy with 
some startup stuff... and that this search required more memory than what is 
available at that time.

Had I done this search a while after the JVM has started, then this query 
succeeds. 
I then pump in several similar queries running on a different thread and it 
takes a long time but still runs to completion until one of them generates 
OOM.But still, queries like this is just using too much memory.

As for clauses, the BooleanQuery was set to max clause of... 9,000,000
I'm guessing that might have caused the usage of too much memory?

I'll try the explain on you've suggested. 

Thanks, 

M

________________________________
From: Erick Erickson <[email protected]>
To: [email protected]
Sent: Wednesday, April 1, 2009 6:51:13 PM
Subject: Re: Search using MultiSearcher generates OOM on a 1GB total  
Partitioned indeces

Think about putting this query in Luke and doing an "explain" for details,
but....

I'm surprised this is working at all without throwing TooManyClauses errors.
Under the covers, Lucene expands your wildcards to all terms in the field
that match. For instance, assume your document field has the following:
aa
ab
ac
ad
ae

Now, searching for a* produces a clause like:
(aa OR ab OR ac OR ad OR ae) in place of the a*

So your query is generating ginormous OR clauses, one that
contains every term in your content field starting with 'g'. Another
with every term in your content field starting with 'h' etc. So I suspect
that your content field doesn't have very many distinct terms in it....

As for why it's returning few entries, what does this part of your
query return by itself? Since it's anded with your wildcard query,
it might be what's limiting your results.

((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
(+viewers:cpuser9 +viewers:cpuser4))

But I'm puzzled, because saying that you're getting OOM errors
doesn't square very well with getting *any* results at all, so is
there something else going on?

Best
er...@morequestionsthananswers.

On Wed, Apr 1, 2009 at 1:31 PM, Lebiram <[email protected]> wrote:

> Hi All,
>
> I have the following query on a 1GB index with about 12 million docs :
> As you can see the terms consist of wildcards...
>
> query.toString()=+(+content:g* +content:h* +content:d* +content:s*
> +content:a* +content:w* +content:b* +content:c* +content:m* +content:e*)
> +((+sender:cpuser9 +viewers:cpuser4) (+sender:cpuser4 +viewers:cpuser9)
> (+viewers:cpuser9 +viewers:cpuser4))
>
> The Searcher is a MultiSearcher with 4 IndexSearchers pointing to 4
> different Lucene Index.
> This search returns very few records, several ten thousand hits.
>
> Java is assigned with 1GB max memory.
>
> Somehow this search eats the entire java heap space and causes OOM.
> Looking at jProfiler, i see org.apache.lucene package generating a lot of
> objects which I believe is taking all this space.
>
> Can anyone explain the reason why this particular search might take so much
> memory?
> Is there anything I am doing wrong here?
> More importantly, is there anything I can do to improve this?
>
> -M
>
>
>

Re: Search using MultiSearcher generates OOM on a 1GB total Partitioned indeces

Reply via email to