Re[2]: Out of memory exception for big indexes

Artem Sun, 08 Apr 2007 10:33:40 -0700

Hello Nilesh,

Sunday, April 8, 2007, 9:03:06 AM, you wrote:

NB> This seems like a very useful patch. Our application searches over 50
NB> million doc in a 40GB index. We only have simple conjunctive queries
NB> on a single field. Currently, the command line search program that
NB> prints top-10 results requires at least 200mb memory. Our web
NB> application, that searches the same index crashes with OOM when there
NB> are more than 10-12 concurrent requests (heap size set to 3GB). Will
NB> this patch help in such a situation?

I must note that my patch only helps in lucene-OOM situations related to
_sorted_ queries. If this is your case than I think yes it will help.

In my app currently index is not so big, only 1mln docs. With the patch applied
sample query giving first 30 of 120,000 sorted results made memory consumption
jump from 18M to 20M according to jconsole.

NB> It seems that there are some issues with this patch and that was the
NB> reason it is not yet in the main source tree. Can someone please
NB> summerize what are the downsides of using such an approach. It will be
NB> really good if Lucene had it in main source tree and a flag to turn ON
NB> or OFF this feature.

First there's performance cost (for second and further queries with the
same IndexSearcher). In default implementation all the index values of sorted
field are cached during the first sorted search - this takes memory and time;
but next queries run fast if there still some memory left. My implementation
doesn't cache field values but loads them from respective documents on the fly -
so it's slower but takes less memory. The query mentioned took about 3s (with
rather small sorted fields values - about 20-100 chars).
There's a limitation also - my implementation requires sorted field to be
"stored" in index (Field.Store.YES in doc.add())

NB> Bublic, can you tell me what exactly I need to do if I want to use this 
patch?

You can include StoredFieldSortFactory class source file into your sources and
then use StoredFieldSortFactory.create(sortFieldName, sortDescending) to get
Sort object for sorting query.
StoredFieldSortFactory source file can be extracted from LUCENE-769 patch or
from sharehound sources: 
http://sharehound.cvs.sourceforge.net/*checkout*/sharehound/jNetCrawler/src/java/org/apache/lucene/search/StoredFieldSortFactory.java

Regards,
Artem

NB> thanks
NB> Nilesh

NB> On 4/6/07, Bublic Online <[EMAIL PROTECTED]> wrote:
>> Hi Ivan, Chris and all!
>>
>> I'm that contributor of LUCENE-769 and I recommend it too :)
>> OutOfMemory error was one of main reasons for me to make it.
>>
>> Regards,
>> Artem Vasiliev
>>
>> On 4/6/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>> >
>> >
>> > : The problem I suspect is the sorting. As I understand, Lucene
>> > : builds internal caches for sorting and I suspect that this is the root
>> > : of your problem. You can test this by trying your problem queries
>> > : without sorting.
>> >
>> > if Sorting really is the cause of your problems, you may want to try out
>> > this patch...
>> >
>> > https://issues.apache.org/jira/browse/LUCENE-769
>> >
>> > ...it *may* be advantageous in situations where memory is your most
>> > constrained resource, and you are willing to sacrifice speed for sorting
>> > ... it looks promising to me, but there haven't been any convincing
>> > usecases/benchmarks of people finding it beneficial (other then the
>> > original contributor)
>> >
>> > if you do try it, please post your comments in the issue.
>> >
>> >
>> >
>> > -Hoss
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> > For additional commands, e-mail: [EMAIL PROTECTED]
>> >
>> >
>>

-- 
Best regards,
 Artem                            mailto:[EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re[2]: Out of memory exception for big indexes

Reply via email to