The user list is the appropriate spot, but this brings up a discussion point.

The "norms" require 1 byte per document (per field with norms), so with roughly 400 million documents you will need at least 512 MB of heap for them alone.

Start the java process with -Xmx512m and see what happens.
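As a quick sanity check of that number, here is the back-of-the-envelope math (the ~400 million document figure comes from the bug report below; everything else is plain arithmetic):

```java
// Back-of-the-envelope heap math for the norms array: Lucene loads one byte
// of norms per document for each searched field that has norms enabled.
public class NormsHeapMath {
    public static void main(String[] args) {
        long docs = 400_000_000L;      // document count, from the bug report
        long bytesPerDocPerField = 1;  // each norm is a single encoded byte
        long normsBytes = docs * bytesPerDocPerField;
        System.out.println(normsBytes / (1024 * 1024) + " MB"); // 381 MB
    }
}
```

So a single field's norms already eat ~381 MB, which is why a 512 MB heap is the bare minimum here.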

Depending on what you are doing you might be able to "omit the norms", but in the current code this doesn't actually save any search-time memory, BUT...

Maybe Lucene should be changed to not create the 'fake norms' array. Instead, if norms() returns null, don't dereference a norm value but use

DefaultSimilarity.encodeNorm(1.0f)

in 'real time'. (This is what our branch does.) The memory savings are huge for a large index, and many Lucene applications do not need the norms (hence the 'omit norms' option).
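To see why omitting norms doesn't help at search time today, here is an illustrative sketch (not the actual SegmentReader source; names and the placeholder byte value 124 standing in for encodeNorm(1.0f) are mine) of the fake-norms allocation:

```java
import java.util.Arrays;

// Illustrative sketch (not actual Lucene source): when a field's norms were
// omitted at index time, the reader still synthesizes a full "fake norms"
// array, one byte per document, all set to the default encoded norm.
public class FakeNormsSketch {
    static byte[] fakeNorms(int maxDoc, byte defaultNorm) {
        byte[] fake = new byte[maxDoc];   // still maxDoc bytes on the heap
        Arrays.fill(fake, defaultNorm);
        return fake;
    }

    public static void main(String[] args) {
        // 124 is a stand-in for DefaultSimilarity.encodeNorm(1.0f)
        byte[] fake = fakeNorms(1000, (byte) 124);
        System.out.println(fake.length);  // prints 1000: allocated for nothing
    }
}
```

At 400 million documents, that "fake" array is just as large as a real one.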

This would require changes to all of the calls to norms() (about 25 instances) and to some scorer code, since it dereferences the norms array directly.

The simplest solution would be to change

byte[] norms()

to

int norm(int doc) {
    final int defaultNorm = DefaultSimilarity.encodeNorm(1.0f);
    if (norms == null)
        return defaultNorm;
    else
        return norms[doc];
}

The trivial method will be inlined by the JIT, so the performance hit should be negligible. All users of the norms array would then be changed to call norm(doc).
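A self-contained sketch of what the caller-side change looks like (all names are illustrative, and a placeholder constant stands in for DefaultSimilarity.encodeNorm(1.0f) so the example runs stand-alone):

```java
// Sketch of the proposed accessor and how a scorer-style caller migrates from
// indexing into reader.norms(field)[doc] to calling reader.norm(doc).
public class NormAccessorSketch {
    // Stand-in for DefaultSimilarity.encodeNorm(1.0f); illustrative value.
    static final byte DEFAULT_NORM = (byte) 124;

    private final byte[] norms;  // null when the field omitted norms

    NormAccessorSketch(byte[] norms) {
        this.norms = norms;
    }

    // The trivial accessor gets inlined; no fake array is ever built.
    int norm(int doc) {
        return (norms == null) ? DEFAULT_NORM : norms[doc];
    }

    public static void main(String[] args) {
        NormAccessorSketch withNorms = new NormAccessorSketch(new byte[] {10, 20, 30});
        NormAccessorSketch omitted   = new NormAccessorSketch(null);
        System.out.println(withNorms.norm(1)); // prints 20
        System.out.println(omitted.norm(1));   // prints 124, with no per-doc array
    }
}
```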

On Nov 13, 2007, at 6:39 AM, Grant Ingersoll (JIRA) wrote:


[ https://issues.apache.org/jira/browse/LUCENE-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved LUCENE-1053.
-------------------------------------

       Resolution: Invalid
    Lucene Fields:   (was: [New])

Hi Lars,

Generally we recommend opening a discussion of issues you are having with your application's use of Lucene by asking questions on the java-user mailing list. What you are reporting doesn't necessarily sound like a bug in Lucene, so let's discuss it on java-user first; hopefully we can get you some answers there.

Start by posting what you have here, plus what your heap settings are, etc. Lucene doesn't scale infinitely (nor does any search application or, for that matter, any program); when you reach a certain index size, you will have to start doing things like distributed search, whereby you split your index across two or more machines. You _MAY_ have hit those limits and may need to distribute your search.
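A toy sketch of the distributed-search idea mentioned above (all names are illustrative, not a Lucene API): each shard holds part of the index, is searched independently, and the per-shard hits are merged by score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy sketch of distributed search: search each index shard separately,
// then merge the per-shard hit lists by descending score and keep the top N.
public class ShardMergeSketch {
    record Hit(String doc, float score) {}

    static List<Hit> merge(List<List<Hit>> perShard, int topN) {
        List<Hit> all = new ArrayList<>();
        perShard.forEach(all::addAll);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score()).reversed());
        return all.subList(0, Math.min(topN, all.size()));
    }

    public static void main(String[] args) {
        List<Hit> shard1 = List.of(new Hit("a", 0.9f), new Hit("b", 0.4f));
        List<Hit> shard2 = List.of(new Hit("c", 0.7f));
        System.out.println(merge(List.of(shard1, shard2), 2)); // "a" then "c"
    }
}
```

Each shard's index (and its norms array) then only needs to fit the documents on that machine.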

Cheers,
Grant

OutOfMemoryError on search in large, simple index
-------------------------------------------------

                Key: LUCENE-1053
URL: https://issues.apache.org/jira/browse/LUCENE-1053
            Project: Lucene - Java
         Issue Type: Bug
         Components: Search
   Affects Versions: 2.0.0
Environment: Red Hat Enterprise Linux ES release 3 (Taroon Update 9) Linux sb-test-acs-001 2.4.21-47.0.1.ELsmp #1 SMP Fri Oct 13 17:56:20 EDT 2006 i686 i686 i386 GNU/Linux
2 GB RAM
Java version 1.5.0_13
           Reporter: Lars Clausen

We get an OutOfMemoryError when performing a one-term search in our index. The search, if completed, should give only a few thousand hits, but from inspecting a heap dump it appears that many more documents from the index get stored in memory by Lucene during the search. Our index consists of eight fields per document, fairly regularly sized; the total index size is 170 GB, spread over about 400 million documents (about 425 bytes per document). The search is a simple TermQuery and the search term a trivial string. The code in question looks like this (cut together for conciseness):
        public static final String FIELD_URL = "url";
...
        luceneSearcher = new IndexSearcher(indexDir.getAbsolutePath());
        Query query = new TermQuery(new Term(DigestIndexer.FIELD_URL, uri));
        try {
            Hits hits = luceneSearcher.search(query);
Stack trace:
Oct 11, 2007 4:02:19 PM org.slf4j.impl.JCLLoggerAdapter error
SEVERE: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.SegmentReader.getNorms(SegmentReader.java:384)
        at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:393)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:68)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:129)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:99)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
        at org.apache.lucene.search.Hits.<init>(Hits.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:36)
        at dk.netarkivet.common.distribute.arcrepository.ARCLookup.luceneLookup(ARCLookup.java:166)
        at dk.netarkivet.common.distribute.arcrepository.ARCLookup.lookup(ARCLookup.java:130)
        at dk.netarkivet.viewerproxy.ARCArchiveAccess.lookup(ARCArchiveAccess.java:126)
        at dk.netarkivet.viewerproxy.NotifyingURIResolver.lookup(NotifyingURIResolver.java:72)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.WebProxy.handle(WebProxy.java:129)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:457)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:751)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:500)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:209)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:357)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:217)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



