We've run into a blocking problem with our use of Lucene: we get an OutOfMemoryError when performing a one-term search in our index. The search, if it completed, should return only a few thousand hits, but a heap dump suggests that many more documents from the index are held in memory by Lucene during the search.

Our index has eight fields per document, fairly regularly sized. The total index size is 170GB, spread over about 400 million documents (about 425 bytes per document). The search is a simple TermQuery and the search term is a trivial string. The code in question looks like this (cut together for conciseness):

    public static final String FIELD_URL = "url";
    ...
    luceneSearcher = new IndexSearcher(indexDir.getAbsolutePath());
    Query query = new TermQuery(new Term(DigestIndexer.FIELD_URL, uri));
    try {
        Hits hits = luceneSearcher.search(query);

Stack trace:

Oct 11, 2007 4:02:19 PM org.slf4j.impl.JCLLoggerAdapter error
SEVERE: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.SegmentReader.getNorms(SegmentReader.java:384)
        at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:393)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:68)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:129)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:99)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
        at org.apache.lucene.search.Hits.<init>(Hits.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:36)
        at dk.netarkivet.common.distribute.arcrepository.ARCLookup.luceneLookup(ARCLookup.java:166)
        at dk.netarkivet.common.distribute.arcrepository.ARCLookup.lookup(ARCLookup.java:130)
        at dk.netarkivet.viewerproxy.ARCArchiveAccess.lookup(ARCArchiveAccess.java:126)
        at dk.netarkivet.viewerproxy.NotifyingURIResolver.lookup(NotifyingURIResolver.java:72)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.WebProxy.handle(WebProxy.java:129)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:457)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:751)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:500)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:209)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:357)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:217)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)

Can it be right that memory usage depends on the size of the index rather than the size of the result? Can something be done to reduce memory usage for such a simple but big scenario?

-Lars
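
Since the trace fails inside SegmentReader.getNorms, a back-of-envelope estimate of norms memory may help frame the question. The sketch below is only arithmetic on the figures given above; it assumes norms are held as one byte per document for each field that stores them, which is my understanding of Lucene versions of this era and not something confirmed by the heap dump. The class and method names are made up for illustration.

```java
public class NormsEstimate {
    // Assumption: one norm byte per document for each field that stores norms.
    static long normsBytes(long docCount, int fieldsWithNorms) {
        return docCount * fieldsWithNorms;
    }

    // Average on-disk bytes per document (decimal gigabytes, integer division).
    static long bytesPerDocument(long indexBytes, long docCount) {
        return indexBytes / docCount;
    }

    public static void main(String[] args) {
        long docCount = 400_000_000L;        // "about 400 million documents"
        long indexBytes = 170_000_000_000L;  // "170GB" taken as 170 * 10^9 bytes

        System.out.println("bytes per document:    " + bytesPerDocument(indexBytes, docCount));
        System.out.println("norms bytes, 1 field:  " + normsBytes(docCount, 1));
        System.out.println("norms bytes, 8 fields: " + normsBytes(docCount, 8));
    }
}
```

Under that assumption, searching the single "url" field would need roughly 400 MB of heap for norms alone, regardless of how many hits the query returns. That would be consistent with memory usage following index size rather than result size.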