I finally got back to working on my project. HitCollector solved my problem. Thank you for all the help.
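For readers hitting the same wall: Lucene 1.4's `Hits` class caches documents as you page through them with `hits.doc(i)`, so iterating hundreds of thousands of hits runs out of memory no matter how often the reader is reopened. A `HitCollector` passed to `IndexSearcher.search(Query, HitCollector)` instead receives each match through a `collect(int doc, float score)` callback and caches nothing. The sketch below is a self-contained illustration of that callback pattern only — the `Collector` interface and `search` method here are stand-ins, not Lucene classes; with real Lucene you would subclass `org.apache.lucene.search.HitCollector` the same way and fetch stored fields such as "filename" afterwards, in small batches.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {

    // Analogue of Lucene 1.4's HitCollector.collect(int doc, float score):
    // the searcher pushes each match to this callback instead of
    // buffering it in a Hits cache.
    interface Collector {
        void collect(int doc, float score);
    }

    // Stand-in for IndexSearcher.search(query, collector): streams every
    // matching doc ID to the callback; nothing is retained internally.
    static void search(int numMatches, Collector c) {
        for (int doc = 0; doc < numMatches; doc++) {
            c.collect(doc, 1.0f);
        }
    }

    // Collect only the int doc IDs; stored fields like "filename" would
    // be looked up later, a few thousand at a time, from the IndexReader.
    static List<Integer> collectIds(int numMatches) {
        final List<Integer> ids = new ArrayList<Integer>();
        search(numMatches, new Collector() {
            public void collect(int doc, float score) {
                ids.add(doc);
            }
        });
        return ids;
    }

    public static void main(String[] args) {
        // Even a hit count like the thread's 899810 costs only one
        // Integer per match here, not a cached Document per hit.
        System.out.println(collectIds(5)); // prints [0, 1, 2, 3, 4]
    }
}
```

The key design point is the same one the archived threads make: collect doc IDs (and nothing else) during the search, then resolve stored fields in bounded batches, so memory use is independent of the total hit count.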
On 5/14/06, Beady Geraghty <[EMAIL PROTECTED]> wrote:
Thank you for the links. I will go through them, and hopefully solve my problem.

On 5/14/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> Please review the advice in these archived messages; I think you'll find
> them very applicable to your problem...
>
> http://www.nabble.com/eliminating-scoring-for-the-sake-of-efficiency-t1603827.html#a4351614
> http://www.nabble.com/Exact-date-search-doesn%27t-work-with-1.9.1--t1418643.html#a3833741
>
> : Date: Sun, 14 May 2006 15:34:08 -0400
> : From: Beady Geraghty <[EMAIL PROTECTED]>
> : Reply-To: java-user@lucene.apache.org
> : To: java-user@lucene.apache.org
> : Subject: Re: out-of-memory when searching, paging does not work.
> :
> : Here is the gist of the code:
> :
> :     Query query = new TermQuery(new Term("contents", q.toLowerCase()));
> :
> :     long start = new Date().getTime();
> :     Hits hits = is.search(query);
> :     long end = new Date().getTime();
> :
> :     System.err.println("Found " + hits.length() +
> :         " document(s) (in " + (end - start) +
> :         " milliseconds) that matched query '" + q + "'");
> :
> :     int ct = hits.length();
> :     int ct2 = 400000;
> :     int step = 10000;
> :     int startct;
> :     while (ct2 < ct) {
> :         startct = ct2;
> :         for (int i = startct; i < startct + step; i++) {
> :             if (ct2 >= ct) {
> :                 break;
> :             }
> :             Document doc = hits.doc(ct2);
> :             doc.get("filename");
> :             ct2++;
> :         }
> :         System.out.println("ct2 is " + ct2);
> :         ir.close();
> :         is.close();
> :         fsDir.close();
> :         ir = null;
> :         is = null;
> :         fsDir = null;
> :         fsDir = FSDirectory.getDirectory(indexDir, false);
> :         ir = IndexReader.open(fsDir);
> :         is = new IndexSearcher(ir);
> :         hits = is.search(query);
> :     }
> :
> : If ct2 is set to 40,000 as opposed to 400,000, I see some output before I
> : get the out-of-memory error. If not, I get the out-of-memory error almost
> : instantly, without any output.
> :
> : Is there a method call to clear the cache?
> :
> : Thank you for your response.
> :
> : On 5/14/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> : >
> : > Could you share at least some pseudo-code of what you're doing in the
> : > loop of retrieving the "name" of each document? Are you storing all
> : > of those names as you iterate?
> : >
> : > Have you profiled your application to see exactly where the memory is
> : > going? It is surely being eaten by your own code and not Lucene.
> : >
> : >     Erik
> : >
> : > On May 14, 2006, at 12:07 PM, Beady Geraghty wrote:
> : >
> : > > I have an out-of-memory error when returning many hits.
> : > >
> : > > I am still on Lucene 1.4.3.
> : > >
> : > > I have a simple term query. It returned 899810 documents.
> : > > I tried to retrieve the name of each document and nothing else,
> : > > and I ran out of memory.
> : > >
> : > > Instead of getting the names all at once, I tried to query again
> : > > after every 10,000 documents.
> : > > I close the index reader, index searcher, and the fsDir, and re-query
> : > > for every 10,000 documents. This still doesn't work.
> : > >
> : > > From another entry in the forum, it appears that the information
> : > > about the hits that I have skipped over is still kept even though I
> : > > don't access them. Am I understanding correctly that if I start
> : > > accessing from the 400000th document onwards, some information about
> : > > documents 0-399999 is still cached even though I have skipped over
> : > > them? Is there a way to get the file name (and perhaps other
> : > > information) of the remaining documents?
> : > >
> : > > (I tried a different term query that returned a hit size of 400000,
> : > > and I was able to get the names of them all without re-querying.)
> : > >
> : > > I think I saw someone mention clearing the hit cache,
> : > > though I don't know how this is done.
> : > >
> : > > Thank you in advance for any hints on dealing with this.
> : >
> : > ---------------------------------------------------------------------
> : > To unsubscribe, e-mail: [EMAIL PROTECTED]
> : > For additional commands, e-mail: [EMAIL PROTECTED]
>
> -Hoss