This was a big boolean query, with several prefix queries but no wildcard queries in the OR branches.
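Doug's reply below suggests why a prefix query gets so big: the prefix is expanded into one term clause per index term that starts with it. The following Java sketch is a simplified model of that idea, not Lucene code. The class and method names are hypothetical, the sorted `TreeSet` stands in for the term dictionary, and the per-clause cost is the per-TermQuery figure from the 10 November message (one 1024-byte buffer plus two 128-int buffers, about 2 KB).

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Hypothetical model (not Lucene's API): expand a prefix against a sorted
 * term dictionary, then estimate search buffers at one TermQuery per term.
 */
public class PrefixExpansionSketch {

    // Per-TermQuery cost quoted in the thread: 1024 bytes + 2 * 128 ints * 4 bytes.
    static final long BYTES_PER_TERM_QUERY = 1024 + 2 * 128 * 4;

    /** Returns the terms that start with prefix (ignoring terms with U+FFFF right after it). */
    static NavigableSet<String> expand(NavigableSet<String> dictionary, String prefix) {
        // Terms starting with prefix sort in [prefix, prefix + U+FFFF).
        return dictionary.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    /** Buffer bytes if every expanded term becomes its own TermQuery. */
    static long estimateBufferBytes(long expandedTerms) {
        return expandedTerms * BYTES_PER_TERM_QUERY;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary = new TreeSet<>(
            List.of("memo", "memory", "memorial", "mention", "menu", "merge"));

        NavigableSet<String> terms = expand(dictionary, "mem");
        System.out.println(terms);                              // [memo, memorial, memory]
        System.out.println(estimateBufferBytes(terms.size()));  // 6144

        // A query expanding to ~40,000 terms, as the hprof trace suggests:
        System.out.println(estimateBufferBytes(40_000));        // 81920000
    }
}
```

Under this model, ~40,000 expanded terms cost roughly 80 MB in buffers alone, which is the same order as the 100 MB heap reported in the thread. It is a rough illustration, not a measurement.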
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: 12. november 2001 17:41
To: Lucene Users List
Subject: RE: Memory Usage?

This was a single query? How many terms, and of what type, are in the query?
From the trace it looks like there could be over 40,000 terms in the query!
Is this a prefix or wildcard query? These can generate *very* large queries...

Doug

> -----Original Message-----
> From: Anders Nielsen [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, November 11, 2001 6:59 AM
> To: Lucene Users List
> Subject: RE: Memory Usage?
>
>
> I am not very familiar with the output of -Xrunhprof, but I've attached the
> output of a run of a search through an index of 50,000 documents. It gave
> me out-of-memory errors until I allocated 100 megabytes of heap space.
>
> The top 10:
>
> SITES BEGIN (ordered by live bytes) Sun Nov 11 15:50:31 2001
>          percent         live           alloc'ed     stack class
>  rank   self  accum    bytes  objs     bytes    objs trace name
>     1 26.41% 26.41% 12485200 12005  45566560   43814  1783 [B
>     2 25.18% 51.59% 11904880 11447  44867680   43142  1796 [B
>     3  4.15% 55.74%  1962904 69214 171546352 5510292  1632 [C
>     4  3.83% 59.58%  1812096  3432   1812096    3432  1768 [I
>     5  3.83% 63.41%  1812096  3432   1812096    3432  1769 [I
>     6  3.34% 66.75%  1580688 65862 130618992 5442458  1631 java.lang.String
>     7  3.19% 69.95%  1509584 44763   1509584   44763   458 [C
>     8  3.03% 72.98%  1432416 44763   1432416   44763   459 org.apache.lucene.index.TermInfo
>     9  2.27% 75.25%  1074312 44763   1074312   44763   457 java.lang.String
>    10  2.23% 77.48%  1053792 65862  87079328 5442458  1631 org.apache.lucene.index.Term
>
> and the top 3 traces were:
>
> TRACE 1783:
>   org.apache.lucene.store.InputStream.refill(InputStream.java:165)
>   org.apache.lucene.store.InputStream.readByte(InputStream.java:80)
>   org.apache.lucene.store.InputStream.readVInt(InputStream.java:106)
>   org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:101)
>
> TRACE 1796:
>   org.apache.lucene.store.InputStream.refill(InputStream.java:165)
>   org.apache.lucene.store.InputStream.readByte(InputStream.java:80)
>   org.apache.lucene.store.InputStream.readVInt(InputStream.java:106)
>   org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:100)
>
> TRACE 1632:
>   java.lang.String.<init>(String.java:198)
>   org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:134)
>   org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:114)
>   org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
>
>
> I've attached the whole trace as gzipped.txt
>
> regards,
> Anders Nielsen
>
> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> Sent: 10. november 2001 04:35
> To: 'Lucene Users List'
> Subject: RE: Memory Usage?
>
>
> I'm surprised that your memory use is that high.
>
> An IndexReader requires:
>   one byte per field per document in index (norms)
>   one open file per file in index
>   1/128 of the Terms in the index
>     a Term has two pointers (8 bytes)
>     and a String (4 pointers = 24 bytes, one to 16-bit chars)
>
> A Search requires:
>   one 1024-byte buffer per TermQuery
>   two 128-int buffers per TermQuery
>   two 1024-byte buffers per PhraseQuery term
>   one 1024-element bucket array per BooleanQuery
>     each bucket has 5 fields, and hence requires ~20 bytes
>   one bit per document in index per DateFilter
>
> A Hits requires:
>   up to n+100 ScoreDocs (float+int, 8 bytes)
>     where n is the highest Hits.doc(n) accessed
>   up to 200 Document objects
>
> I may have forgotten something...
>
> Let's assume that your 1M-document index has 2M unique terms, that you
> only look at the top-100 hits, that your index has three fields, and that
> the typical document has two stored fields, each 20 characters.
> Your 30-term boolean query over a 1M-document index should use around the
> following numbers of bytes:
>   IndexReader:
>     3,000,000 (norms)
>     1,000,000 (1/128 of 2M terms, each requiring ~50 bytes)
>   during search:
>        50,000 (TermQuery buffers)
>        20,000 (BooleanQuery buckets)
>       100,000 (DateFilter bit vector)
>   in Hits:
>         2,000 (200 ScoreDocs)
>        30,000 (up to 200 cached Documents)
>
> So searches should run in a 5 MB heap. Are my assumptions off?
>
> You can also see why it is useful to keep a single IndexReader and use it
> for all queries. (IndexReader is thread safe.)
>
> You could also run 'java -Xrunhprof:heap=sites' to see what's using memory.
>
> Doug
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
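Doug's 5 MB figure is plain arithmetic over the assumptions in his 10 November message. The Java sketch below just multiplies out the per-item costs he lists; the class and method names are hypothetical, nothing here calls Lucene, and the 30,000 bytes for cached Documents is taken from his message rather than derived.

```java
/**
 * Hypothetical back-of-the-envelope calculator for the heap estimate in
 * this thread. Inputs and per-item costs are the thread's assumptions.
 */
public class HeapEstimateSketch {

    // One byte per indexed field per document (norms).
    static long norms(long docs, int fields) { return docs * fields; }

    // Every 128th term is kept in memory, ~50 bytes each (Term + String).
    static long termIndex(long uniqueTerms, long bytesPerTerm) {
        return uniqueTerms / 128 * bytesPerTerm;
    }

    // Per TermQuery: one 1024-byte buffer plus two 128-int (4-byte) buffers.
    static long termQueryBuffers(int queryTerms) {
        return queryTerms * (1024L + 2 * 128 * 4);
    }

    // One 1024-element bucket array per BooleanQuery, ~20 bytes per bucket.
    static long booleanBuckets() { return 1024L * 20; }

    // DateFilter: one bit per document.
    static long dateFilter(long docs) { return docs / 8; }

    // Up to n + 100 ScoreDocs of 8 bytes (float + int).
    static long scoreDocs(int n) { return (n + 100L) * 8; }

    // Figure quoted in the thread for up to 200 cached Documents.
    static final long CACHED_DOCUMENTS = 30_000;

    static long estimate(long docs, int fields, long uniqueTerms, int queryTerms, int topHits) {
        return norms(docs, fields)
             + termIndex(uniqueTerms, 50)
             + termQueryBuffers(queryTerms)
             + booleanBuckets()
             + dateFilter(docs)
             + scoreDocs(topHits)
             + CACHED_DOCUMENTS;
    }

    public static void main(String[] args) {
        // 1M docs, 3 fields, 2M unique terms, 30-term query, top-100 hits.
        long bytes = estimate(1_000_000, 3, 2_000_000, 30, 100);
        System.out.println(bytes + " bytes"); // 4019770 bytes
    }
}
```

Without Doug's rounding the total comes to about 4.0 MB, with the norms (3 MB) dominating, which is consistent with his conclusion that such searches should fit in a 5 MB heap.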