Re: Running OutOfMemory while optimizing and searching
Doug,

Thank you for confirming this.

ZJ

Doug Cutting [EMAIL PROTECTED] wrote:

> John Z wrote:
> > We have indexes of around 1 million docs and around 25 searchable
> > fields. We noticed that without any searches performed on the indexes,
> > on startup, the memory taken up by the searcher is roughly 7 times the
> > .tii file size. The .tii file is read into memory as per the code. Our
> > .tii files are around 8-10 MB in size and our startup memory footprint
> > is around 60-70 MB. Then, when we start doing our searches, the memory
> > goes up, depending on the fields we search on; if we start searching on
> > new fields, the memory goes up further.
> >
> > Doug, does your calculation below of what is taken up by the searcher
> > take into account the .tii file being read into memory, or am I not
> > making any sense?
> >
> >   1 byte * number of searchable fields in your index * number of docs in your index
> >   plus 1 KB * number of terms in query
> >   plus 1 KB * number of phrase terms in query
>
> You make perfect sense. The formula above does not include the .tii; my
> mistake, I forgot that. By default, every 128th term in the index is read
> into memory, to permit random access to terms. These are stored in the
> .tii file, compressed, so it is not surprising that they require 7x the
> size of the .tii file in memory.
>
> Doug
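Doug's explanation lends itself to a back-of-envelope check. The sketch below is not Lucene code; `startup_term_index_estimate` is a hypothetical helper, and the 7x in-memory expansion factor is simply the ratio observed in this thread (compressed .tii bytes on disk vs. expanded term objects on the heap), not a documented constant.

```python
def startup_term_index_estimate(tii_bytes, expansion_factor=7):
    """Rough in-memory size of the term index, which Lucene reads fully
    into RAM at startup: every 128th term, stored compressed in the .tii
    file but expanded into objects in memory."""
    return tii_bytes * expansion_factor

# John Z's numbers: 8-10 MB .tii files
low = startup_term_index_estimate(8 * 1024 * 1024)
high = startup_term_index_estimate(10 * 1024 * 1024)
print(low / 2**20, high / 2**20)  # 56.0 70.0, matching the observed 60-70 MB
```

With the observed multiplier, an 8-10 MB .tii predicts a 56-70 MB startup footprint, which is consistent with the 60-70 MB John Z measured.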
Re: Running OutOfMemory while optimizing and searching
Hi,

We are trying to get the memory footprint of our searchers. We have indexes
of around 1 million docs and around 25 searchable fields. We noticed that
without any searches performed on the indexes, on startup, the memory taken
up by the searcher is roughly 7 times the .tii file size. The .tii file is
read into memory as per the code. Our .tii files are around 8-10 MB in size
and our startup memory footprint is around 60-70 MB. Then, when we start
doing our searches, the memory goes up, depending on the fields we search
on; if we start searching on new fields, the memory goes up further.

Doug, does your calculation below of what is taken up by the searcher take
into account the .tii file being read into memory, or am I not making any
sense?

  1 byte * number of searchable fields in your index * number of docs in your index
  plus 1 KB * number of terms in query
  plus 1 KB * number of phrase terms in query

Thank you,
ZJ

Doug Cutting [EMAIL PROTECTED] wrote:

> > What do your queries look like? The memory required for a query can be
> > computed by the following equation:
> >
> >   1 byte * number of fields in your query * number of docs in your index
> >
> > So if your query searches on all 50 fields of your 3.5 million document
> > index, then each search would take about 175 MB. If your 3-4 searches
> > run concurrently, that's about 525 MB to 700 MB chewed up at once.
>
> That's not quite right. If you use the same IndexSearcher (or IndexReader)
> for all of the searches, then only 175 MB are used. The arrays in question
> (the norms) are read-only and can be shared by all searches.
>
> In general, the amount of memory required is:
>
>   1 byte * number of searchable fields in your index * number of docs in your index
>   plus 1 KB * number of terms in query
>   plus 1 KB * number of phrase terms in query
>
> The latter are for i/o buffers. There are a few other things, but these
> are the major ones.
>
> Doug
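Doug's general formula can be turned into a quick estimator. This is a minimal sketch with a hypothetical helper name, not Lucene API; it just encodes the formula from the thread: 1 byte of norms per searchable field per document, plus roughly 1 KB of i/o buffer per query term and per phrase term.

```python
def search_memory_bytes(num_indexed_fields, num_docs, query_terms, phrase_terms):
    norms = 1 * num_indexed_fields * num_docs  # 1 byte per field per doc,
                                               # shared across concurrent
                                               # searches on one IndexSearcher
    term_buffers = 1024 * query_terms          # ~1 KB i/o buffer per term
    phrase_buffers = 1024 * phrase_terms       # ~1 KB per phrase term
    return norms + term_buffers + phrase_buffers

# Doug's example: 50 fields, 3.5 million docs -> norms dominate at ~175 MB
est = search_memory_bytes(50, 3_500_000, 5, 0)
print(round(est / 1e6))  # 175
```

The norms term dwarfs the per-query buffers, which is why sharing one IndexSearcher keeps concurrent searches near 175 MB rather than 525-700 MB.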
Question on number of fields in a document.
Hi,

I had a question related to the number of fields in a document. Is there
any limit to the number of fields you can have in an index? We have around
25-30 fields per document at present: about 6 are keywords, around 6 are
stored but not indexed, and the rest are text fields, which are analyzed
and indexed. We are planning on adding around 24 more fields, mostly
keywords. Does anyone see any issues with this? Any impact to search or
indexing?

Thanks,
ZJ
Re: Question on number of fields in a document.
Thanks.

I was looking at some older email on the list and found a message where
Doug Cutting says that for fields that are not analyzed, we need not store
the norms, nor load them into memory. That change in the indexer will help
a lot in this situation, where we might have 24 fields indexed but not
analyzed.

ZJ

Paul Elschot [EMAIL PROTECTED] wrote:

> On Wednesday 04 August 2004 18:22, John Z wrote:
> > Is there any limit to the number of fields you can have in an index?
> > We have around 25-30 fields per document at present [...] We are
> > planning on adding around 24 more fields, mostly keywords. Does anyone
> > see any issues with this? Impact to search or index?
>
> During search, one byte of RAM is needed per searched field per document
> for the normalisation factors, even if a document field is empty. This
> RAM is occupied the first time a field is searched after opening an index
> reader. Supposing your queries would actually search 50 fields before
> closing the index reader, the norms would occupy 50 bytes/doc, or 1 GB
> per 20M docs.
>
> Regards,
> Paul
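Paul's one-byte-per-searched-field-per-document rule is easy to sanity-check. A minimal sketch with a hypothetical helper, reproducing his 50-field example:

```python
def norms_bytes(searched_fields, num_docs):
    # One byte of norms per searched field per document, allocated lazily
    # the first time each field is searched after the IndexReader is opened.
    return searched_fields * num_docs

# Paul's example: 50 searched fields over 20 million docs
print(norms_bytes(50, 20_000_000))  # 1000000000, i.e. ~1 GB
```

This is why skipping norms for non-analyzed (keyword) fields matters: each such field that never needs norms saves one byte per document across the whole index.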