Re: Running OutOfMemory while optimizing and searching

2004-09-20 Thread John Z
Doug
 
Thank you for confirming this.
 
ZJ

Doug Cutting [EMAIL PROTECTED] wrote:
John Z wrote:
 We have indexes of around 1 million docs and around 25 searchable fields.
 We noticed that without any searches performed on the indexes, on startup, the 
 memory taken up by the searcher is roughly 7 times the .tii file size. The .tii file 
 is read into memory as per the code. Our .tii files are around 8-10 MB in size and 
 our startup memory footprint is around 60-70 MB.
 
 Then when we start doing our searches, the memory goes up, depending on the fields 
 we search on. We are noticing that if we start searching on new fields, the memory 
 goes up again.
 
 Doug, 
 
 Your calculation below on what is taken up by the searcher: does it take into 
 account the .tii file being read into memory, or am I not making any sense?
 
 1 byte * Number of searchable fields in your index * Number of docs in 
 your index
 plus
 1k bytes * number of terms in query
 plus
 1k bytes * number of phrase terms in query

You make perfect sense. The formula above does not include the .tii. 
My mistake: I forgot that. By default, every 128th Term in the index is 
read into memory, to permit random access to terms. These are stored in 
the .tii file, compressed. So it is not surprising that they require 7x 
the size of the .tii file in memory.
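As a rough back-of-envelope sketch (not Lucene code), the observation above can be written out directly. The 7x expansion factor is taken from the numbers reported in this thread, not from Lucene itself:

```python
# Back-of-envelope sketch of the startup term-index footprint described
# above. The .tii file holds every 128th term, prefix-compressed on disk;
# once loaded, each entry becomes full objects with per-object overhead,
# so the in-memory size is a multiple of the on-disk size. The 7x factor
# is an assumption based on the figures reported in this thread.

def startup_term_index_bytes(tii_file_bytes, expansion_factor=7):
    """Estimate RAM used by the in-memory term index."""
    return tii_file_bytes * expansion_factor

# A 10 MB .tii file would need roughly 70 MB of heap at startup,
# matching the 60-70 MB footprint reported for 8-10 MB .tii files.
print(startup_term_index_bytes(10 * 1024 * 1024) / (1024 * 1024))  # 70.0
```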

Doug

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: Running OutOfMemory while optimizing and searching

2004-09-16 Thread John Z
Hi
 
We are trying to get the memory footprint on our searchers.
 
We have indexes of around 1 million docs and around 25 searchable fields.
We noticed that without any searches performed on the indexes, on startup, the memory 
taken up by the searcher is roughly 7 times the .tii file size. The .tii file is read 
into memory as per the code. Our .tii files are around 8-10 MB in size and our startup 
memory footprint is around 60-70 MB.
 
Then when we start doing our searches, the memory goes up, depending on the fields we 
search on. We are noticing that if we start searching on new fields, the memory goes 
up again.
 
Doug, 
 
Your calculation below on what is taken up by the searcher: does it take into account 
the .tii file being read into memory, or am I not making any sense?
 
1 byte * Number of searchable fields in your index * Number of docs in 
your index
plus
1k bytes * number of terms in query
plus
1k bytes * number of phrase terms in query


Thank you
ZJ

Doug Cutting [EMAIL PROTECTED] wrote:
 What do your queries look like? The memory required
 for a query can be computed by the following equation:

 1 Byte * Number of fields in your query * Number of
 docs in your index

 So if your query searches on all 50 fields of your 3.5
 Million document index then each search would take
 about 175MB. If your 3-4 searches run concurrently
 then that's about 525MB to 700MB chewed up at once.

That's not quite right. If you use the same IndexSearcher (or 
IndexReader) for all of the searches, then only 175MB are used. The 
arrays in question (the norms) are read-only and can be shared by all 
searches.

In general, the amount of memory required is:

1 byte * Number of searchable fields in your index * Number of docs in 
your index

plus

1k bytes * number of terms in query

plus

1k bytes * number of phrase terms in query

The latter are for i/o buffers. There are a few other things, but these 
are the major ones.
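The estimate above can be written out as a small sketch. The function name is hypothetical (this is not a Lucene API), and it simply encodes the three terms of the formula:

```python
def query_memory_bytes(num_fields, num_docs, num_query_terms, num_phrase_terms):
    """Doug's estimate: norms plus per-term i/o buffers.

    The norms array (1 byte per searchable field per document) is
    read-only and shared by all searches on the same IndexReader,
    so it is paid once, not per concurrent search.
    """
    norms = 1 * num_fields * num_docs          # 1 byte/field/doc
    term_buffers = 1024 * num_query_terms      # ~1 KB i/o buffer per query term
    phrase_buffers = 1024 * num_phrase_terms   # ~1 KB per phrase term
    return norms + term_buffers + phrase_buffers

# 50 fields x 3.5M docs: the norms alone are ~175 MB, shared across searches.
print(query_memory_bytes(50, 3_500_000, 3, 0))  # 175003072
```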

Doug





Question on number of fields in a document.

2004-08-04 Thread John Z
Hi
 
I had a question related to the number of fields in a document. Is there any limit to the 
number of fields you can have in an index?
 
We have around 25-30 fields per document at present: about 6 are keywords, around 6 are 
stored but not indexed, and the rest are text fields, which are analyzed and indexed. 
We are planning on adding around 24 more fields, mostly keywords.
 
Does anyone see any issues with this? Any impact to search or indexing?
 
Thanks
ZJ





Re: Question on number of fields in a document.

2004-08-04 Thread John Z
Thanks
I was looking at some older email on the list and found a message where Doug Cutting 
says that for fields that are not analyzed, we need not store the norms nor load them into memory.
 
That change in the indexer will help a lot in this situation, where we might have 24 
fields indexed but not analyzed.
 
ZJ

Paul Elschot [EMAIL PROTECTED] wrote:
On Wednesday 04 August 2004 18:22, John Z wrote:
 Hi

 I had a question related to the number of fields in a document. Is there any
 limit to the number of fields you can have in an index?

 We have around 25-30 fields per document at present: about 6 are keywords, 
 around 6 are stored but not indexed, and the rest are text fields, which are 
 analyzed and indexed. We are planning on adding around 24 more 
 fields, mostly keywords.

 Does anyone see any issues with this? Any impact to search or indexing?

During search, one byte of RAM is needed per searched field per document
for the normalisation factors, even if a document field is empty.
This RAM is occupied the first time a field is searched after opening
an index reader.
Supposing your queries would actually search 50 fields before
closing the index reader, the norms would occupy 50 bytes/doc, or
1 GB per 20 million documents.
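Paul's arithmetic can be checked directly (a quick sketch, not Lucene code):

```python
# 1 byte of norms per searched field per document, using Paul's figures.
fields_searched = 50
docs = 20_000_000

norm_bytes = fields_searched * docs * 1  # 1 byte per field per doc
print(norm_bytes / 1e9)  # 1.0 (GB)
```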

Regards,
Paul




