On Saturday 11 March 2006 08:07, Lawrence wrote:
> Hi all,
> 
> 
> 
> I was reading one of the posting on concurrency and I reread section 9.1 in 
Lucene in Action which lead me to this question. I have 100,000 customers and 
I want to provide them with personal searching for their documents and 
sometimes to include company documents in that search.
> 
> 1.    100,000 customers with 10-20 small document each.
> 2.    Company 5,000 documents, specification, papers, research, etc.
> 3.    Customers can search their own documents and company document.
> 
> P1: Do I provide an index for each customer and allow them multiple index 
searching, into company document when they need it?
> 
> OR
> 
> P2: Do I provide one large index for all my 100,000 customers, adding a 
field for customer ID so searching can be constrained, so they won’t/can’t 
search across other customer’s documents, and then categorize company 
documents so customers can do multiple index searches into company documents?
> 
> After writing this out I realize that P2 is probably the wiser choice, less 
complicated, but I would like to hear from other Luceners.

In case you have many customers searching at the same time, compact filters
can help reduce memory requirements:
http://issues.apache.org/jira/browse/LUCENE-328
A BitSet filter uses one bit per indexed document, and a compact filter uses 
one or three bytes per indexed document passing the filter.
When there are 100 different customers searching in their own docs at the
same time, assuming there are 100,000 * 20 docs in the customer index:
- BitSet filters will use 100 * (100,000 * 20) / 8 bytes,
- compact filters will use roughly 100 * 20 * 2  bytes.
The ratio between these is roughly 100,000 / 16 or about 6000.

Since the company docs will not need to be filtered, you can put these in a
separate index, and write your own MultiSearcher that filters only on the
customer index.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to