Erick, Otis,

Thank you for your help. I will work with a single index and parent fields.
It's hard to say exactly how much raw data I will index, as this differs per
client. Right now I'm looking at roughly 1 GB (the contents of a
non-CLOB/BLOB DB), but one client is thinking of throwing their entire 100 TB
file system at it. I'm not quite sure how to handle that yet. Should I use a
different architecture for 100 TB than for 1 GB?

Thanks,
Joost Schouten
Director
 
JS Portal
Dasstraat 21
2623CB Delft
the Netherlands
P: +31 6 160 160 14
E: [EMAIL PROTECTED]
W: www.jsportal.com 

-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED] 
Sent: Saturday, January 27, 2007 1:30 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene index/document architecture

To steal a phrase from Mr. Hatcher... it depends <G>. I'd try keeping it all
in one index at the start, until you get some clue how big the index will
eventually grow and whether your search performance is acceptable. Do you
have any idea how much raw data you're going to ask the index to hold? 1 MB?
1 GB? 1 TB?

But it's simple enough to do what you want: just include a field in each
document, let's say company. Your queries can then search only the documents
belonging to a single company by including a required clause like
"+company:companyyoucareabout", or search all documents by leaving that
clause off.
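
As a rough sketch of the clause described above, the query string could be
assembled along these lines. The class and method names are made up for
illustration and are not part of Lucene; the sketch assumes the field is
called "company", that it was indexed as a single untokenized term, and that
the resulting string is handed to Lucene's query parser:

```java
// Sketch only: builds Lucene query-parser syntax as plain strings.
// CompanyScope and its methods are hypothetical helpers, not Lucene API.
final class CompanyScope {

    private CompanyScope() {}

    /**
     * Restricts a user query to one company by adding a required clause.
     * Assumption: company ids are simple tokens (letters, digits, underscore),
     * so no query-syntax escaping is needed.
     */
    static String restrictTo(String userQuery, String companyId) {
        if (!companyId.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unexpected company id: " + companyId);
        }
        // '+' marks a clause as required; the parentheses keep the user's
        // terms grouped so the company clause applies to all of them.
        return "+(" + userQuery + ") +company:" + companyId;
    }

    /** Searching across all companies: the company clause is simply left off. */
    static String acrossAll(String userQuery) {
        return userQuery;
    }
}
```

For example, restrictTo("annual report", "acme") yields
"+(annual report) +company:acme". A file that belongs to both a user and a
company could get a second field handled the same way.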

Do be aware, when you're doing performance testing, that the first query,
particularly one that sorts, takes significantly longer, since Lucene builds
up some internal caches and you pay that penalty the first time through.
Various strategies exist for pre-warming the searcher by firing some canned
queries at the search engine as the server comes up.
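
One possible shape for such a warm-up step is sketched below. It is only an
illustration: SearcherWarmup is a made-up name, and the search callback is a
stand-in for whatever code runs a query (with the same sort you use in
production) against the searcher your application shares.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch only: SearcherWarmup is a hypothetical helper, not part of Lucene.
final class SearcherWarmup {

    private SearcherWarmup() {}

    /**
     * Runs each canned query once through the supplied search callback and
     * returns how many completed without throwing. The results themselves
     * are ignored; the point is the side effect of filling the caches.
     */
    static int warm(List<String> cannedQueries, Consumer<String> search) {
        int succeeded = 0;
        for (String query : cannedQueries) {
            try {
                search.accept(query);
                succeeded++;
            } catch (RuntimeException e) {
                // A failing warm-up query should not keep the server from starting.
                System.err.println("warm-up query failed: " + query + " (" + e.getMessage() + ")");
            }
        }
        return succeeded;
    }
}
```

You would call this once at server startup, before the searcher starts
taking real traffic, with a handful of queries that exercise each sort field
you care about.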

If you're a database guy, you might not appreciate one thing that was hard
for me to understand: all documents in an index do NOT have to have the same
fields. In fact, your index could theoretically have no two documents with
any field in common <G>. If you're used to thinking about static table
definitions in a database, this can take a while to get used to.

Hope this helps
Erick

On 1/26/07, Joost Schouten <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I'm setting up Lucene to work with our webapp to index a database. My db
> holds files which can belong to a user or a company or both. I want the
> option for my users to search across all content, but also to search
> within the files for one user or company. What is the best architecture
> approach for this? Do you add a field to the document with the parent IDs,
> do you make a different index for each user/company (there can be
> thousands), or is there a different solution altogether?
>
> Thank you,
> Joost
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

