I put in 1TB as a number because I thought it would surely be bigger than anything you intended to put in your database. And you reply with 100 times that size <G>.....
The index I'm working with now is 5GB, so I have no wisdom to offer you at all about how to scale to 100TB. You should probably infer from Otis' reply that we don't really know anybody who's tried off the top of our heads. Good Luck! Erick On 1/27/07, Joost Schouten <[EMAIL PROTECTED]> wrote:
Erick, Otis, Thank you for your help. I will work with a single index and parent fields. It's hard to say exactly how much raw data I will index as this differs per client. But I guess right now I'm more looking at 1G (contents of a non-CLOB/BLOB DB). But one client is thinking of throwing their entire 100T file system in it. Not quite sure how to handle that yet. Should I have a different architecture with 100T compared to 1G? Thanks, Joost Schouten Director JS Portal Dasstraat 21 2623CB Delft the Netherlands P: +31 6 160 160 14 E: [EMAIL PROTECTED] W: www.jsportal.com -----Original Message----- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Saturday, January 27, 2007 1:30 PM To: java-user@lucene.apache.org Subject: Re: lucense index/document architecture To steal a phrase from Mr. Hatcher... it depends <G>. I'd try keeping it all in one index at the start until you get some clue how big the index will eventually grow to and whether your searching is acceptable. Do you have any idea how big the raw data you're going to ask the index to hold? 1M? 1G?, 1T? But it's simple enough to do what you want, just include a field for each document, let's say Company. Your queries can easily search all documents or only those belonging to a single company by including an "+company:companyyoucareabout". Or search all documents by leaving that clause off. Do be aware, when you're doing performance testing, that the first query, particularly when sorting, takes significantly longer since Lucene will build up some internal caches and you pay a penalty the first time through. Various strategies exist for pre-warming the searcher up by firing some canned queries at the search engine as the server comes up...... If you're a database guy, you might not appreciate one thing that was hard for me to understand; all documents in an index do NOT have to have the same fields. In fact, your index could theoretically have no two documents with any field in common <G>.If you're used to thinking about static table definitions in a database this can take a while to get used to. Hope this helps Erick On 1/26/07, Joost Schouten <[EMAIL PROTECTED]> wrote: > > Hi, > > I'm setting up lucene to work with our webapp to index a database. My db > holds files which can belong to a user or a company or both. I want the > option for my users to search across all content, but also search within > the > files for one user or company. What is the best architecture approach for > this? Do you add a field to the document with the parentId's, do you make > a > different index for each user/company (can be 1000's) or is there a > different solution all together? > > Thank you, > Joost > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]