I'll keep you posted ;-) Joost Schouten Director JS Portal Dasstraat 21 2623CB Delft the Netherlands P: +31 6 160 160 14 W: www.jsportal.com
-----Original Message----- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Sunday, January 28, 2007 4:34 AM To: java-user@lucene.apache.org Subject: Re: lucense index/document architecture 100TB? Ouch. Yes, most certainly very different. Again, how to split the index and design the whole system depends on how this is going to be used, how it's going to be changed, if it's going to be changed, how it's going to grow, etc. I'd love to hear from you once you start working with 100TB of data, and I'm sure others would, too. Otis ----- Original Message ---- From: Joost Schouten <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Saturday, January 27, 2007 6:08:49 PM Subject: RE: lucense index/document architecture Erick, Otis, Thank you for your help. I will work with a single index and parent fields. It's hard to say exactly how much raw data I will index as this differs per client. But I guess right now I'm more looking at 1G (contents of a non-CLOB/BLOB DB). But one client is thinking of throwing their entire 100T file system in it. Not quite sure how to handle that yet. Should I have a different architecture with 100T compared to 1G? Thanks, Joost Schouten Director JS Portal Dasstraat 21 2623CB Delft the Netherlands P: +31 6 160 160 14 E: [EMAIL PROTECTED] W: www.jsportal.com -----Original Message----- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Saturday, January 27, 2007 1:30 PM To: java-user@lucene.apache.org Subject: Re: lucense index/document architecture To steal a phrase from Mr. Hatcher... it depends <G>. I'd try keeping it all in one index at the start until you get some clue how big the index will eventually grow to and whether your searching is acceptable. Do you have any idea how big the raw data you're going to ask the index to hold? 1M? 1G?, 1T? But it's simple enough to do what you want, just include a field for each document, let's say Company. Your queries can easily search all documents or only those belonging to a single company by including an "+company:companyyoucareabout". Or search all documents by leaving that clause off. Do be aware, when you're doing performance testing, that the first query, particularly when sorting, takes significantly longer since Lucene will build up some internal caches and you pay a penalty the first time through. Various strategies exist for pre-warming the searcher up by firing some canned queries at the search engine as the server comes up...... If you're a database guy, you might not appreciate one thing that was hard for me to understand; all documents in an index do NOT have to have the same fields. In fact, your index could theoretically have no two documents with any field in common <G>.If you're used to thinking about static table definitions in a database this can take a while to get used to. Hope this helps Erick On 1/26/07, Joost Schouten <[EMAIL PROTECTED]> wrote: > > Hi, > > I'm setting up lucene to work with our webapp to index a database. My db > holds files which can belong to a user or a company or both. I want the > option for my users to search across all content, but also search within > the > files for one user or company. What is the best architecture approach for > this? Do you add a field to the document with the parentId's, do you make > a > different index for each user/company (can be 1000's) or is there a > different solution all together? > > Thank you, > Joost > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]