Hi Anshum,

Thank you for your reply.

I have two options I am considering. One would be:

Document {
    String projectID;
    String generalComment;
    String workHistoryComment;
    String environmentalComment;
    String claimsComment;
    ...
}

And the document may contain upwards of 20 comment fields.

The other option would be to normalize the data:

Document {
    String projectID;
    String commentType;
    String comment;
}

I will need to return only the projectID for all found documents. I have implemented a custom Collector to capture the projectID for each document. Then it occurred to me that I might be better served by the normalized document model.

But I am wondering which method will have better performance: possibly returning 20 documents per hit, or having to search 20 fields per document? (This also has implications for the query, as each search term will always search all fields; this is somewhat easier in the normalized model than creating 20 "or" queries.)

Thanks,
Steve

-----Original Message-----
From: Anshum [mailto:ansh...@gmail.com]
Sent: Tuesday, September 08, 2009 9:47 AM
To: java-user@lucene.apache.org
Subject: Re: large document with multiple fields performance

Hi Stephen,

Could you clarify the requirement a bit more? Do you intend to have the data in the index as:

Document {
    String Comment;
    String CommentId;
    String ProjectId;
}

How do you intend to index it, as in the doc structure? Is there a primary key there? What would you search on? What would you want to have as the result?

All said and done, it's not really an overhead as long as the number of fields is within normal bounds.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw............

On Tue, Sep 8, 2009 at 5:27 PM, Stephen Greene <sgre...@metalseconomics.com> wrote:

> Hello,
>
> I am new to lucene and building an application which requires documents
> with many fields to be searched.
>
> A "project" id is being stored (not_analyzed) and all matching project
> ids will be returned to be used to join other data from a database.
>
> Will it provide better performance to store each comment field in a
> separate document with the project ID and a comment ID or to store all
> the comments for a single project in a single document with multiple
> fields?
>
> Thanks,
>
> Steve Greene

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
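
For readers comparing the two document models from the top of this thread, here is a minimal plain-Java sketch of how each one yields the set of project IDs. It deliberately uses no Lucene classes and says nothing about real index performance. The class name `CommentModelSketch`, the helper `matches` (a simple substring test standing in for a term match), and the sample data are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration of the two layouts; no Lucene classes are used.
class CommentModelSketch {

    // Normalized layout: one record per (projectID, commentType, comment).
    static final class CommentDoc {
        final String projectID;
        final String commentType;
        final String comment;

        CommentDoc(String projectID, String commentType, String comment) {
            this.projectID = projectID;
            this.commentType = commentType;
            this.comment = comment;
        }
    }

    // Hypothetical stand-in for a term match: case-insensitive substring test.
    static boolean matches(String text, String term) {
        return text.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    // Wide layout: projectID -> (field name -> comment text).
    // Every comment field is checked, like OR-ing one clause per field.
    static Set<String> searchWide(Map<String, Map<String, String>> docs, String term) {
        Set<String> ids = new LinkedHashSet<String>();
        for (Map.Entry<String, Map<String, String>> doc : docs.entrySet()) {
            for (String text : doc.getValue().values()) {
                if (matches(text, term)) {
                    ids.add(doc.getKey());
                    break; // one matching field is enough for this project
                }
            }
        }
        return ids;
    }

    // Normalized layout: a single field is checked per record, but one project
    // can hit several times, so the Set collapses duplicate projectIDs.
    static Set<String> searchNormalized(List<CommentDoc> docs, String term) {
        Set<String> ids = new LinkedHashSet<String>();
        for (CommentDoc doc : docs) {
            if (matches(doc.comment, term)) {
                ids.add(doc.projectID);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> wide = new LinkedHashMap<String, Map<String, String>>();
        Map<String, String> p1 = new LinkedHashMap<String, String>();
        p1.put("generalComment", "Copper deposit confirmed");
        p1.put("claimsComment", "Copper claim filed");
        wide.put("P1", p1);
        Map<String, String> p2 = new LinkedHashMap<String, String>();
        p2.put("generalComment", "Gold exploration only");
        wide.put("P2", p2);

        List<CommentDoc> normalized = new ArrayList<CommentDoc>();
        normalized.add(new CommentDoc("P1", "general", "Copper deposit confirmed"));
        normalized.add(new CommentDoc("P1", "claims", "Copper claim filed"));
        normalized.add(new CommentDoc("P2", "general", "Gold exploration only"));

        // Both calls print [P1]: the two layouts agree on the project IDs.
        System.out.println(searchWide(wide, "copper"));
        System.out.println(searchNormalized(normalized, "copper"));
    }
}
```

The point of the sketch is the shape of the result handling. With the wide layout each project can match at most once, while with the normalized layout the same projectID may be hit several times, so whatever collects the IDs (the custom Collector mentioned above, for instance) would probably need to drop duplicates.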