Hey Steve,

I'd suggest you go with the 20-field (non-normalized) model. I've used much larger models and they work just fine, so there would be no point in increasing the complexity.

Hope that clarifies things at least a little :)

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw............

On Tue, Sep 8, 2009 at 6:57 PM, Stephen Greene <sgre...@metalseconomics.com> wrote:

> Hi Anshum,
>
> Thank you for your reply. I have two options I am considering.
> One would be:
> Document {
>     String projectID;
>     String generalComment;
>     String workHistoryComment;
>     String environmentalComment;
>     String claimsComment;
>     ...
> }
>
> And the document may contain upwards of 20 comment fields.
>
> The other option would be to normalize the data:
> Document {
>     String projectID;
>     String commentType;
>     String comment;
> }
>
> I will need to return only the projectID for all found documents. I have
> implemented a custom Collector to capture the projectID for each
> document. Then it occurred to me that I might be better served by the
> normalized document model. But I am wondering which method will have
> better performance: possibly returning 20 documents per hit, or having
> to search 20 fields per document? (This also has implications for the
> query, as each search term will always search all fields; this is
> somewhat easier in the normalized example as opposed to creating 20 "or"
> queries.)
>
> Thanks,
>
> Steve
>
> -----Original Message-----
> From: Anshum [mailto:ansh...@gmail.com]
> Sent: Tuesday, September 08, 2009 9:47 AM
> To: java-user@lucene.apache.org
> Subject: Re: large document with multiple fields performance
>
> Hi Stephen,
> Could you clarify more on the requirement? Do you intend to have data in
> the index as:
> Document {
>     String Comment;
>     String CommentId;
>     String ProjectId;
> }
>
> How do you intend to index it, as in the doc structure? Is there a
> primary key there? What would you search on? What would you want to have
> as the result?
> All said and done, it's not really an overhead as long as the number of
> fields is within normal bounds.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
> On Tue, Sep 8, 2009 at 5:27 PM, Stephen Greene
> <sgre...@metalseconomics.com> wrote:
>
> > Hello,
> >
> > I am new to Lucene and building an application which requires documents
> > with many fields to be searched.
> >
> > A "project" id is being stored (not_analyzed) and all matching project
> > ids will be returned to be used to join other data from a database.
> >
> > Will it provide better performance to store each comment field in a
> > separate document with the project ID and a comment ID or to store all
> > the comments for a single project in a single document with multiple
> > fields?
> >
> > Thanks,
> >
> > Steve Greene
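
For anyone comparing the two layouts discussed in this thread, here is a rough plain-Java sketch of the difference. It does not use the Lucene API at all: the classes WideDoc and CommentDoc, the substring matching, and the sample data are all made up for illustration, and it says nothing about real index performance. It only shows the shape of the trade-off: the wide layout checks several fields per document, while the normalized layout can match the same project more than once, so the project IDs have to be de-duplicated when they are collected.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ProjectSearchSketch {

    /** Wide layout (hypothetical): one record per project, one entry per comment field. */
    static final class WideDoc {
        final String projectID;
        final Map<String, String> comments = new LinkedHashMap<>(); // field name -> text

        WideDoc(String projectID) { this.projectID = projectID; }

        WideDoc with(String field, String text) { comments.put(field, text); return this; }
    }

    /** Normalized layout (hypothetical): one record per comment. */
    static final class CommentDoc {
        final String projectID;
        final String commentType;
        final String comment;

        CommentDoc(String projectID, String commentType, String comment) {
            this.projectID = projectID;
            this.commentType = commentType;
            this.comment = comment;
        }
    }

    /** Stand-in for a real text query: case-insensitive substring match. */
    static boolean matches(String text, String term) {
        return text.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    /** Wide layout: every comment field of a document is checked (the 20 "or" clauses). */
    static Set<String> searchWide(List<WideDoc> docs, String term) {
        Set<String> ids = new TreeSet<>();
        for (WideDoc doc : docs) {
            for (String text : doc.comments.values()) {
                if (matches(text, term)) {
                    ids.add(doc.projectID);
                    break; // one matching field is enough for this project
                }
            }
        }
        return ids;
    }

    /** Normalized layout: one field to check, but a project may match several times. */
    static Set<String> searchNormalized(List<CommentDoc> docs, String term) {
        Set<String> ids = new TreeSet<>(); // the set removes duplicate project IDs
        for (CommentDoc doc : docs) {
            if (matches(doc.comment, term)) {
                ids.add(doc.projectID);
            }
        }
        return ids;
    }

    /** Splits each wide record into one CommentDoc per comment field. */
    static List<CommentDoc> normalize(List<WideDoc> docs) {
        List<CommentDoc> out = new ArrayList<>();
        for (WideDoc doc : docs) {
            for (Map.Entry<String, String> e : doc.comments.entrySet()) {
                out.add(new CommentDoc(doc.projectID, e.getKey(), e.getValue()));
            }
        }
        return out;
    }

    /** Invented sample data, only for the demonstration below. */
    static List<WideDoc> sampleWide() {
        List<WideDoc> docs = new ArrayList<>();
        docs.add(new WideDoc("P1")
                .with("generalComment", "Drilling permit pending")
                .with("environmentalComment", "Permit review of tailings pond"));
        docs.add(new WideDoc("P2")
                .with("generalComment", "No activity")
                .with("claimsComment", "Claims staked in 2008"));
        docs.add(new WideDoc("P3")
                .with("workHistoryComment", "Permit granted"));
        return docs;
    }

    public static void main(String[] args) {
        List<WideDoc> wide = sampleWide();
        List<CommentDoc> normalized = normalize(wide);
        System.out.println(searchWide(wide, "permit"));             // [P1, P3]
        System.out.println(searchNormalized(normalized, "permit")); // [P1, P3]
    }
}
```

Both searches return the same set of project IDs. In the normalized layout P1 matches on two separate comment records, which is why the IDs are gathered into a set; a custom Collector over a normalized index would presumably need a similar de-duplication step.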