RE: large document with multiple fields performance

Stephen Greene Tue, 08 Sep 2009 06:27:49 -0700

Hi Anshum,

Thank you for your reply. I have two options I am considering.
One would be:
Document {
        String projectID;
        String generalComment;
        String workHistoryComment;
        String environmentalComment;
        String claimsComment;
        ...
}

And the document may contain upwards of 20 comment fields.

The other option would be to normalize the data:
Document {
        String projectID;
        String commentType;
        String comment;
}

I will need to return only the projectID for all found documents. I have
implemented a custom Collector to capture the projectID for each
document. Then it occurred to me that I might be better served by the
normalized document model. But I am wondering which method will have
better performance: possibly returning 20 documents per hit, or having
to search 20 fields per document? (This also has implications for the
query: because each search term must always search all fields, the
normalized model is somewhat easier to query than building 20 "or"
clauses.)
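
To make the comparison concrete, here is a rough sketch of the two
layouts in plain Java. It does not use the Lucene API: a linear scan with
substring matching stands in for an analyzed index search, so it says
nothing about real performance. The class and method names
(CommentModelSketch, searchWide, searchNormalized, and the sample data)
are made up for illustration. What it does show is that the normalized
layout can produce several matching documents for the same project, so
whatever collects the projectIDs has to de-duplicate them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class CommentModelSketch {

    /** Option 1: one wide document per project, one entry per comment field. */
    static final class WideDoc {
        final String projectID;
        final Map<String, String> commentFields = new LinkedHashMap<>();
        WideDoc(String projectID) { this.projectID = projectID; }
        WideDoc with(String field, String text) {
            commentFields.put(field, text);
            return this;
        }
    }

    /** Option 2: one normalized document per comment. */
    static final class NormalizedDoc {
        final String projectID;
        final String commentType;
        final String comment;
        NormalizedDoc(String projectID, String commentType, String comment) {
            this.projectID = projectID;
            this.commentType = commentType;
            this.comment = comment;
        }
    }

    /** Case-insensitive substring match; a stand-in for a real term query. */
    static boolean matches(String text, String term) {
        return text.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    /** Wide model: every comment field must be checked (the 20 "or" clauses). */
    static Set<String> searchWide(List<WideDoc> docs, String term) {
        Set<String> ids = new LinkedHashSet<>();
        for (WideDoc d : docs) {
            for (String text : d.commentFields.values()) {
                if (matches(text, term)) {
                    ids.add(d.projectID);
                    break; // one hit per project is enough
                }
            }
        }
        return ids;
    }

    /** Normalized model: one field to search, but hits must be de-duplicated. */
    static Set<String> searchNormalized(List<NormalizedDoc> docs, String term) {
        Set<String> ids = new LinkedHashSet<>(); // the Set drops repeated projectIDs
        for (NormalizedDoc d : docs) {
            if (matches(d.comment, term)) {
                ids.add(d.projectID);
            }
        }
        return ids;
    }

    /** Hypothetical sample data in the wide layout. */
    static List<WideDoc> sampleWide() {
        List<WideDoc> docs = new ArrayList<>();
        docs.add(new WideDoc("P1")
                .with("generalComment", "Asbestos found on site")
                .with("claimsComment", "Asbestos claim filed"));
        docs.add(new WideDoc("P2").with("generalComment", "No issues reported"));
        docs.add(new WideDoc("P3").with("claimsComment", "Asbestos removal claim pending"));
        return docs;
    }

    /** The same sample data, one document per comment. */
    static List<NormalizedDoc> sampleNormalized() {
        List<NormalizedDoc> docs = new ArrayList<>();
        docs.add(new NormalizedDoc("P1", "generalComment", "Asbestos found on site"));
        docs.add(new NormalizedDoc("P1", "claimsComment", "Asbestos claim filed"));
        docs.add(new NormalizedDoc("P2", "generalComment", "No issues reported"));
        docs.add(new NormalizedDoc("P3", "claimsComment", "Asbestos removal claim pending"));
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(searchWide(sampleWide(), "asbestos"));             // [P1, P3]
        System.out.println(searchNormalized(sampleNormalized(), "asbestos")); // [P1, P3]
    }
}
```

Both searches return the same projectIDs. In the normalized case P1
matches twice and is reported once only because the IDs are gathered
into a Set, which is the extra step a custom Collector would need there.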

Thanks,

Steve

-----Original Message-----
From: Anshum [mailto:[email protected]] 
Sent: Tuesday, September 08, 2009 9:47 AM
To: [email protected]
Subject: Re: large document with multiple fields performance

Hi Stephen,
Could you clarify the requirement? Do you intend to have the data in the
index as:
Document{
  String Comment;
  String CommentId;
  String ProjectId;
}

How do you intend to index it, as in the doc structure? Is there a primary
key there? What would you search on? What would you want to have as the
result?
All said and done, it's not really an overhead as long as the number of
fields is within normal bounds.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............

On Tue, Sep 8, 2009 at 5:27 PM, Stephen Greene
<[email protected]> wrote:

> Hello,
>
>
>
> I am new to Lucene and building an application which requires documents
> with many fields to be searched.
>
> A "project" id is being stored (not_analyzed) and all matching project
> ids will be returned to be used to join other data from a database.
>
> Will it provide better performance to store each comment field in a
> separate document with the project ID and a comment ID or to store all
> the comments for a single project in a single document with multiple
> fields?
>
>
>
> Thanks,
>
>
>
> Steve Greene
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
