Hello,

does anybody have an idea what the best design approach is for realizing
the following:

The goal is to index emails and their corresponding file attachments.
One email could contain for example:

1 x subject
1 x sender-address
1 x to-addresses
1 x message-text
0..n x file-attachments  (each contains a 'file-name' and the
'file-content')

How should I build the index?

First approach:
Each email + attachments gets one document with the following fields:
subject, sender_address, to_address, message_text, 1_attachment_name,
1_attachment_content, 2_attachment_name, 2_attachment_content,
3_attachment_name, 3_attachment_content
Disadvantage:
Only three attachments can be indexed, so this is not a generic solution
for indexing 'n' file-attachments.
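
To make the limitation concrete, here is a rough Python sketch of the first
approach. It is not tied to any particular index library; the dict layout of
`email` and the names `build_flat_document` and `MAX_ATTACHMENTS` are only
assumptions for illustration.

```python
MAX_ATTACHMENTS = 3  # fixed number of attachment slots in the first approach


def build_flat_document(email):
    """Map one email (a plain dict, hypothetical shape) to one flat document.

    Returns the document and the number of attachments that did not fit.
    """
    doc = {
        "subject": email["subject"],
        "sender_address": email["sender_address"],
        "to_address": email["to_address"],
        "message_text": email["message_text"],
    }
    attachments = email.get("attachments", [])
    # Only the first MAX_ATTACHMENTS attachments get a numbered field pair.
    for i, (name, content) in enumerate(attachments[:MAX_ATTACHMENTS], start=1):
        doc[f"{i}_attachment_name"] = name
        doc[f"{i}_attachment_content"] = content
    dropped = max(0, len(attachments) - MAX_ATTACHMENTS)
    return doc, dropped
```

With four attachments, the fourth one simply has no field to go into, which
is exactly the problem.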

Second approach:
Each email gets one document with the main email data, plus 0 to n
documents, one per file attachment:
1 x  email_id, subject, sender_address, to_address, message_text
0..n x  email_id, attachment_name, attachment_content
Disadvantage:
At query time it is difficult to aggregate the documents that belong
together. Only one hit per email (including its attachments) should be shown.
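
A rough Python sketch of the second approach, again independent of any
concrete index library. The names `build_documents` and `collapse_hits` and
the dict layout are assumptions for illustration, and collapsing the hit
list by `email_id` after the search is only one possible way to get one hit
per email.

```python
def build_documents(email_id, email):
    """Build one main document plus one document per attachment."""
    docs = [{
        "email_id": email_id,
        "subject": email["subject"],
        "sender_address": email["sender_address"],
        "to_address": email["to_address"],
        "message_text": email["message_text"],
    }]
    for name, content in email.get("attachments", []):
        docs.append({
            "email_id": email_id,  # links the attachment back to its email
            "attachment_name": name,
            "attachment_content": content,
        })
    return docs


def collapse_hits(hits):
    """Reduce a ranked hit list to one entry per email_id, keeping rank order."""
    seen = set()
    email_ids = []
    for hit in hits:
        if hit["email_id"] not in seen:
            seen.add(hit["email_id"])
            email_ids.append(hit["email_id"])
    return email_ids
```

The extra work is in `collapse_hits`: every hit has to be mapped back to its
email, and paging over the results gets awkward because the number of hits
is no longer the number of emails.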

Any thoughts?

Thanks
lude
