Hello, does anybody have an idea what the best design approach is for realizing the following:
The goal is to index emails and their corresponding file attachments. One email could contain, for example:

1 x subject
1 x sender-address
1 x to-addresses
1 x message-text
0..n x file-attachments (each contains a 'file-name' and the 'file-content')

How should I build the index?

First approach: Each email, together with its attachments, gets one document with the following fields:

subject, sender_address, to_address, message_text, 1_attachment_name, 1_attachment_content, 2_attachment_name, 2_attachment_content, 3_attachment_name, 3_attachment_content

Disadvantage: Only three attachments can be indexed. It isn't a generic solution for indexing 'n' file attachments.

Second approach: Each email gets one document with the main email data, plus 0 to n documents for its file attachments:

1 x email_id, subject, sender_address, to_address, message_text
0..n x email_id, attachment_name, attachment_content

Disadvantage: At query time it is difficult to aggregate the documents that belong to each other. Only one hit per email (including its attachments) should be shown.

Any thoughts?

Thanks,
lude
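
To make the second approach more concrete, here is a rough Python sketch of what I mean by aggregating at query time. It is not tied to any particular search library: the index is just a list of dicts, `search` is a toy substring match standing in for a real query, and the `doc_type` field and all function names are assumptions made up for illustration. The only point is the last step, where hits are collapsed by `email_id` so that each email shows up once.

```python
def build_documents(email_id, email):
    """Second approach: one main document plus one document per attachment."""
    docs = [{
        "email_id": email_id,
        "doc_type": "email",  # hypothetical helper field, not in the original layout
        "subject": email["subject"],
        "sender_address": email["sender_address"],
        "to_address": email["to_address"],
        "message_text": email["message_text"],
    }]
    for name, content in email.get("attachments", []):
        docs.append({
            "email_id": email_id,
            "doc_type": "attachment",
            "attachment_name": name,
            "attachment_content": content,
        })
    return docs


def search(index, term):
    """Toy stand-in for a real engine: case-insensitive substring match
    over every text field except the bookkeeping fields."""
    term = term.lower()
    skip = {"email_id", "doc_type"}
    return [
        doc for doc in index
        if any(
            isinstance(value, str) and term in value.lower()
            for key, value in doc.items() if key not in skip
        )
    ]


def group_hits_by_email(hits):
    """Collapse matching documents so each email_id appears once,
    keeping the order in which emails were first hit."""
    grouped = {}
    for doc in hits:
        grouped.setdefault(doc["email_id"], []).append(doc)
    return grouped


# Example data (invented for illustration)
index = []
index += build_documents(1, {
    "subject": "Your order",
    "sender_address": "shop@example.com",
    "to_address": "lude@example.com",
    "message_text": "Please find the invoice attached.",
    "attachments": [
        ("invoice.pdf", "Total: 42 EUR"),
        ("terms.txt", "Payment terms: 30 days"),
    ],
})
index += build_documents(2, {
    "subject": "Lunch?",
    "sender_address": "friend@example.com",
    "to_address": "lude@example.com",
    "message_text": "Are you free at noon?",
})

grouped = group_hits_by_email(search(index, "invoice"))
for email_id, docs in grouped.items():
    print(email_id, [d["doc_type"] for d in docs])
```

Here the message text and one attachment of the same email both match "invoice", but the result contains a single entry for that email. With a real index the grouping would have to happen either in application code like this, or through whatever grouping or collapsing feature the search engine offers, which is exactly the part I am unsure about.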