Daniel Kahn Gillmor <d...@fifthhorseman.net> writes:

> On Thu 2017-03-16 20:34:22 -0400, David Bremner wrote:
>> Daniel Kahn Gillmor <d...@fifthhorseman.net> writes:
>>> 0) what happens when one of the files gets deleted from the message
>>> store? do the terms it contributes get removed from the index?
>>
>> That's a good question, and an issue I hadn't thought about.
>> Currently there's no way to do this short of deleting all the terms (for
>> all the files, excepting tags and properties, presumably) and
>> reindexing. This will require some more thought, I think.
>
> i didn't mean to raise the concern to drag this work down, i just want
> to make sure the problem is on the table. dropping all terms on
> deletion and re-indexing remaining files with the same message ID isn't
> terribly efficient, but i don't think it's going to be terribly costly
> either. we're not talking about hundreds of files per message-id in
> most normal cases; usually only two (sent-to-self,
> recvd-from-mailing-list), and maybe a half-dozen at most (messages sent
> to multiple mailboxes that all forward to me).
I can think of 3 general approaches at the moment. They each have (at
least) one gotcha; more precisely, they each require some added
complexity somewhere else in the codebase.

The first is this one: just add all the terms to one xapian document.
The gotcha is needing some reindexing facility (we want this for other
reasons, so that might not be so bad).

The second approach that occurs to me is to still add the terms to one
xapian document, but to prefix them with a number identifying the file
copy (1, 2, etc.). The complexity here is in the generation of queries:
each one needs to be expanded into an OR over the copies, e.g.
SUBJECT:foo or 1#SUBJECT:foo or 2#SUBJECT:foo. I'm not really sure
offhand how to do that without field processors. I'm also not sure
about the performance impact.

The third approach is to create extra xapian documents per file, which
have a different document type (from the notmuch point of view). Here
the complexity will be dealing with the returned documents from a xapian
query. We can probably use a wildcard search on the type (mail, mail1,
mail2, etc.) to make the queries reasonably easy. My gut feeling is that
this is the "right" approach, although it will be a bit more complicated
to get started. It will also require changing our idea of threads in
the "structured output", where a thread looks something like

  (thread
    (message (instance/file) (instance/file))
    (message (instance/file)))
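
To make the query-generation cost of the second approach a bit more concrete, here is a rough sketch of the expansion step. It is plain Python string handling, not notmuch or xapian code; the function name expand_per_file and the max_copies parameter are made up for illustration, and it assumes the N#PREFIX:term naming from the example above.

```python
def expand_per_file(prefix: str, term: str, max_copies: int) -> str:
    """Expand PREFIX:term into an OR over the unnumbered field and the
    numbered per-file copies (1#PREFIX:term, 2#PREFIX:term, ...).

    Hypothetical helper: max_copies is an assumed upper bound on how many
    file copies of one message have been indexed.
    """
    if max_copies < 0:
        raise ValueError("max_copies must be non-negative")
    # The unnumbered term comes first, then one clause per file copy.
    clauses = [f"{prefix}:{term}"]
    clauses += [f"{n}#{prefix}:{term}" for n in range(1, max_copies + 1)]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(expand_per_file("SUBJECT", "foo", 2))
```

Every prefixed term in a user query would need this treatment, so the size of the generated query grows with the number of copies, which is part of why I'm unsure about the performance impact.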