Hi James,

I just got to a point in my outstanding rework (I'm adding support for storing multiple filenames in a single mail document) where I thought it would make sense to pull this patch series in.
I took a closer look at this series, and I think it's still independent, so I'll finish up what I'm doing and then add this series on top later. But for now I can at least answer some of the questions you asked:

> Does the re-indexing replace the old terms?

Before this patch, there's not yet any "re-indexing" in notmuch. So we'll basically need to think about what we want to do here.

As this patch is written (just calling into the existing _index_file function), the re-indexing only adds new terms (and doesn't delete any). That's probably correct. We're using file size as a heuristic that the larger file is a superset of the smaller file, but it doesn't guarantee that the smaller file doesn't contain any unique terms. So I'd be extremely hesitant to drop any terms here.

> In the case
> where you had a collision with different text this could
> make a search return mails that don't contain that text.
> I don't think it's a big issue though, even if that is the
> case.

That's correct. As mentioned in a previous thread, this is likely only a big issue in the face of deliberate message-ID spoofing or similar. In that thread we talked about some ideas for mitigating that. But I don't think we need to solve that problem before applying this patch series.

-Carl
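
For illustration only, here is a rough sketch of the additive re-indexing behaviour described above. It is not notmuch's code: MailDocument, tokenize, and add_file are hypothetical names, and a plain Python set stands in for the terms stored on a mail document.

```python
# Hypothetical model, not notmuch's code: a mail document is a set of terms
# plus the size of the largest file indexed for its message-ID so far.
from dataclasses import dataclass, field


@dataclass
class MailDocument:
    terms: set = field(default_factory=set)
    indexed_size: int = 0


def tokenize(text):
    """Split text into lowercase terms (a stand-in for real term generation)."""
    return set(text.lower().split())


def add_file(doc, text):
    """Index a file for a message-ID that may already have a document.

    A larger file is assumed (heuristically) to be a superset of the smaller
    one, so it is re-indexed. Terms are only ever added, never removed,
    because the smaller file may still contain unique terms.
    """
    size = len(text.encode("utf-8"))
    if size > doc.indexed_size:
        doc.terms |= tokenize(text)  # union: existing terms survive
        doc.indexed_size = size
        return True
    return False
```

With this model, a term that appears only in the smaller file stays searchable after the larger file is indexed. That is the same trade-off as above: no terms are lost, at the cost of a possible false match when two different messages share a message-ID.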