At present, if the archiver cannot extract what it considers displayable body text from the input, it may drop the message entirely.
For example, this currently happens for valid messages such as HTML-only (Nexus) and for signature-only messages [1], [2]. (*)

This does not make sense for an archiver. At the very least it should index the raw source, and put some kind of marker in the summary record to show that it could not understand the message structure.

I don't think it would be good to store the message in the body itself. That would mess up searches and statistics.

Rather than add a separate flag, it occurs to me that this would be a good use for storing the body as 'null'. It's not possible for a real message to have a null body - it may be empty, but it cannot be null.

The GUI could then be fixed to display a standard message explaining that the message cannot be displayed, as is done by mod_mbox. At least then readers can look at the source itself.

Also, when the parser is improved to deal with more message layouts, it would be easy to find such emails and re-index them.

Does that make sense?

[1] http://mail-archives.apache.org/mod_mbox/httpd-users/201212.mbox/%3Ce9bd8c2b31947867d3bf1174d27b8e01%40mail.gmail.com%3E
[2] http://mail-archives.apache.org/mod_mbox/ofbiz-dev/201505.mbox/%3C26E553E8-7C7F-4CB3-A553-EE7487E2BC9C%40ecomify.de%3E

(*) HTML-only messages are currently dropped if html2text is not available.
Sig-only messages are dropped because the code only checks multipart messages for attachments.
Both of these can be fixed, now that they are known. But other issues may arise in future.
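
To make the null-body proposal concrete, here is a rough sketch using only the Python standard library. It is not the archiver's actual code: the function names (`extract_body`, `build_record`, `display_body`, `needs_reindex`), the record fields, and the placeholder wording are all made up for illustration. The point is only that a record is always produced, with `body` set to `None` when no displayable text was found, and with the raw source kept alongside it.

```python
import email
from email import policy

# Hypothetical placeholder text; the real GUI wording is not specified here.
PLACEHOLDER = "This message could not be displayed. Please view the raw source."


def extract_body(msg):
    """Return the first non-attachment text/plain part, or None if there is none."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        return part.get_content()  # may be an empty string, never None
    return None


def build_record(raw: bytes) -> dict:
    """Always build a record; body=None marks 'structure not understood'."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    return {
        "subject": str(msg["subject"]),
        "body": extract_body(msg),
        "source": raw,  # raw source is kept regardless of parse outcome
    }


def display_body(record: dict) -> str:
    """What a GUI might show: the body, or a standard notice if it is null."""
    return PLACEHOLDER if record["body"] is None else record["body"]


def needs_reindex(records):
    """Find records worth re-indexing once the parser has been improved."""
    return [r for r in records if r["body"] is None]
```

Because an empty body is stored as an empty string and only an unparseable message gets `None`, no extra flag is needed to tell the two cases apart, and searches over body text are not polluted by raw source.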
