At present, if the archiver cannot extract what it considers
displayable body text from the input, it may drop the message
entirely.

For example, this currently happens for valid messages such as
HTML-only messages (Nexus) and signature-only messages [1], [2]. (*)

This does not make sense for an archiver.

At the very least it should index the raw source, and put some kind of
marker in the summary record to show that it could not understand the
message structure.

I don't think it would be good to store the raw source in the body
field itself. That would mess up searches and statistics.

Rather than add a separate flag, it occurs to me that this would be a
good use for storing the body as 'null'.

It's not possible for a real message to have a null body - it may be
empty, but it cannot be null.
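
As a rough sketch of the idea (standard-library Python only; the names
extract_body and build_record and the record fields "subject", "body"
and "source" are hypothetical, not the archiver's actual code), the
record would keep the raw source and set the body to None whenever no
displayable text is found:

```python
from email import message_from_bytes
from email.message import Message
from typing import Optional


def extract_body(msg: Message) -> Optional[str]:
    """Return the first text/plain part, or None if there is none.

    An empty text/plain part yields "" (a real, empty body), which
    stays distinct from None (no displayable body found).
    """
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name: fall back rather than drop the mail.
            return payload.decode("utf-8", errors="replace")
    return None


def build_record(raw: bytes) -> dict:
    """Always produce a record; never drop the message."""
    msg = message_from_bytes(raw)
    return {
        "subject": msg.get("Subject", ""),
        "body": extract_body(msg),  # None marks "could not extract"
        "source": raw.decode("ascii", errors="replace"),
    }
```

With this shape, an HTML-only message still gets a record: its body is
None and its source is intact, whereas a message with an empty
text/plain part gets the empty string.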

The GUI could then be fixed to display a standard message explaining
that the message cannot be displayed, as is done by mod_mbox. At least
then readers can look at the source itself.

Also, when the parser is improved to deal with more message layouts,
it would be easy to find such emails and re-index them.
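
To illustrate the re-indexing step, here is a minimal sketch that works
on an in-memory list of records. It assumes the hypothetical "body" and
"source" fields and takes the improved parser as a function argument; a
real archiver would instead query its search backend for records whose
body is null.

```python
from typing import Callable, Iterable, Iterator, Optional


def needs_reindex(record: dict) -> bool:
    """True only for records explicitly stored with a null body."""
    return "body" in record and record["body"] is None


def reindex(
    records: Iterable[dict],
    extract: Callable[[str], Optional[str]],
) -> Iterator[dict]:
    """Yield updated copies of null-body records the parser can now read.

    `extract` stands in for the improved parser: it takes the raw
    source and returns body text, or None if it still cannot cope.
    """
    for record in records:
        if not needs_reindex(record):
            continue
        body = extract(record["source"])
        if body is not None:
            yield {**record, "body": body}
```

Records that the parser still cannot handle are simply left as they
are, so they will be picked up again on the next pass.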

Does that make sense?

[1] 
http://mail-archives.apache.org/mod_mbox/httpd-users/201212.mbox/%3Ce9bd8c2b31947867d3bf1174d27b8e01%40mail.gmail.com%3E

[2] 
http://mail-archives.apache.org/mod_mbox/ofbiz-dev/201505.mbox/%3C26E553E8-7C7F-4CB3-A553-EE7487E2BC9C%40ecomify.de%3E

(*) HTML-only messages are currently dropped if html2text is not
available. Sig-only messages are dropped because the code only checks
multipart messages for attachments.
Both of these can be fixed, now that they are known. But other issues
may arise in future.
