On Jul 25, 2019, at 4:25 PM, Ken Hornstein <k...@pobox.com> wrote:
>
>> Once in a while I download email archives of some mailing list
>> and unpack them using "inc -file <archive-file>". But more
>> than once I have seen that inc gets confused and doesn't
>> unpack the whole thing. The cause seems to be a line starting
>> with From in some message body. Ideally inc should look that
>> a "From ..." line is immediately followed by header lines.
>> And if this is not the case, assume it is in the message body.
>
> Ralph answered this, but let me expand a bit.
>
> The job of inc(1) is to incorporate messages from a 'mail drop' into your
> MH mailbox. Traditionally it handles mbox-style files and POP (it also
> does MMDF, but let us not speak of that).
>
> As you can see from the Wikipedia entry Ralph linked to, all of the
> various mbox formats use the same scheme: a line beginning with
> "From " is the mailbox delimiter (mboxcl and mboxcl2 uses a Content-Length
> header; I believe they are officially dead at this point). The big
> differences are in quoting rules. Unfortunately since we're kind of
> locked in to the mbox format in inc(1) at least, changing that would
> have some nasty consequences (Ralph gave you an example of a message
> that it would break on but I am sure there are others). I think your
> best bet is to preprocess these mailing list archives so they are valid
> mbox files.
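For what it's worth, the preprocessing Ken suggests could apply the same check proposed for inc. Below is a minimal Python sketch, not anything nmh ships. It treats a "From " line as a delimiter only if it ends in a digit and the next line looks like a header field, and it prefixes every other "From " line with ">" (mboxrd-style quoting). The function name escape_body_from_lines and both regular expressions are my own assumptions, and false positives are still possible.

```python
import re

# Assumption: a delimiter line starts with "From " and ends in a digit
# (the year), as in typical mailman archives.
FROM_LINE = re.compile(r"^From .*[0-9]\s*$")
# Assumption: a header line is a field name of printable ASCII
# characters (no space, no colon) followed by a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def escape_body_from_lines(lines):
    """Return a copy of `lines` in which every "From " line that is not
    immediately followed by a header line is prefixed with ">"."""
    out = []
    for i, line in enumerate(lines):
        if line.startswith("From "):
            nxt = lines[i + 1] if i + 1 < len(lines) else ""
            is_delimiter = FROM_LINE.match(line) and HEADER_LINE.match(nxt)
            if not is_delimiter:
                # Looks like body text, so quote it to hide it from the
                # mbox parser.
                line = ">" + line
        out.append(line)
    return out
```

To use it, read the archive with readlines() (opening it with encoding="latin-1" keeps arbitrary bytes intact), pass the list through the function, write the result to a new file, and hand that file to "inc -file".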
Thanks, Ralph & Ken. The site I downloaded the latest email archive
from uses mailman, so I was a bit surprised.

The method I suggested would let inc handle a larger set of inputs.
There can still be false positives, but the number of body lines
matching

    From ... [0-9]$
    <mail header>:

is likely to be much smaller than the number of lines that merely
start with "From " and end in a digit.

Still, I can understand the reluctance to add this logic to inc.

-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers