Hi All--

Mark Sapiro wrote:
> Ivan Van Laningham wrote:
>> But I have one list for which I used archives from two previous
>> incarnations of the list, plus the current archive mbox, as input to
>> arch. I made sure that the previous archives were in mbox format and
>> that they contained only one "From " line per message.
>
>
> Are you sure? Did you run bin/cleanarch against the .mbox file to check
> it?
>

I ran cleanarch, yes, but all it did was to escape every single "From "
line, which would make arch think there was only one message.

>
> This usually results from a message containing an embedded "From "
> somewhere in the message body. The message is archived properly under
> its correct date and subject, but that entry is truncated at the line
> that begins with "From ". Then the rest of the message is archived as
> a separate message. Since it has no From:, Subject: or Date: headers,
> it is archived with the current date and no subject. Also, text
> following the "From " up to the first totally empty (not just blank)
> line is considered part of the header and is not archived with this
> 'second' message.
>

That would describe what I'm seeing, except that--

>
> If there is any message body text in the 'No subject' archived entry,
> you should be able to find that in the .mbox.
>

Right, but there are 5,000 entries with "No subject" and no body, not a
hint of a body.

>
>> The _only_ thing I can see, in the current mbox,
>> is that the end of the last message from the old archives ends on one
>> line and the "From " line for the next message begins on the very next
>> line, with no blank lines between,
>
>
> That shouldn't cause this.
>

Good to know.

>
>> and everywhere else there are either
>> one or more blank lines or one of those message separator lines from
>> AOL:
>>> "----------MB_8C9379FAFA8ECEC_DAC_6C2A_WEBMAIL-MC05.sysops.aol.com--"<
>> These bogus entries aren't really hurting anything, I suppose, but they
>> are annoying and it is irritating to have to scroll down 5000 lines to
>> get to the next real message.
>
>
> They are actually, because they represent missing pieces of other
> messages.
>

How to track them down?

>
>> What is causing this? And is there anything I can do to get rid of the
>> problem? I am willing to live with it if I have to, but I would prefer
>> having a fix.
>
>
> I think you have unescaped "From " lines in the bodies of messages. Run
> bin/cleanarch (with the -n/--dry-run option) to check.
>
> Another possibility is you have real looking but extraneous
> (duplicate?) "From " lines not followed by a real message with
> Subject: and Date: headers prior to the next "From ".
>

Do lines beginning with whitespace before a From count? There are about
a hundred of those in the input mbox.

Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.python.org/workshops/1998-11/proceedings/papers/laningham/laningham.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
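
One rough way to track down the stray "From " lines discussed in this thread is to scan the mbox with a small Python script. The sketch below is only a heuristic and is not what bin/cleanarch does internally: the separator pattern, the function names (scan_mbox_lines, scan_mbox_file) and the three category labels are all assumptions made up for illustration. It flags "From " lines at column 0 that do not look like a real separator, real-looking separators that are not followed by a header line, and lines where whitespace precedes "From ". The last group is reported for information only, since the script cannot say whether arch treats those lines as separators.

```python
import re

# Assumed shape of a real separator: "From addr Www Mmm dd hh:mm[:ss] [tz] yyyy".
# This pattern is a guess for illustration, not the one bin/cleanarch uses.
SEPARATOR = re.compile(
    r"^From \S+\s+\w{3}\s+\w{3}\s+\d{1,2}\s+\d{1,2}:\d{2}(:\d{2})?\s+(\S+\s+)?\d{4}\s*$"
)
# A header line such as "Subject: ..." or "Date: ...".
HEADER = re.compile(r"^[A-Za-z0-9-]+:")


def scan_mbox_lines(lines):
    """Return (line_number, kind, text) for each suspicious "From " line."""
    lines = [line.rstrip("\r\n") for line in lines]
    findings = []
    for index, line in enumerate(lines):
        lineno = index + 1
        if line.startswith("From "):
            following = lines[index + 1] if index + 1 < len(lines) else ""
            if not SEPARATOR.match(line):
                # Probably body text that begins with the word "From".
                findings.append((lineno, "malformed-separator", line))
            elif not HEADER.match(following):
                # Looks like a separator, but no header comes right after it.
                findings.append((lineno, "no-headers-follow", line))
        elif line.lstrip().startswith("From "):
            # Whitespace before "From "; reported for information only.
            findings.append((lineno, "indented-from", line))
    return findings


def scan_mbox_file(path):
    """Scan a file on disk; latin-1 decodes any byte without raising."""
    with open(path, encoding="latin-1") as handle:
        return scan_mbox_lines(handle)


# Small in-memory sample so the sketch runs without a real archive.
SAMPLE = [
    "From ivan@example.com Mon Jan  2 10:00:00 2006",
    "From: ivan@example.com",
    "Subject: test",
    "",
    "Body text.",
    "From what I can tell, this works.",
    "  From the indented department",
    ">From already escaped",
    "",
    "From mark@example.com Tue Jan  3 11:00:00 2006",
    "",
    "orphan text",
]

for lineno, kind, text in scan_mbox_lines(SAMPLE):
    print(f"{lineno}: {kind}: {text}")
```

To try it on a real archive, call scan_mbox_file with the path to the list's .mbox file and compare the reported line numbers against the 'No subject' entries. Already-escaped ">From " lines are deliberately ignored.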