On 03/10/2016 03:19 AM, Sebastian Hagedorn wrote:
>
> Unless you're really interested in the other differences you referred to
> in your other message, I won't bother to analyze them further. It seems
> clear to me that you have identified the main issue.

I understand the issue, and I know how to "fix" it. I'm a bit uncertain about what to change a bad date to.

Normally, messages in the cumulative .mbox have at least three sources of date: a Date: header, the mbox From_ separator line, and, at least if the message originally came via Mailman, an X-List-Received-Date: header that was added by Mailman's ArchRunner when the message was archived.

Also, depending to an extent on site configuration, if the message was originally archived by Mailman, its archived Date: header will normally be "close" to the time it was received by Mailman. See the code in the _dispose() method in Mailman/Queue/ArchRunner.py.

So what this says is that if a message in the mbox has a bad Date:, it is probably from an imported mbox, and it's not clear that the From_ date will be any better. In the messages and excerpts you posted earlier, the From_ dates were all within a few minutes of "Mon Nov 7 14:08:46 2005", which is probably the time that portion of the mbox was built from a majordomo archive.

I have made a script at <https://www.msapiro.net/scripts/cleanarch2> (mirrored at <http://fog.ccsf.edu/~msapiro/scripts/cleanarch2>) which augments the standard bin/cleanarch script to also replace Date: headers with the date from From_ if they differ by more than mm_cfg.ARCHIVER_ALLOWABLE_SANE_DATE_SKEW (default = 15 days). This may be sufficient. If you run it with the -n option against your mbox, it will report the line numbers of the bad dates, what they are and what they would be changed to.

For the actual "fix", my inclination is to modify the _set_date method in pipermail.py (this is called from HyperArch.py as self.__super_set_date(message) just before it does self.fromdate = time.ctime(int(self.date))). I would have this check the date and, if it's not within say 50 years of now, replace the date with something reasonable. My question at this point is what that something reasonable is.

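To make the proposed check concrete, here is a minimal standalone sketch. It is not a patch to pipermail.py: it is written for Python 3 using only the standard library, and the names MAX_SKEW_SECONDS, parse_header_date, parse_from_line_date and sane_date are made up for illustration. It assumes the From_ line has the usual "From addr Day Mon DD HH:MM:SS YYYY" shape and treats that timestamp as UTC, which may not match a given mbox. It prefers the From_ date when that is itself within range and otherwise uses the current time; that ordering is just one possible policy.

```python
import calendar
import time
from email.utils import mktime_tz, parsedate_tz

# Hypothetical limit for illustration: "say 50 years", in seconds.
MAX_SKEW_SECONDS = 50 * 365 * 24 * 60 * 60


def parse_header_date(value):
    """Return epoch seconds for a Date: header value, or None if unusable."""
    parsed = parsedate_tz(value) if value else None
    if parsed is None:
        return None
    try:
        return mktime_tz(parsed)
    except (ValueError, OverflowError):
        return None


def parse_from_line_date(from_line):
    """Return epoch seconds from an mbox From_ line, or None if unusable.

    Assumes 'From addr Day Mon DD HH:MM:SS YYYY' and treats the time as UTC.
    """
    parts = from_line.split(None, 2)
    if len(parts) < 3 or parts[0] != "From":
        return None
    try:
        stamp = time.strptime(parts[2].strip(), "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    return calendar.timegm(stamp)


def sane_date(header_date, from_line, now=None, max_skew=MAX_SKEW_SECONDS):
    """Pick a date: Date: header, else From_ date, else the current time."""
    now = time.time() if now is None else now

    def reasonable(stamp):
        return stamp is not None and abs(now - stamp) <= max_skew

    date = parse_header_date(header_date)
    if reasonable(date):
        return date
    from_date = parse_from_line_date(from_line)
    if reasonable(from_date):
        return from_date
    return now
```

Passing now explicitly makes the behaviour repeatable when trying this against sample messages; in real use it would default to the time of the archive run.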
I think it comes down to a choice between the From_ date if that's reasonable or the current date, but I don't know which is better. Does anyone have an idea?

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
_______________________________________________
Mailman-Developers mailing list
Mailman-Developers@python.org
https://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: https://mail.python.org/mailman/options/mailman-developers/archive%40jab.org
Security Policy: http://wiki.list.org/x/QIA9