Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Jeff Breidenbach
Notice that of 325146 total messages, 624 of them had no message-id header. Even if you aggregate dup+col, you're still looking at a total duplicate rate of 0.29%. Message ID's are supposed to be unique. This is discussed in RFC 822: 4.6.1 and RFC 1036: 2.1.5, and probably other places. If
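
Counts like the ones quoted above can be gathered with a few lines of Python. This is only a sketch, not the script used for the figures in this thread: the function name message_id_stats and the sample messages are made up for illustration, and it treats any repeated Message-ID as a duplicate without separating true duplicates from collisions.

```python
from collections import Counter
from email import message_from_string


def message_id_stats(raw_messages):
    """Count messages with no Message-ID and surplus copies of repeated IDs."""
    ids = []
    missing = 0
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = msg.get("Message-ID")
        if mid is None or not mid.strip():
            missing += 1
        else:
            ids.append(mid.strip())
    counts = Counter(ids)
    # Each ID seen n times contributes n - 1 surplus copies.
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {"total": len(raw_messages), "missing": missing, "duplicates": duplicates}


# Hypothetical sample: one repeated ID, one unique ID, one message without an ID.
sample = [
    "Message-ID: <a@example.com>\n\nfirst\n",
    "Message-ID: <a@example.com>\n\nsecond\n",
    "Message-ID: <b@example.com>\n\nthird\n",
    "Subject: no id here\n\nfourth\n",
]
stats = message_id_stats(sample)
```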

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Stephen J. Turnbull
Jeff Breidenbach writes: Notice that of 325146 total messages, 624 of them had no message-id header. Even if you aggregate dup+col, you're still looking at a total duplicate rate of 0.29%. Message ID's are supposed to be unique. Fortunately, a rule more honored in the observance

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread John A. Martin
st == Stephen J Turnbull Re: [Mailman-Developers] Improving the archives Tue, 24 Jul 2007 15:56:35 +0900 st Jeff Breidenbach writes: Notice that of 325146 total messages, 624 of them had no message-id header. Even if you aggregate dup+col, you're still looking at a

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Stephen J. Turnbull
John A. Martin writes: better to go ahead and use the message-id, rather than concoct yet another "this time we mean it!" unique identifier. st That's not the point. We're not going to impose this on st senders; I read the quote as meaning this time we mean it

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Jeff Breidenbach
There are three different parties coming to the table. One is the mail transfer agent of the sender, another is the list server, and the third is the archive server. Ideally, all three will be happy campers. So we just specify a header to put it in, and subscribers will be able to use it, per

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Dale Newfield
Jeff Breidenbach wrote: In addition, Barry was talking about concocting a unique identifier from the Date field and Message-ID. I'm not a big fan of this idea, because the date field comes from the mail user agent and is often wildly corrupt; e.g. coming from 100 years in the future. Oh--I

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Terri Oda
On 24-Jul-07, at 12:31 PM, Jeff Breidenbach wrote: So we just specify a header to put it in, and subscribers will be able to use it, per definition of a canonical URL. It is the archive server's job to decide what is the canonical URL for a message. There's a good chance these archival URLs

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Jeff Breidenbach
Regardless of whether we *need* to generate our own unique ID, I'm leaning towards the thought that we're going to *want* to generate our own for usability reasons. In a perfect world, I think we'd have a sequence number so I could visit http://example.com/mailman/archives/listname/204.html

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Barry Warsaw
On Jul 22, 2007, at 12:33 PM, Terri Oda wrote: On 20-Jul-07, at 8:39 AM, Barry Warsaw wrote: I've looked at a few lurker archivers and I wasn't blown away by its user interface. That's apparently highly configurable though. I've been doing a

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Barry Warsaw
On Jul 24, 2007, at 2:02 AM, Jeff Breidenbach wrote: Which brings me to suggestion #2, which is go ahead and write an RFC on how list servers should embed archival links in messages. This sounds like an internet wide interoperability issue as much

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Barry Warsaw
On Jul 24, 2007, at 2:56 AM, Stephen J. Turnbull wrote: I simply think we should be prepared for applications where relying on the sender to supply a UUID is not acceptable; we need to be able to provide one ourselves. Creating UUIDs is a solved

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Barry Warsaw
On Jul 24, 2007, at 12:31 PM, Jeff Breidenbach wrote: What complexity? Mailman just does msg['X-List-Archive-Received-ID'] = Email.msgid() Easy to introduce, harder to deal with. The archival server would now keep track of both the message-id
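
The one-liner quoted above could look roughly like this with the Python standard library. It is a sketch only: Email.msgid() in the quote is shorthand rather than a call confirmed here, so the standard email.utils.make_msgid is swapped in; the function name stamp_received_id and the domain lists.example.com are assumptions for illustration.

```python
from email.message import EmailMessage
from email.utils import make_msgid

HEADER = "X-List-Archive-Received-ID"  # header name taken from the quoted message


def stamp_received_id(msg, domain="lists.example.com"):
    """Add a list-generated unique ID unless the message already carries one."""
    if HEADER not in msg:
        # make_msgid returns a string of the form <unique-part@domain>.
        msg[HEADER] = make_msgid(domain=domain)
    return msg


# Hypothetical incoming post.
post = EmailMessage()
post["Message-ID"] = "<sender-supplied@example.org>"
post["Subject"] = "Improving the archives"
post.set_content("body text\n")

stamp_received_id(post)
first_id = post[HEADER]
stamp_received_id(post)  # a second pass leaves the existing ID untouched
```

The sender's Message-ID is left alone, which is where the "harder to deal with" part comes in: the archive server then has two identifiers to track for every message.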

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Stephen J. Turnbull
Jeff Breidenbach writes: So we just specify a header to put it in, and subscribers will be able to use it, per definition of a canonical URL. It is the archive server's job to decide what is the canonical URL for a message. There's a good chance these archival URLs will be served by

Re: [Mailman-Developers] Improving the archives

2007-07-24 Thread Jeff Breidenbach
What you gain from my proposal over a pure Message-ID approach is guaranteed uniqueness given the list copy Guarantee is a pretty strong word. A malicious person could post two messages with the same message-id, same date, but different bodies. Sometimes the channel between the MLM and the
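
The objection above can be shown with a small sketch. It assumes, purely for illustration, that the identifier is a SHA-1 hash over Message-ID and Date; the thread does not specify the construction, and archive_key and its include_body switch are hypothetical.

```python
import hashlib
from base64 import b32encode
from email.message import EmailMessage


def archive_key(msg, include_body=False):
    """Hash Message-ID and Date (optionally the body too) into a short key."""
    h = hashlib.sha1()
    h.update(msg.get("Message-ID", "").encode("utf-8"))
    h.update(b"\0")  # separator so adjacent fields cannot run together
    h.update(msg.get("Date", "").encode("utf-8"))
    if include_body:
        h.update(b"\0")
        h.update(msg.get_content().encode("utf-8"))
    return b32encode(h.digest()).decode("ascii")


def make_post(body):
    """Build a message with a fixed Message-ID and Date and the given body."""
    m = EmailMessage()
    m["Message-ID"] = "<same@example.com>"
    m["Date"] = "Tue, 24 Jul 2007 15:56:35 +0900"
    m.set_content(body)
    return m


# Two posts with the same Message-ID and Date but different bodies.
a = make_post("first body\n")
b = make_post("a different body\n")

same_without_body = archive_key(a) == archive_key(b)
same_with_body = archive_key(a, include_body=True) == archive_key(b, include_body=True)
```

Without the body the two posts get the same key, which is the malicious case described above. Hashing the body separates them, but it makes the key sensitive to any change to the body in transit.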