A couple quick practical notes:

1) Terri is exactly right. The reason for including list identity as part of the hash calculation is for cross-posted messages. An archiving service shows context. Here's the message AND the thread it fits into, AND information about the list it travelled over AND the ability to search that list further. Archives need to know the list to provide context.
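To make the cross-posting point concrete, here is a minimal sketch of a hash that mixes list identity into the calculation. The exact recipe (which fields, in what order, how they are normalized) is an assumption made for illustration, not a statement of what Mailman or mail-archive.com actually computes; `archive_hash`, the sample Message-ID, and the list addresses are all hypothetical.

```python
import base64
import hashlib


def archive_hash(message_id: str, list_post: str) -> str:
    """Base32 SHA-1 over a Message-ID plus a list's posting address."""
    # Assumption: angle brackets and surrounding whitespace are stripped,
    # and the posting address is lowercased, before hashing.
    mid = message_id.strip().strip("<>")
    post = list_post.strip().strip("<>").lower()
    digest = hashlib.sha1(mid.encode("utf-8"))
    digest.update(post.encode("utf-8"))
    # A 20-byte SHA-1 digest encodes to exactly 32 base32 characters.
    return base64.b32encode(digest.digest()).decode("ascii")


# Hypothetical message cross-posted to two hypothetical lists.
msg_id = "<4B0C1234.5060708@example.com>"
hash_a = archive_hash(msg_id, "list-a@example.com")
hash_b = archive_hash(msg_id, "list-b@example.com")
```

Because the posting address is part of the input, `hash_a` and `hash_b` differ even though the Message-ID is the same, so each copy can get its own URL within its own list's context.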
2) The reason mail-archive.com uses List-Post and not List-Id in the calculation is that every list, RFC 2369 compliant or not, has a concept of a posting address. It is a natural idea, easy to think of and understand. Hence all mail-archive.com archives are keyed off the posting address. It would be technically possible (but an architectural pain) for mail-archive.com to calculate using List-Id. We'd probably not bother and instead store whatever was calculated by Mailman and placed in the Archived-At: header. Okay, I'll admit my prejudice. I've always found List-Id annoying, and wish that it didn't exist.

3) As long as things are changing, I want to mention that these URLs feel too long. SHA-1 is a 160-bit hash consuming 32 URL characters. I think trimming to a 64-bit (13-character) hash is plenty. According to Wikipedia's collision tables, with the shorter hash we'd expect to get our first collision after archiving about 5 billion messages. That's 50X the current corpus size of public archival services like GMane. And it isn't like an occasional hash collision is a big deal or a security problem.
http://en.wikipedia.org/wiki/Birthday_attack

3b) For that matter, a sequence number would also do the trick, but I can understand that this is much more dangerous; it is easy for a sequence number to get reset and cause all hell to break loose.

4) I'm really not that picky. Our archival service could deal with all sorts of URLs, including the ones Terri was trying to avoid, such as
http://example.com/archiver/listname.example.com/$hash
In fact, we've found that lots of small, per-list databases have speed and reliability advantages over big global databases. But I also like short URLs.

Bottom line, please don't let these comments delay or derail forward progress.
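For anyone who wants to check the numbers in point 3, here is a rough sketch of the arithmetic. It assumes base32 encoding (5 bits per character) and uses the standard birthday-bound approximation n ≈ sqrt(2 · 2^bits · ln(1/(1-p))); the function names are hypothetical, and truncating to the first 8 bytes of the SHA-1 digest is just one possible way to get a 64-bit value.

```python
import base64
import hashlib
import math


def base32_chars(bits: int) -> int:
    """Characters needed to carry `bits` bits at 5 bits per character."""
    return math.ceil(bits / 5)


def messages_for_collision(bits: int, probability: float = 0.5) -> float:
    """Approximate message count at which a collision has the given probability."""
    return math.sqrt(2 * 2**bits * math.log(1 / (1 - probability)))


def short_hash(data: bytes, bits: int = 64) -> str:
    """Assumed truncation scheme: keep the leading bits of the SHA-1 digest."""
    digest = hashlib.sha1(data).digest()[: bits // 8]
    # Drop the '=' padding that base32 adds to partial 5-byte groups.
    return base64.b32encode(digest).decode("ascii").rstrip("=")


full_len = base32_chars(160)                 # characters for a full SHA-1
short_len = base32_chars(64)                 # characters for a 64-bit hash
n_half = messages_for_collision(64)          # 50% collision point, 64 bits
example = short_hash(b"hypothetical message key")
```

With these assumptions `full_len` and `short_len` come out to the 32 and 13 characters quoted above, and `n_half` lands at roughly 5 billion messages.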
-Jeff
_______________________________________________
Mailman-Developers mailing list
[email protected]
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org
Security Policy: http://wiki.list.org/x/QIA9
