Re: [Mailman-Developers] Improving the archives
>In which case [the message body link] would be set to something like.
>
>http://third-party-service/[EMAIL PROTECTED]

Just for fun, I did a trial implementation. It works, but the URLs are
too long. For example, the URL below spends 59 characters on the
message-id and 27 characters on the listname. We're already over my
comfort level (of about 72 characters) and haven't even started to
count the hostname and other URL-lengthening overhead. Maybe this was
a bad idea after all.

http://www.mail-archive.com/search?l=mailman-developers%40python.org&[EMAIL PROTECTED]

Jeff

___
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org
Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
Re: [Mailman-Developers] Improving the archives
>Maybe a way to think about this is that the canonical url is based on
>the message-id, but then there's some way to distill even this down
>to a tinyurl or simple integer that would be stable in the face of
>full archive regenerations.

I'd suggest the reverse. Keep the canonical archive URL short and
sweet, and then use a URL redirection service to map message-ids to
those URLs. It is the archiver's job to make it all work.

For example, the canonical archive URL might stay exactly the way it
is in pipermail, but the archival link embedded in the message would
instead go to a redirection service.

http://mail.codeit.com/pipermail/zcommerce/2002-February/000523.html
http://mail.codeit.com/[EMAIL PROTECTED]

The one other thing I'd like to revisit is integration with third
party archival services. There are two obvious integration points.
One is a button in the Mailman list admin user interface that says
"archive with service X", not unlike the setting in Firefox that
basically says "search with service X". The other integration point
is the archival link discussed above, in which case it would be set
to something like:

http://third-party-service/[EMAIL PROTECTED]

Disclosure: I help run a third party archiving service, and this topic
was discussed quite a bit previously. [1] Nonetheless it seems like a
good time to revisit it, given the current discussion about archive
wishlists.

[1] http://www.mail-archive.com/mailman-developers@python.org/msg08772.html
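
The redirection idea in the message above could be sketched roughly as follows. This is only an illustration, not anything Mailman or pipermail ships: the `RedirectTable` class, its method names, and the sample Message-ID are hypothetical, and a real service would persist the mapping rather than hold it in memory.

```python
from urllib.parse import quote, unquote


class RedirectTable:
    """Hypothetical in-memory map from Message-ID to canonical archive URL."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._targets = {}

    def register(self, message_id, canonical_url):
        # Store the ID without its angle brackets.
        self._targets[message_id.strip("<>")] = canonical_url

    def link_for(self, message_id):
        # The link that would be embedded in the outgoing message. It depends
        # only on the Message-ID, so it survives archive regeneration.
        return f"{self.base_url}/{quote(message_id.strip('<>'), safe='')}"

    def resolve(self, path):
        # Return the canonical URL to redirect to, or None if unknown.
        return self._targets.get(unquote(path.lstrip("/")))


table = RedirectTable("http://mail.codeit.com/")
table.register(
    "<abc@example.com>",  # made-up Message-ID for illustration
    "http://mail.codeit.com/pipermail/zcommerce/2002-February/000523.html",
)
print(table.link_for("<abc@example.com>"))
print(table.resolve("/abc%40example.com"))
```

If the archive is rebuilt and the pipermail numbering changes, only the registered targets need updating; links already sent out in messages stay the same.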
Re: [Mailman-Developers] Improving the archives
I'm all for someone taking ownership of this long-neglected
component -- thank you for doing so!

Barry Warsaw wrote:
> Maybe a way to think about this is that the canonical url is based on
> the message-id, but then there's some way to distill even this down
> to a tinyurl or simple integer that would be stable in the face of
> full archive regenerations.

The resistance to basing this on message-id has always been that
there's no guarantee of uniqueness...

...but I believe each list has some sort of counter for how many
messages it's seen, so we could add another header with that number,
and use the two concatenated together as a unique id...

(That way the archiver can know from the content of the headers
exactly how to generate the same unique id as mailman, which would
allow the url-in-the-footer to happen w/o first hitting the archiver.)

Just throwing out ideas,
-Dale
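
A minimal sketch of the counter-plus-message-id idea above, using Python's standard `email` package. The header name `X-List-Sequence`, the `-` separator, and both function names are assumptions made for illustration; the message does not name a header or a format.

```python
from email.message import Message

SEQ_HEADER = "X-List-Sequence"  # hypothetical header name


def stamp_sequence(msg, counter):
    """Record the list's per-message counter in a header (replacing any old one)."""
    del msg[SEQ_HEADER]  # no error if the header is absent
    msg[SEQ_HEADER] = str(counter)


def unique_id(msg):
    """Derive the id from headers alone, so Mailman and the archiver agree."""
    message_id = (msg["Message-ID"] or "").strip().strip("<>")
    return f"{msg[SEQ_HEADER]}-{message_id}"


first = Message()
first["Message-ID"] = "<dup@example.com>"
stamp_sequence(first, 41)

second = Message()
second["Message-ID"] = "<dup@example.com>"  # same ID from a broken MUA
stamp_sequence(second, 42)

print(unique_id(first))
print(unique_id(second))
```

Because the counter differs per message, two posts that share a Message-ID still get distinct ids, and anyone holding the delivered message can recompute the id without asking the archiver.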
Re: [Mailman-Developers] Improving the archives
> "st" == Stephen J Turnbull
> "Re: [Mailman-Developers] Improving the archives"
> Wed, 04 Jul 2007 16:49:58 +0900

st> The main drawback to using Message IDs that I can see is that
st> broken MUAs may supply no Message-ID, or the same one
st> repeatedly. In the former case, as a last resort Mailman can
st> supply one,

If the archive is considered to be a reflection of what Mailman _put_
on the wire, as distinct from what was received from the wire, then
adding a Message-ID in the absence of one already present reflects a
SHOULD requirement of RFC (2)822. In the absence of a Message-ID on
an outgoing mail message, many if not most MTAs will add one. Why not
let Mailman anticipate the need and add a Message-ID when archiving
the message, rather than leaving it to the outgoing MTA?

jam
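
Supplying a missing Message-ID before archiving could look something like the sketch below. It uses `email.utils.make_msgid` from the Python standard library; the function name `ensure_message_id` and the domain `lists.example.org` are placeholders, and this is not taken from Mailman's own code.

```python
from email.message import Message
from email.utils import make_msgid


def ensure_message_id(msg, domain="lists.example.org"):
    """Add a Message-ID if the message lacks one; leave an existing one alone."""
    if not (msg["Message-ID"] or "").strip():
        del msg["Message-ID"]  # drop an empty header, if any
        msg["Message-ID"] = make_msgid(domain=domain)
    return msg["Message-ID"]


bare = Message()
print(ensure_message_id(bare))  # freshly generated, differs on every run

tagged = Message()
tagged["Message-ID"] = "<already@example.com>"
print(ensure_message_id(tagged))
```

Doing this before both archiving and delivery means the archived copy and the copy on the wire carry the same ID, which is the point made above.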
Re: [Mailman-Developers] Improving the archives
Barry Warsaw writes:

> > - archive links that won't break if the archive is rebuilt
>
> Yes, this is absolutely critical, in fact, I'd put it right at the
> top of the list, even more so than a u/i overhaul. Stable urls, with
> backward compatible redirecting links if at all possible, would be
> fantastic.

+1. I've been wanting to do something about this, and have made
proposals (not backed with code, mea maxima culpa) for design. I would
definitely be happy to help with this, but given time constraints, it
would be nice if somebody else could take the lead.

> Along with that, I would really like to come up with an algorithm for
> calculating those urls without talking to the archiver.

Brad didn't like this when I suggested it before, but I didn't really
understand why not. Anyway, FWIW:

I suggest adding an X-List-Received-ID header to all messages. I
haven't really thought through whether the UUID in that field should
be at least partly human-readable or not, but that doesn't matter for
the basic idea.[1] The on-disk directory format would be

    /path-to-archive/private/my-list/Message-ID

for singletons (Message-ID is the author-supplied ID) and

    /path-to-archive/private/my-list/Message-ID/List-Received-ID

for multiples. These would be created on-the-fly when they occur.
They can be served as static pages.

For almost all messages, the bare URL

    http://archives.example.com/my-list/Message-ID

should Just Work (ie, return a no-such-object result or a single
message). Where it does not, you get an index of all pages with that
message ID.

The main drawback to using Message IDs that I can see is that broken
MUAs may supply no Message-ID, or the same one repeatedly. In the
former case, as a last resort Mailman can supply one, but that won't
help people who get a personal copy and want to find the thread.
However, I see no way to help them, anyway, beyond a generic archive
search engine. In the latter, you get lots of messages matching the
Message-ID, and while most lists should have *zero* problems, a list
that has any instances of this problem would have many. Again I can't
see a good way to deal with this other than a general search facility,
as computing a digest of headers or content is hard to do reliably.
Providing an index of matching posts seems like a reasonable approach,
which can be efficiently implemented (eg, as static pages).
Furthermore, the examples I've seen of both in the last few years have
all been either spam or (in the case of duplicate Message-IDs) actual
duplicates due to some mail system problem or itchy user fingers.

A minor drawback to my proposal is that if a message gets archived as
a singleton for that Message-ID, and then a duplicate arrives,
previously created references in the archive will of course now return
an index rather than the desired message. Ie, there is data
corruption. This can be dealt with in several ways; the easiest would
be to provide an
"if-you-got-here-by-clicking-a-ref-from-this-archive-you're-looking-for-me"
link when creating the directory for multiple instances.

There's also a *very* minor benefit: repeat sends will be immediately
recognizable without checking Message-ID.

Footnotes:
[1] By partly human-readable I mean containing list-id and date
information. The idea would be to have the date come first, so that
users would have a shot at identifying which of several messages is
most likely, and this would be searchable by eye with simply an
ordinary sorted index.
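
The on-disk layout proposed above (a file per Message-ID for singletons, promoted to a directory keyed by List-Received-ID once a duplicate arrives) might be sketched as below. Everything here is an assumption for illustration: the function names, the percent-encoding of IDs into file names, and reading the first message's X-List-Received-ID back from the stored file. It does no locking and does not guard against hostile IDs such as `..`.

```python
import email
from email.message import Message
from pathlib import Path
from urllib.parse import quote

RECEIVED_ID = "X-List-Received-ID"  # header name taken from the proposal


def _safe(value):
    # Percent-encode so the ID can be used as a single path component.
    return quote(value.strip().strip("<>"), safe="")


def archive(list_dir: Path, msg: Message) -> Path:
    """Store msg under list_dir and return the path it was written to."""
    slot = list_dir / _safe(msg["Message-ID"])
    if not slot.exists():
        # First message with this Message-ID: store it as a singleton file.
        list_dir.mkdir(parents=True, exist_ok=True)
        slot.write_text(msg.as_string())
        return slot
    if slot.is_file():
        # A duplicate arrived: promote the singleton to a directory and
        # refile the first message under its own List-Received-ID.
        text = slot.read_text()
        first = email.message_from_string(text)
        slot.unlink()
        slot.mkdir()
        (slot / _safe(first[RECEIVED_ID])).write_text(text)
    target = slot / _safe(msg[RECEIVED_ID])
    target.write_text(msg.as_string())
    return target
```

After promotion, a request for the bare Message-ID path reaches a directory, which a static web server can present as the index of matching posts; this is also the moment at which earlier references to the singleton start returning an index, the drawback noted above.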