Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread Jeff Breidenbach
>In which case [the message body link] would be set to something like.
>
>http://third-party-service/[EMAIL PROTECTED]

Just for fun, I did a trial implementation. It works, but the URLs are
too long.
For example, the URL below spends 59 characters on the message-id and
27 characters on the listname. We're already over my comfort level (of
about 72 characters) and haven't even started to count the hostname and
other URL-lengthening overhead. Maybe this was a bad idea after all.

http://www.mail-archive.com/search?l=mailman-developers%40python.org&[EMAIL PROTECTED]
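
To make the length problem concrete, here is a minimal sketch of how such a URL could be assembled and measured. The "l" parameter is taken from the URL above; the "q" parameter name, the function names, and the sample Message-ID are assumptions for illustration only, since the real query string is obscured in the archive.

```python
from urllib.parse import quote

COMFORT_LIMIT = 72  # the "comfort level" mentioned in the message above


def archive_search_url(list_address, message_id, host="www.mail-archive.com"):
    """Build a lookup URL from a list address and a Message-ID.

    "l" appears in the URL quoted above; "q" is a hypothetical name for
    the message-id parameter, which is obscured in the archived copy.
    """
    listname = quote(list_address, safe="")           # '@' becomes '%40'
    msgid = quote(message_id.strip("<>"), safe="")    # drop angle brackets, then escape
    return f"http://{host}/search?l={listname}&q={msgid}"


def over_budget(url, limit=COMFORT_LIMIT):
    """Return how many characters the URL exceeds the limit by (0 if it fits)."""
    return max(0, len(url) - limit)


url = archive_search_url("mailman-developers@python.org",
                         "<abc123@mail.example.com>")  # made-up Message-ID
print(url, over_budget(url))
```

Even with a short made-up Message-ID the result exceeds 72 characters, because percent-encoding turns every "@" into three characters.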

Jeff
___
Mailman-Developers mailing list
Mailman-Developers@python.org
http://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: 
http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: 
http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org

Security Policy: 
http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp


Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread Jeff Breidenbach
>Maybe a way to think about this is that the canonical url is based on
>the message-id, but then there's some way to distill even this down
>to a tinyurl or simple integer that would be stable in the face of
>full archive regenerations.

I'd suggest the reverse. Keep the canonical archive URL short and
sweet, and then use a URL redirection service to map message-ids
to those URLs. It is the archiver's job to make it all work. For example,
the canonical archive URL might stay exactly the way it is in pipermail,
but the archival link embedded in the message would instead go
to a redirection service.

http://mail.codeit.com/pipermail/zcommerce/2002-February/000523.html
http://mail.codeit.com/[EMAIL PROTECTED]
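
The redirection idea can be sketched as a small lookup table owned by the archiver. This is only an illustration under stated assumptions: the class name, its methods, and the sample Message-ID are hypothetical, and a real service would need persistent storage and an HTTP front end.

```python
class RedirectTable:
    """Maps Message-IDs to short canonical archive URLs (hypothetical sketch)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._paths = {}

    @staticmethod
    def _key(message_id):
        # Normalize "<id@host>" and "id@host" to the same key.
        return message_id.strip().strip("<>")

    def register(self, message_id, archive_path):
        # The first registration wins, so a later duplicate Message-ID
        # cannot silently repoint an existing link.
        self._paths.setdefault(self._key(message_id), archive_path)

    def resolve(self, message_id):
        """Return the canonical URL to redirect to, or None if unknown."""
        path = self._paths.get(self._key(message_id))
        return None if path is None else f"{self.base_url}/{path}"


table = RedirectTable("http://mail.codeit.com/pipermail/zcommerce/")
table.register("<abc@example.com>", "2002-February/000523.html")
print(table.resolve("abc@example.com"))
```

With this split, only the table has to be rewritten if the canonical paths change after a full archive regeneration; the links embedded in delivered messages stay the same.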

The one other thing I'd like to revisit is integration with third party
archival services. There are two obvious integration points. One is a
button in the Mailman list admin user interface that says "archive with
service X", not unlike the setting in Firefox that basically says "search
with service X". The other integration point is the archival link
discussed above, in which case it would be set to something like:

http://third-party-service/[EMAIL PROTECTED]

Disclosure: I help run a third party archiving service, and this topic was
discussed quite a bit previously [1]. Nonetheless, it seems like a good
time to revisit it given the current discussion about archive wishlists.

[1] http://www.mail-archive.com/mailman-developers@python.org/msg08772.html


Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread Dale Newfield
I'm all for someone taking ownership of this long-neglected component -- 
thank you for doing so!

Barry Warsaw wrote:
> Maybe a way to think about this is that the canonical url is based on  
> the message-id, but then there's some way to distill even this down  
> to a tinyurl or simple integer that would be stable in the face of  
> full archive regenerations.

The resistance to basing this on message-id has always been that there's 
no guarantee of uniqueness...
...but I believe each list has some sort of counter for how many 
messages it's seen, so we could add another header with that number, and 
use as a unique id the two concatenated together...
(That way the archiver can know from the content of the header exactly 
how to generate the same unique id as mailman, which would allow for the 
url-in-the-footer to happen w/o first hitting the archiver.)
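
A minimal sketch of that idea, assuming a hypothetical header name (X-List-Sequence) and a hyphen as the separator; neither is an existing Mailman feature:

```python
from email.message import Message

SEQ_HEADER = "X-List-Sequence"  # hypothetical header name, not a Mailman header


def stamp_sequence(msg, counter):
    """Record the list's per-message counter in a header before delivery."""
    del msg[SEQ_HEADER]          # no error if the header is absent
    msg[SEQ_HEADER] = str(counter)


def unique_id(msg):
    """Derive the id from headers alone, so Mailman and the archiver agree."""
    seq = msg[SEQ_HEADER]
    if seq is None:
        raise ValueError("message has not been stamped with a sequence number")
    message_id = (msg["Message-ID"] or "").strip().strip("<>")
    return f"{seq.strip()}-{message_id}"


msg = Message()
msg["Message-ID"] = "<abc@example.com>"
stamp_sequence(msg, 523)
print(unique_id(msg))
```

Because the counter differs for every post to a list, two posts that share a Message-ID still get distinct ids.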

Just throwing out ideas,
-Dale


Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread John A. Martin
> "st" == Stephen J Turnbull
> "Re: [Mailman-Developers] Improving the archives"
>  Wed, 04 Jul 2007 16:49:58 +0900

st> The main drawback to using Message IDs that I can see is that
st> broken MUAs may supply no Message-ID, or the same one
st> repeatedly.  In the former case, as a last resort Mailman can
st> supply one,

If the archive is considered to be a reflection of what Mailman _put_
on the wire, as distinct from what was received from the wire, then
adding a Message-ID in the absence of one already present is a reflection
of a SHOULD requirement of rfc(2)822.  In the absence of a Message-ID
on an outgoing mail message many if not most MTAs will add one.  Why
not let Mailman anticipate the need to add a Message-ID when archiving
the message rather than leaving it to the outgoing MTA?
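
As a rough sketch of what that could look like, the standard library's email.utils.make_msgid can generate the missing header. The function name ensure_message_id and the example domain are assumptions for illustration, not Mailman code.

```python
from email.message import Message
from email.utils import make_msgid


def ensure_message_id(msg, domain=None):
    """Return the message's Message-ID, generating one only if it is missing.

    An author-supplied ID is never replaced; only an absent or empty
    header is filled in, so the archive and the wire copy agree.
    """
    existing = msg["Message-ID"]
    if existing and existing.strip():
        return existing.strip()
    new_id = make_msgid(domain=domain)   # returns a value of the form "<...@domain>"
    del msg["Message-ID"]                # drop an empty header, if any
    msg["Message-ID"] = new_id
    return new_id


msg = Message()
msg["Subject"] = "no id here"
print(ensure_message_id(msg, domain="lists.example.org"))
```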

jam



Re: [Mailman-Developers] Improving the archives

2007-07-04 Thread Stephen J. Turnbull
Barry Warsaw writes:
 > > - archive links that won't break if the archive is rebuilt
 > 
 > Yes, this is absolutely critical, in fact, I'd put it right at the  
 > top of the list, even more so than a u/i overhaul.  Stable urls, with  
 > backward compatible redirecting links if at all possible, would be  
 > fantastic.

+1.  I've been wanting to do something about this, and have made
proposals (not backed with code, mea maxima culpa) for design.  I would
definitely be happy to help with this, but given time constraints, it
would be nice if somebody else could take the lead.

 > Along with that, I would really like to come up with an algorithm for  
 > calculating those urls without talking to the archiver.

Brad didn't like this when I suggested it before, but I didn't really
understand why not.  Anyway, FWIW:

I suggest adding an X-List-Received-ID header to all messages.  I
haven't really thought through whether the UUID in that field should
be at least partly human-readable or not, but that doesn't matter for
the basic idea.[1]  The on-disk directory format would be

/path-to-archive/private/my-list/Message-ID

for singletons (Message-ID is the author-supplied ID) and

/path-to-archive/private/my-list/Message-ID/List-Received-ID

for multiples.  These would be created on-the-fly when they occur.
They can be served as static pages.  For almost all messages, the bare
URL

http://archives.example.com/my-list/Message-ID

should Just Work (ie, return a no-such-object result or a single
message).  Where it does not, you get an index of all pages with that
message ID.
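
The layout described above can be sketched as follows. This is an illustration, not a proposed implementation: the function names are hypothetical, percent-encoding the IDs to get file names is my assumption, and nothing here is hardened against hostile IDs.

```python
import os
from email import message_from_string
from urllib.parse import quote

RECEIVED_HEADER = "X-List-Received-ID"  # header name from the proposal above


def _safe(name):
    # Assumption: percent-encode IDs so they are usable as file names.
    return quote(name.strip().strip("<>"), safe="@")


def store(list_dir, raw_message):
    """Archive one message; promote a singleton to a directory on a duplicate."""
    msg = message_from_string(raw_message)
    slot = os.path.join(list_dir, _safe(msg["Message-ID"]))
    if not os.path.exists(slot):
        with open(slot, "w") as f:       # first arrival: plain file (singleton)
            f.write(raw_message)
        return slot
    if os.path.isfile(slot):             # second arrival: convert to a directory
        with open(slot) as f:
            first = f.read()
        first_id = _safe(message_from_string(first)[RECEIVED_HEADER])
        os.remove(slot)
        os.mkdir(slot)
        with open(os.path.join(slot, first_id), "w") as f:
            f.write(first)
    target = os.path.join(slot, _safe(msg[RECEIVED_HEADER]))
    with open(target, "w") as f:
        f.write(raw_message)
    return target


def lookup(list_dir, message_id):
    """Return ("missing", None), ("message", path) or ("index", [received ids])."""
    slot = os.path.join(list_dir, _safe(message_id))
    if os.path.isdir(slot):
        return ("index", sorted(os.listdir(slot)))
    if os.path.isfile(slot):
        return ("message", slot)
    return ("missing", None)
```

A web server could map the three lookup results to a 404, a static page, and a static index page respectively.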

The main drawback to using Message IDs that I can see is that broken
MUAs may supply no Message-ID, or the same one repeatedly.  In the
former case, as a last resort Mailman can supply one, but that won't
help people who get a personal copy and want to find the thread.
However, I see no way to help them, anyway, beyond a generic archive
search engine.  In the latter, you get lots of messages matching the
Message-ID, and while most lists should have *zero* problems, a list
that has any instances of this problem would have many.  Again I can't
see a good way to deal with this other than a general search facility,
as computing a digest of headers or content is hard to do reliably.
Providing an index of matching posts seems like a reasonable approach,
which can be efficiently implemented (eg, as static pages).
Furthermore, the examples I've seen of both in the last few years have
all been either spam or (in the case of duplicate Message-IDs) actual
duplicates due to some mail system problem or itchy user fingers.

A minor drawback to my proposal is that if a message gets archived as
a singleton for that Message-ID, then a duplicate arrives, previously
created references in the archive will of course now return an index
rather than the desired message.  Ie, there is data corruption.  This
can be dealt with in several ways; the easiest would be to provide a
"if-you-got-here-by-clicking-a-ref-from-this-archive-you're-looking-for-me"
link when creating the directory for multiple instances.

There's also a *very* minor benefit: repeat sends will be immediately
recognizable without checking Message-ID.

Footnotes: 
[1]  By partly human-readable I mean containing list-id and date
information.  The idea would be to have the date come first, so that
users would have a shot at identifying which of several messages is
most likely, and this would be searchable by eye with simply an
ordinary sorted index.
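
One possible shape for such a partly human-readable ID, sketched under assumptions: the field order follows the footnote (date first, then list-id), while the timestamp format, the dot separator, and the random suffix are illustrative choices of mine.

```python
import uuid
from datetime import datetime, timezone


def make_received_id(list_id, received_at=None):
    """Build an ID that sorts by arrival time: timestamp, list-id, random part.

    Pass a timezone-aware datetime; a naive one would be interpreted as
    local time by astimezone().
    """
    if received_at is None:
        received_at = datetime.now(timezone.utc)
    stamp = received_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # uuid4 supplies the uniqueness; the readable prefix is only for humans.
    return f"{stamp}.{list_id}.{uuid.uuid4().hex}"


print(make_received_id("my-list",
                       datetime(2007, 7, 4, 7, 49, 58, tzinfo=timezone.utc)))
```

Because the timestamp comes first and has a fixed width, a plain alphabetical directory listing is also a chronological one.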
