[JB]
The essential contribution of mail-archive.com "is the concept of being
able to easily archive email lists by just adding the generic archives
address as a subscriber".

That contribution is based on a core "technology" - a system to
automatically sort (demultiplex) list email sent to a single
address. 

[AL]
Understood. My points remain, in relation to both fixes for bug#1 and
bug#2:

1) The conceptual contribution is distinct from the technology
implementation. Specifically there is no necessary connection between
easily adding archives by just subscribing a generic email address and
actually processing ALL the incoming mail for those archives through the
same generic address.

2) The above core technology description is "almost" exactly part of
what an MTA does. When users CC (multiplex) a message to multiple email
lists, the Mail Transfer System demultiplexes it and routes it reliably
to the several list servers via sending and routing MTAs. The list
server software (also a form of MTA functionality) then multiplexes it
to the list subscribers, and it is again routed through the MTS via
sending and routing MTAs to final delivery, where receiving MTAs
demultiplex it into mailboxes. At that point User Agents take over and
do not need to do any demultiplexing, because the mail has already been
demultiplexed into a specific mailbox for each UA.

2) Consequently reliable, thoroughly tested production code for
processing RFC 822 headers and demultiplexing mail "sent to a single
address" into appropriate local mailboxes is a core component of
existing freely available MTA software. Essentially that is what a
receiving MTA "does for a living".

3) I use the word "almost" in the first 2) because the "single address" of the second 2) is
normally the MTA's SMTP IP address and port number. This however is NOT a
critical part of the demultiplexing functionality of receiving MTAs.
MTAs also act as a gateway to non-SMTP mail systems and receive incoming
messages for delivery, including demultiplexing, via local transmission,
UUCP mail queues, and many other systems (including decryption of
messages received at a local mailbox for anonymous remailing). The
demultiplexing functionality is based entirely on parsing of the RFC 822
header, which is exactly what you are doing. The routing and gatewaying
of mail for foreign email systems can use complex header re-writing
rules, which is also what you are doing when converting an email list's
address into a local mailbox for MHonArc.

4) Therefore I suggested re bug#1 grabbing the existing code for this
purpose from an MTA package and feeding incoming RFC 822 messages
arriving at the generic subscription mailbox to that MTA code as though
they had arrived directly to the MTA (whether or not you re-design to
use the MTA software more generally as below). Then your customization
would create new mailboxes for "undeliverable" mail corresponding to new
mailing lists.

5) When the delivery address(es) of an incoming message correspond to a
local user mailbox, existing MTA software efficiently delivers it into
that mailbox. By re-designing to automatically re-subscribe each list to
an independent address AFTER it has been initially subscribed to the
generic address (and then, or at the same time, cancel the generic
address subscription), incoming mail that is NOT for the generic
subscription address (i.e. more than 99% of mail) would be processed
ENTIRELY by normal MTA software.
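
The header-based demultiplexing described in points 3) to 5) above might
be sketched in Python roughly as follows. This is only an illustration
under stated assumptions, not the mail-archive.com code and not code
taken from any MTA: the generic address archive@example.com, the
function names, and the rule for turning a list address into a mailbox
name are all hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses

GENERIC_ADDRESS = "archive@example.com"  # hypothetical generic subscription address


def mailbox_name(list_address):
    """Rewrite a list address into a local mailbox name.

    Hypothetical rule: lowercase the address and replace '@' with '%'.
    """
    return list_address.strip().lower().replace("@", "%")


def demultiplex(raw_message, known_mailboxes):
    """Return (existing, new) mailbox names for one RFC 822 message.

    Only the To and Cc headers are inspected. The generic address is
    skipped because it names the archive, not a mailing list.
    """
    msg = message_from_string(raw_message)
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    existing, new = [], []
    for _display_name, address in recipients:
        if not address or address.lower() == GENERIC_ADDRESS:
            continue
        name = mailbox_name(address)
        # Mail for an unknown mailbox is treated as coming from a new list.
        target = existing if name in known_mailboxes else new
        if name not in target:
            target.append(name)
    return existing, new
```

A caller would deliver the message into each mailbox in the first list
and create the mailboxes in the second list before delivering, which is
the "undeliverable means new list" customization suggested in 4).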

[JB]
I can foresee similar technology being used in other
applications than mail-archive.com.  For example, it might be used by
email user agents, both local and web based, to do automatic sorting
on behalf of a single human user. There may be other, as yet unforeseen
uses.

[AL]
I'm not familiar with it but my understanding is that this is the sort
of thing procmail does (inefficiently). If you only have to do it for
the 1 or few messages received at the generic subscription address
before the subscription is changed over to the unique individual address
per list, then however inefficient procmail is, that would not matter.
More than 99% of incoming mail would not pass through the inefficient
process anyway, but would still get batched up nicely for MHonArc by the
MTA software.
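
To make the two paths concrete, here is a rough Python sketch of the
routing decision. It is an assumption-laden illustration only: the
generic address, the per-list address table, and the route labels are
hypothetical, and in practice the "mta" branch would simply be the MTA's
ordinary local delivery rather than any custom code.

```python
GENERIC_ADDRESS = "archive@example.com"  # hypothetical generic subscription address


def route(envelope_recipient, per_list_mailboxes):
    """Decide which path handles a message, based on its envelope recipient.

    per_list_mailboxes maps each per-list address (hypothetical) to the
    local mailbox that the MTA delivers into.
    """
    rcpt = envelope_recipient.lower()
    if rcpt in per_list_mailboxes:
        # Fast path: the list has already been re-subscribed to its own address.
        return ("mta", per_list_mailboxes[rcpt])
    if rcpt == GENERIC_ADDRESS:
        # Slow path: first message(s) from a newly subscribed list.
        return ("sorter", None)
    return ("reject", None)
```

Only messages that come back as "sorter" would ever reach the slow
header-sorting step, so its per-message cost would matter very little.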

[JB]
Thus, I want to polish the automatic sorting algorithm if I can, since
it is the key technology. That's why I'm willing to spend time
tackling the automatic sorting bugs.

[AL]
Well it's your time. But I disagree that this is the "core technology".
What I saw was the core technology being creation of new mailboxes for
messages to newly subscribed lists received at the generic mailbox and
configuring MHonArc, HtDig and a web index page to include them.

The fact that you were ALSO demultiplexing every message through your
generic mailbox with the aid of MH instead of changing the subscription
address for each new mailing list struck me as an implementation
decision, rather than "core technology". I think you mentioned in email
that you were using MH and shell scripts because that happens to be what
you are familiar with and not using procmail because it is too
inefficient at processing large numbers of messages one at a time. Fair
enough - that got it working and the best way to get anything working is
with whatever tools are at hand. But that isn't necessarily the best way
to make it scalable or capable of generating less than 1 trouble log
error per week.

I would guess that figuring out the configuration details for MHonArc
and HtDig would have taken as much effort as your own script
development, and figuring out the details for installing and configuring
an MTA package (especially for not running it as root and receiving
input from a mailbox), or even for configuring procmail, would have
taken longer than the few extra lines of shell script required to
implement the demultiplexing the way you did.

If so, that would make the present approach the optimum choice for
initial implementation (especially as changing the subscription
addresses of lists automatically will be non-trivial - though also
required eventually for other reasons including hardening against denial
of service attacks).

But it doesn't make continued maintenance of that the optimum long term
solution. I won't argue any more about that as it isn't my problem, so
please don't feel obliged to waste time replying if you still don't
agree. But I have re-stated my view above in case it was not clear
enough.

[JB]
Jeff

PS Happy holidays.

[AL]
Bah! Humbug! ;-)
