[JB] The essential contribution of mail-archive.com "is the concept of being able to easily archive email lists by just adding the generic archives address as a subscriber".
That contribution is based on a core "technology": a system to automatically sort (demultiplex) list email sent to a single address.

[AL] Understood. My points remain, in relation to the fixes for both bug#1 and bug#2. The conceptual contribution is distinct from the technology implementation. Specifically, there is no necessary connection between easily adding archives by just subscribing a generic email address and actually processing ALL the incoming mail for those archives through the same generic address.

1) The above core technology description is "almost" exactly part of what an MTA does. Email from users who CC (multiplex) messages to multiple email lists is demultiplexed by the Mail Transfer System and routed reliably to the several list servers via sending and routing MTAs. It is then multiplexed to multiple list subscribers by list server software (also a form of MTA functionality), and again routed through the MTS via sending and routing MTAs to final delivery, where receiving MTAs demultiplex it into mailboxes. At that point User Agents take over and do not need to do any demultiplexing, because the mail has already been demultiplexed into a specific mailbox for each UA.

2) Consequently, reliable, thoroughly tested production code for processing RFC 822 headers and demultiplexing mail "sent to a single address" into appropriate local mailboxes is a core component of existing freely available MTA software. Essentially, that is what a receiving MTA "does for a living".

3) I use the word "almost" in 1) because the "single address" of 2) is normally the MTA's SMTP IP address and port number. This, however, is NOT a critical part of the demultiplexing functionality of receiving MTAs. MTAs also act as gateways to non-SMTP mail systems and receive incoming messages for delivery, including demultiplexing, via local transmission, UUCP mail queues, and many other systems (including decryption of messages received at a local mailbox for anonymous remailing).
The demultiplexing functionality is based entirely on parsing the RFC 822 header, which is exactly what you are doing. Routing and gatewaying mail for foreign email systems can use complex header rewriting rules, which is also what you are doing when converting an email list's address into a local mailbox for MHonArc.

4) Therefore I suggested, re bug#1, grabbing the existing code for this purpose from an MTA package and feeding incoming RFC 822 messages arriving at the generic subscription mailbox to that MTA code as though they had arrived directly at the MTA (whether or not you redesign to use the MTA software more generally, as below). Your customization would then create new mailboxes for "undeliverable" mail corresponding to new mailing lists.

5) When the delivery address(es) of an incoming message correspond to a local user mailbox, existing MTA software efficiently delivers it into that mailbox. By redesigning to automatically re-subscribe each list to an independent address AFTER it has been initially subscribed to the generic address (and then, or at the same time, cancelling the generic address subscription), incoming mail that is NOT for the generic subscription address (i.e. more than 99% of mail) would be processed ENTIRELY by normal MTA software.

[JB] I can foresee similar technology being used in applications other than mail-archive.com. For example, it might be used by email user agents, both local and web based, to do automatic sorting on behalf of a single human user. There may be other, as yet unforeseen, uses.

[AL] I'm not familiar with it, but my understanding is that this is the sort of thing procmail does (inefficiently). If you only have to do it for the one or few messages received at the generic subscription address before the subscription is changed over to the unique individual address per list, then however inefficient procmail is, that would not matter.
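
As a rough illustration of the header-based demultiplexing discussed in 4) and 5), here is a minimal sketch in Python rather than in MH and shell script. It is not how mail-archive.com or any MTA actually does it. The address `archive@example.org`, the function names, and the in-memory `mailboxes` dictionary are all hypothetical, and treating every To/Cc address as a list address is a simplifying assumption that a real system would have to refine.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical generic subscription address (an assumption for this sketch).
GENERIC = "archive@example.org"


def recipients(msg):
    """Return the lowercased addresses found in the To and Cc headers."""
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return [addr.lower() for _name, addr in getaddresses(fields) if addr]


def route(raw, mailboxes):
    """File one RFC 822 message under each address named in To/Cc.

    `mailboxes` maps an address to the list of messages stored for it.
    An address seen for the first time gets a new, empty mailbox, which
    stands in for the "create a mailbox for undeliverable mail" step.
    Returns the addresses whose mailboxes were newly created.
    """
    msg = message_from_string(raw)
    created = []
    for addr in recipients(msg):
        if addr == GENERIC:
            continue  # the archive's own address is not a list
        if addr not in mailboxes:
            mailboxes[addr] = []
            created.append(addr)
        mailboxes[addr].append(msg)
    return created
```

The addresses returned by `route` mark the point where a real system might re-subscribe the new list to its own independent address, so that later mail for that list never passes through this code at all.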
More than 99% of incoming mail would not pass through the inefficient process anyway, but would still get batched up nicely for MHonArc by the MTA software.

[JB] Thus, I want to polish the automatic sorting algorithm if I can, since it is the key technology. That's why I'm willing to spend time tackling the automatic sorting bugs.

[AL] Well, it's your time. But I disagree that this is the "core technology". What I saw as the core technology was the creation of new mailboxes for messages to newly subscribed lists received at the generic mailbox, and the configuration of MHonArc, HtDig and a web index page to include them. The fact that you were ALSO demultiplexing every message through your generic mailbox with the aid of MH, instead of changing the subscription address for each new mailing list, struck me as an implementation decision rather than "core technology".

I think you mentioned in email that you were using MH and shell scripts because that happens to be what you are familiar with, and not using procmail because it is too inefficient at processing large numbers of messages one at a time. Fair enough: it got it working, and the best way to get anything working is with whatever tools are at hand. But that isn't necessarily the best way to make it scalable or capable of generating fewer than 1 trouble log error per week.

I would guess that figuring out the configuration details for MHonArc and HtDig would have taken as much effort as your own script development, and that figuring out the details of installing and configuring an MTA package (especially for not running it as root and receiving input from a mailbox), or even of configuring procmail, would have taken longer than the few extra lines of shell script required to implement the demultiplexing the way you did.
If so, that would make the present approach the optimum choice for the initial implementation (especially as changing the subscription addresses of lists automatically will be non-trivial, though it will also be required eventually for other reasons, including hardening against denial of service attacks). But it doesn't make continued maintenance of that approach the optimum long-term solution. I won't argue any more about that as it isn't my problem, so please don't feel obliged to waste time replying if you still don't agree. But I have restated my view above in case it was not clear enough.

[JB] Jeff
PS Happy holidays.

[AL] Bah! Humbug! ;-)
