Problem:

- People leave mail in the mailbox. Scanning the mailbox every time is an I/O-hungry operation.
- Rewriting a partially updated mailbox is very expensive: UIDL updates, partial mailbox deletion, mail arriving while the popper is running...

Solution:

A simple and efficient key/value database used to store messages. For example, Berkeley DB (http://www.sleepycat.com/).

Qpopper would have six operations:

- Translate standard mailboxes into the database.
- Serve mail from the database.
- An additional tool to show per-user statistics: messages in the database, length, last login, quota...
- An additional tool to list and delete a specific message of a user.
- An additional tool to delete a user and all of their messages.
- An additional tool to kill all popper processes, disable POP3 logins and rebuild the database if necessary. This operation typically takes 4-5 seconds.

We could have another tool to delete messages that have already been read and are older than a month, for example.

Example:

You could have a central mailbox database. Every email in the database would have a unique UID. Every message resides in two records, for example. One record contains the message body. The other record holds the message headers, which can be modified by qpopper (UIDL, Status, etc).

There are per-user records to keep data like message UIDs, message lengths, quota, last login, perhaps UIDL and Status.

There is a global record that keeps a global serial number (used as a UID generator), atomically updated every time a message is added to the database.

When a user logs in via POP3, qpopper would translate the new messages in the user's standard mailbox into the database (erasing the original mailbox). Then the messages are served from the database. The message migration can also be done by a cron job, to migrate mailboxes with infrequent logins.

The only remaining problem would be "quotas", a very problematic issue for the current qpopper too. If you control the local mailer, you can have it talk to the database and enforce quotas there.
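
To make the layout in the example above more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: a plain dict stands in for the key/value database, the key names ("global:serial", "hdr:N", "body:N", "user:NAME") and the MailStore class are hypothetical, and the per-user record is stored as JSON just for readability. In a real deployment every method would have to run inside a database transaction.

```python
import json


class MailStore:
    """Toy key/value mail store. `db` is any dict-like bytes-to-bytes mapping
    (a stand-in for the real database; all key names here are hypothetical)."""

    def __init__(self, db=None):
        self.db = {} if db is None else db

    def _next_uid(self):
        # Global record: a serial number used as the UID generator.
        # With a real database this read-modify-write must be transactional.
        uid = int(self.db.get(b"global:serial", b"0")) + 1
        self.db[b"global:serial"] = str(uid).encode()
        return uid

    def _load_user(self, user):
        raw = self.db.get(b"user:" + user.encode())
        return json.loads(raw) if raw else {"messages": {}, "last_login": None}

    def _save_user(self, user, record):
        self.db[b"user:" + user.encode()] = json.dumps(record).encode()

    def add_message(self, user, headers, body):
        """Store headers and body in two separate records, then index them."""
        uid = self._next_uid()
        self.db[b"hdr:%d" % uid] = headers
        self.db[b"body:%d" % uid] = body
        record = self._load_user(user)
        record["messages"][str(uid)] = len(headers) + len(body)
        self._save_user(user, record)
        return uid

    def stat(self, user):
        """Message count and total size, read from the per-user record only."""
        sizes = self._load_user(user)["messages"].values()
        return len(sizes), sum(sizes)

    def delete_message(self, user, uid):
        """Delete one message without touching any other message."""
        record = self._load_user(user)
        if record["messages"].pop(str(uid), None) is None:
            return False  # this UID does not belong to this user
        self.db.pop(b"hdr:%d" % uid, None)
        self.db.pop(b"body:%d" % uid, None)
        self._save_user(user, record)
        return True
```

Note that stat() answers the POP3 STAT question from a single small record, and delete_message() removes two keys and updates one, which is the whole point compared with rewriting a mailbox file.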
Advantages:

- You don't need to scan anything once the messages are in the database. You know, at any time, how many messages a user has, their length, and so on. If new email arrives, you migrate it to the database.
- You can delete individual messages without rewriting a mailbox.
- You can modify headers without expensive I/O, since headers (typically under 2 Kbytes) are kept separate from message bodies.
- New messages arriving while qpopper is working don't require a mailbox rewrite.
- Berkeley DB, for example, can retrieve partial records. That is, you can have a 15 MB message and you don't need to read it in one shot. In fact, you can read the message in 64 Kbyte chunks, for example, to keep memory and I/O small.
- Berkeley DB overhead in disk space and CPU is fairly small.
- Berkeley DB implements atomic transactions. In fact, you have full ACID semantics. A popper process can die at any time and the database is always consistent.
- Berkeley DB detects and resolves deadlocks when multiple processes access the database.
- Berkeley DB is free for non-commercial use.
- The latest Berkeley DB version supports replication.
- You can support multiple mailbox formats: mailbox and maildir, for example. The only impact would be writing the mailbox-to-database converter. This step is fairly simple.

PS: I'm advocating Berkeley DB because I have been using it for years in big (millions of records) and critical environments, and its performance and safety are stunning. But any similar DB will do the job. Note that I'm not talking about an SQL database. That's not the way. I'm talking about key/value databases with full ACID semantics.
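
As a rough illustration of the partial-record advantage listed above, the sketch below serves a large message body in 64 Kbyte chunks. The helper read_partial() is a hypothetical stand-in that simply slices an in-memory value; with Berkeley DB the equivalent would be a partial get (in the C API, the DB_DBT_PARTIAL flag together with the doff and dlen fields of the DBT). The key name "body:1" is also just an assumption for the example.

```python
def read_partial(db, key, offset, length):
    """Hypothetical stand-in for a partial-record get: returns at most
    `length` bytes of the value starting at `offset`."""
    return db[key][offset:offset + length]


def iter_body(db, key, chunk_size=64 * 1024):
    """Yield a stored message body chunk by chunk, so that only one chunk
    needs to be held in memory at a time."""
    offset = 0
    while True:
        chunk = read_partial(db, key, offset, chunk_size)
        if not chunk:
            break  # past the end of the record
        yield chunk
        offset += len(chunk)
```

A popper process would write each chunk to the client socket as it is produced, instead of loading the whole 15 MB message first.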
-- 
Jesus Cea Avion
[EMAIL PROTECTED]
http://www.argo.es/~jcea/
PGP Key Available at KeyServ
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz