Re: [Dbmail-dev] messageblks logic
It simply fills up a block of size READ_BLOCK_SIZE, inserts it, then starts filling another one.

I'm not familiar with the output code (tried reading through it, but quickly got confused by the details of the MIME parser). My understanding, though, is that the entire message has to be reassembled in memory at some point for the MIME parsing to work. Ilja, is this correct?

If GMIME has a callback architecture, we might be able to ask it to parse messages in blocks of READ_BLOCK_SIZE, so we'd be able to retrieve rows from the database one at a time and pass them on to GMIME.

I'm not sure what the best way to handle this is, though. My thinking has always revolved around handling huge messages without causing resource starvation. So a gigabyte email should be parsed in pieces and not all allocated into memory at once. But a four megabyte email might as well go into memory. You'd have to get a heck of a lot of people each reading a four meg email at once for that to be a major problem. But, since we want DBMail to be properly scalable to really large installations, it is a distinct possibility to have that many people each reading an email that large (scenario: the CEO sounds out his latest crazy plan in a four meg powerpoint, and everyone in the whole company starts pounding on the mail server to retrieve their copy.)

Aaron

Paul J Stevens [EMAIL PROTECTED] said:

> Hi all, Ilja, Aaron,
>
> I'm playing around with gmime to see if I can rebuild the message injection and extraction logic around glib/gmime.
>
> What I'd like to know: what would be the logic for splitting a message into messageblks? I know the first blk is for the email message header. Easy. But after that? What criteria are used? Line split? Fixed string sizes? MIME part boundaries? I have a hard time understanding the current codebase.
>
> And are the current criteria implicitly required elsewhere in the code? I guess IMAP searching requires splitting on line boundaries at the minimum, or maybe word boundaries. But other than that?
>
> --
> Paul Stevens            mailto:[EMAIL PROTECTED]
> NET FACILITIES GROUP    PGP: finger [EMAIL PROTECTED]
> The Netherlands         http://www.nfg.nl

___
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://twister.fastxs.net/mailman/listinfo/dbmail-dev
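A minimal sketch of the splitting Aaron describes, written in Python rather than DBMail's own C and not taken from the DBMail source. It assumes the header block ends at the first blank line and that READ_BLOCK_SIZE is 0.5 MB (the figure mentioned later in the thread); the function name split_into_blocks is hypothetical.

```python
# Sketch only: header goes into the first block, the body is cut into
# fixed-size chunks with no regard for line, word, or MIME boundaries.
READ_BLOCK_SIZE = 512 * 1024  # assumption: 0.5 MB


def split_into_blocks(raw: bytes, block_size: int = READ_BLOCK_SIZE) -> list:
    """Return [header_block, body_block_1, body_block_2, ...]."""
    # The header ends at the first empty line; accept CRLF or bare LF.
    candidates = [(raw.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    candidates = [(idx, n) for idx, n in candidates if idx != -1]
    if not candidates:
        # No blank line at all: treat the whole message as the header block.
        return [raw]
    idx, sep_len = min(candidates)  # earliest separator wins
    cut = idx + sep_len
    header, body = raw[:cut], raw[cut:]

    blocks = [header]
    for offset in range(0, len(body), block_size):
        blocks.append(body[offset:offset + block_size])
    return blocks
```

Joining the returned blocks in order reproduces the original message byte for byte, which is the property any alternative splitting rule (lines, MIME parts) would also need to keep.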
Re: [Dbmail-dev] messageblks logic
Aaron Stone wrote:

> It simply fills up a block of size READ_BLOCK_SIZE, inserts it, then starts filling another one.

Doesn't that break searching messages? Breaking up messages on a fixed char width is easiest of all, but then single words in messages could span messageblks. But then, READ_BLOCK_SIZE is .5 MB, and mails larger than that tend to be MIME-encoded anyway.

[snip]

> If GMIME has a callback architecture, we might be able to ask it to parse messages in blocks of READ_BLOCK_SIZE, so we'd be able to retrieve rows from the database one at a time and pass them on to GMIME.

Such callbacks can probably only be implemented if messageblks are logical MIME units, such as a full message or a MIME part. Or at the very least such logical MIME parts would have to be reassembled before initializing a gmime object.

> I'm not sure what the best way to handle this is, though. My thinking has always revolved around handling huge messages without causing resource starvation. So a gigabyte email should be parsed in pieces and not all allocated into memory at once.

GB-sized emails seem to me to be not-of-this-world at present. I'm quite certain most if not all ISPs have a cap on the maximum message size that's quite a lot smaller than that. Still, a valid guideline though.

> But a four megabyte email might as well go into memory. You'd have to get a heck of a lot of people each reading a four meg email at once for that to be a major problem. But, since we want DBMail to be properly scalable to really large installations, it is a distinct possibility to have that many people each reading an email that large (scenario: the CEO sounds out his latest crazy plan in a four meg powerpoint, and everyone in the whole company starts pounding on the mail server to retrieve their copy.)

I guess only a real-world test can expose the actual bottlenecks involved.

You got me thinking ...

--
Paul Stevens            mailto:[EMAIL PROTECTED]
NET FACILITIES GROUP    PGP: finger [EMAIL PROTECTED]
The Netherlands         http://www.nfg.nl
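One way to keep fixed-width blocks without breaking searches is to carry a short tail of each block into the next one, so a word that straddles two messageblks is still seen whole. This is only a sketch under that assumption, written in Python; blocks_contain is a hypothetical helper and is not how DBMail's IMAP search is implemented.

```python
from typing import Iterable


def blocks_contain(blocks: Iterable[bytes], needle: bytes) -> bool:
    """Search for needle in a stream of blocks, one block at a time."""
    if not needle:
        return True
    keep = len(needle) - 1  # longest prefix of a match that can end a block
    tail = b""
    for block in blocks:
        window = tail + block
        if needle in window:
            return True
        # Keep just enough bytes for a match that starts in this block
        # and finishes in the next one.
        tail = window[-keep:] if keep else b""
    return False
```

Only one block plus at most len(needle) - 1 extra bytes is held in memory at a time, so the message never has to be reassembled for a plain substring search.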
Re: [Dbmail-dev] messageblks logic
Ilja Booij wrote:

> > I'm not sure what the best way to handle this is, though. My thinking has always revolved around handling huge messages without causing resource starvation. So a gigabyte email should be parsed in pieces and not all allocated into memory at once. But a four megabyte email might as well go into memory. You'd have to get a heck of a lot of people each reading a four meg email at once for that to be a major problem. But, since we want DBMail to be properly scalable to really large installations, it is a distinct possibility to have that many people each reading an email that large (scenario: the CEO sounds out his latest crazy plan in a four meg powerpoint, and everyone in the whole company starts pounding on the mail server to retrieve their copy.)
>
> OTOH, a message which consists of multiple blocks will (almost) always be fetched from the database completely anyway. With that in mind, it wouldn't matter if the message data is in 2 blocks (header and body) instead of several blocks.
>
> I can remember something about the maximum size for a TEXT field in MySQL being the original reason for the choice of splitting the message into parts. I'll ask Eelco and Roel, they should know this.

I've asked Eelco about this:

MySQL used to have a limit on the client-server communication that forced us to limit the size of blocks being transferred. Nowadays that limit is much higher. Setting the max_allowed_packet variable to a high number (a few MB for instance) on both client and server will allow for sending big blocks between client and server. The TEXT field itself has no limit.

In PostgreSQL, there's a 1GB limit on the TEXT field.

I think we can safely go to a strategy where we put the message in 2 blocks, 1 for the header and 1 for the body. However, if we make our parsing code handle messages only in that way, we need to produce a script for migrating from a database with split messages. This shouldn't be too much of a problem though.

Ilja
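The core of such a migration could look roughly like the following Python sketch. It assumes rows can be read as (message_id, block_number, data) tuples and that the lowest-numbered block of each message is the header; those column names and the function merge_to_two_blocks are illustrative and not taken from the DBMail schema.

```python
from itertools import groupby
from operator import itemgetter


def merge_to_two_blocks(rows):
    """Map message_id -> (header, body) from split message blocks.

    rows: iterable of (message_id, block_number, data) in any order.
    """
    merged = {}
    ordered = sorted(rows, key=itemgetter(0, 1))
    for message_id, group in groupby(ordered, key=itemgetter(0)):
        parts = [data for _, _, data in group]
        # First block is the header; everything after it becomes one body.
        merged[message_id] = (parts[0], b"".join(parts[1:]))
    return merged
```

Note that this builds each body fully in memory, so it only suits messages that fit under whatever max_allowed_packet ends up being configured.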
Re: [Dbmail-dev] messageblks logic
I don't think we can safely cap message sizes at a few MB -- I've certainly sent myself some hefty sized emails and would be quite frustrated if this weren't a possibility because of an intrinsic, hard-coded limit.

Yes, it makes the code more complicated, but I think it's only because we still haven't fully adapted our thinking to our model. It occurred to me today that if we were MIME parsing at delivery time, we could also store the MIME structure of a message in the database. An IMAP BODY.PEEK for MIME structures would be nearly instantaneous. This is probably the single most common request from an email client, especially web-based ones that need to refresh their message list on almost every page hit.

If MySQL can retrieve huge (e.g. 1 GB) rows easily, and the limitation is only when inserting or updating, then we might be able to use string concatenation to append parts of the message until the body row has the whole thing.

Aaron

Ilja Booij [EMAIL PROTECTED] said:

> MySQL used to have a limit on the client-server communication that forced us to limit the size of blocks being transferred. Nowadays that limit is much higher. Setting the max_allowed_packet variable to a high number (a few MB for instance) on both client and server will allow for sending big blocks between client and server. The TEXT field itself has no limit.
>
> In PostgreSQL, there's a 1GB limit on the TEXT field.
>
> I think we can safely go to a strategy where we put the message in 2 blocks, 1 for the header and 1 for the body. However, if we make our parsing code handle messages only in that way, we need to produce a script for migrating from a database with split messages. This shouldn't be too much of a problem though.
>
> Ilja
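To make the delivery-time idea concrete, here is a rough Python sketch that walks a parsed message once and produces a small tree that could be stored next to the message. It uses Python's standard email package purely as a stand-in for GMIME, and the JSON layout is an assumption; it is not the IMAP BODYSTRUCTURE wire format.

```python
import json
from email import message_from_bytes
from email.message import Message


def mime_structure(msg: Message) -> dict:
    """Describe a message as a nested dict of content types."""
    node = {"type": msg.get_content_type()}
    filename = msg.get_filename()
    if filename:
        node["filename"] = filename
    if msg.is_multipart():
        # For multipart messages the payload is a list of sub-messages.
        node["parts"] = [mime_structure(part) for part in msg.get_payload()]
    return node


def structure_json(raw: bytes) -> str:
    """Serialise the structure so it could be kept in a text column."""
    return json.dumps(mime_structure(message_from_bytes(raw)))
```

A structure request could then be answered from the stored string alone, without fetching or reparsing any body blocks.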
Re: [Dbmail-dev] messageblks logic
Aaron Stone wrote:

> I don't think we can safely cap message sizes at a few MB -- I've certainly sent myself some hefty sized emails and would be quite frustrated if this weren't a possibility because of an intrinsic, hard-coded limit.

I wasn't suggesting capping at a few MB, but rather at something like 128MB.

> Yes, it makes the code more complicated, but I think it's only because we still haven't fully adapted our thinking to our model. It occurred to me today that if we were MIME parsing at delivery time, we could also store the MIME structure of a message in the database. An IMAP BODY.PEEK for MIME structures would be nearly instantaneous. This is probably the single most common request from an email client, especially web-based ones that need to refresh their message list on almost every page hit.

This can be very interesting to do. Of course, we still need to store the message in its original format, but storing some information on MIME parts would be very beneficial for performance.

> If MySQL can retrieve huge (e.g. 1 GB) rows easily, and the limitation is only when inserting or updating, then we might be able to use string concatenation to append parts of the message until the body row has the whole thing.

Setting the max_allowed_packet high enough (128MB, maybe even 256MB) would make this unnecessary.

Ilja
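For comparison, the append-by-concatenation idea can be sketched as below. The example uses Python's bundled sqlite3 module and its || operator purely as a stand-in; with MySQL the equivalent would presumably be an UPDATE using CONCAT(). The table body_blocks and the function name are hypothetical.

```python
import sqlite3

# Hypothetical single-row-per-message body table.
SCHEMA = (
    "CREATE TABLE body_blocks ("
    "message_id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
)


def append_body_chunks(conn, message_id, chunks):
    """Grow one body row by appending each chunk on the server side."""
    conn.execute(
        "INSERT INTO body_blocks (message_id, data) VALUES (?, '')",
        (message_id,),
    )
    for chunk in chunks:
        # Each statement only carries one chunk over the wire.
        conn.execute(
            "UPDATE body_blocks SET data = data || ? WHERE message_id = ?",
            (chunk, message_id),
        )
    conn.commit()
```

Each statement stays small, but reading the finished row back still transfers the whole body at once, which is the case Ilja's max_allowed_packet suggestion addresses directly.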
Re: [Dbmail-dev] messageblks logic
> > Yes, it makes the code more complicated, but I think it's only because we still haven't fully adapted our thinking to our model. It occurred to me today that if we were MIME parsing at delivery time, we could also store the MIME structure of a message in the database. An IMAP BODY.PEEK for MIME structures would be nearly instantaneous. This is probably the single most common request from an email client, especially web-based ones that need to refresh their message list on almost every page hit.
>
> This can be very interesting to do. Of course, we still need to store the message in its original format, but storing some information on MIME parts would be very beneficial for performance.

This would also lend itself to the header caching, or at least let me finish up a sort and thread implementation that would be very fast.

just my 0.02

-leif
Re: [Dbmail-dev] messageblks logic
> I think we can safely go to a strategy where we put the message in 2 blocks, 1 for the header and 1 for the body. However, if we make our parsing code handle messages only in that way, we need to produce a script for migrating from a database with split messages. This shouldn't be too much of a problem though.

Can a header flag be added to the message blocks table at this time?

--
Jesse Norell
[EMAIL PROTECTED] is not my email address; change administrator to my first name.
Re: [Dbmail-dev] message delivery failure messages when over quotum
I started working on a version of sort_and_deliver that takes a dsnuser struct as an argument, and moved one of the loops from pipe.c into sort.c. This gives sort_and_deliver an understanding that there may be several actual deliveries associated with a particular address. It's just a sketch right now, and doesn't yet solve any problems (it's just structured better, imho).

The problem always boils down to this, though: how do we assemble the different codes returned by each of the mini-deliveries into a single code to return for the address itself? (Note that this is a separate, though related, problem to the SMTP issue of finding one code for multiple addresses -- IMHO, we should be recommending that people force the MTA to go one at a time when using dbmail-smtp.)

Aaron

Ilja Booij [EMAIL PROTECTED] said:

> Hi,
>
> I've been looking at the code in lmtp.c, pipe.c, sort/sort.c and dsn.c to see if we can do a bit better when a message can't be inserted because of a user being over quota. We *should* return a 552 message to the MTA, but currently we return a 450.
>
> What to do? We use db_copymsg() to copy the temporary message to the user's mailbox. db_copymsg signals the calling function when it fails because of an 'over quota' error. However, we then return a DSN_CLASS_TEMP anyway. We should improve the code to be able to report a 552 to the MTA so it can send the appropriate bounce message.
>
> Using LMTP, this should be quite easy, although we still have a problem when an alias expands to two users, one of which is over quota. In this last case, I'd opt for sending the 552.
>
> Using dbmail-smtp: I wonder what information we can send back to the MTA and how we can do this (in what format).
>
> Ilja
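One possible policy for folding the per-delivery results into a single result for the address is "worst class wins", which matches Ilja's preference for returning the 552 when one of two expanded users is over quota. The Python sketch below only illustrates that policy; the class values follow the 2/4/5 status classes, and the names DsnClass and combine are assumptions, not DBMail's actual dsn.c API.

```python
from enum import IntEnum


class DsnClass(IntEnum):
    """Delivery status classes, ordered from best to worst."""
    OK = 2    # delivered
    TEMP = 4  # temporary failure, the MTA should retry
    FAIL = 5  # permanent failure, e.g. over quota


def combine(results):
    """Return the single class to report for one address.

    Policy (an assumption, not the only reasonable one): any permanent
    failure wins, otherwise any temporary failure, otherwise success.
    """
    results = list(results)
    if not results:
        raise ValueError("no deliveries were attempted for this address")
    return max(results)
```

A side effect of this policy is that users whose copy was delivered successfully are not reflected in the code returned for the address, so the bounce text would need to name which expanded user actually failed.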
Re: [Dbmail-dev] messageblks logic
I agree. We should fill in the wiki some more and get a plan together.

Aaron

Jesse Norell [EMAIL PROTECTED] said:

> > > It occurred to me today that if we were MIME parsing at delivery time, we could also store the MIME structure of a message in the database. An IMAP BODY.PEEK for MIME structures would be nearly instantaneous. This is probably the single most common request from an email client, especially web-based ones that need to refresh their message list on almost every page hit.
> >
> > This can be very interesting to do. Of course, we still need to store the message in its original format, but storing some information on MIME parts would be very beneficial for performance.
>
> This would/could be another application for making a more generic per-message data cache, rather than solely for message headers.
>
> --
> Jesse Norell
> [EMAIL PROTECTED] is not my email address; change administrator to my first name.