Re: [Dbmail-dev] messageblks logic

2004-05-24 Thread Aaron Stone
It simply fills up a block of size READ_BLOCK_SIZE, inserts it, then starts
filling another one. I'm not familiar with the output code (tried reading
through it, but quickly got confused by the details of the MIME parser). My
understanding, though, is that the entire message has to be reassembled in
memory at some point for the MIME parsing to work. Ilja, is this correct?
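
A minimal sketch of the splitting scheme described above, assuming the header goes into the first block and the body is then cut at fixed offsets. The function name and the 512 KiB value are illustrative assumptions, not DBMail's actual code.

```python
from typing import List

READ_BLOCK_SIZE = 512 * 1024  # assumed value; the thread mentions roughly 0.5 MB


def split_into_blocks(raw: bytes, block_size: int = READ_BLOCK_SIZE) -> List[bytes]:
    """Return the header as the first block, followed by fixed-size body blocks."""
    # The header ends at the first empty line, whichever line ending comes first.
    candidates = [(raw.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(index, length) for index, length in candidates if index != -1]
    if found:
        index, length = min(found)
        split_at = index + length
    else:
        split_at = len(raw)  # header only, no body

    header, body = raw[:split_at], raw[split_at:]
    blocks = [header]
    # Cut the body purely by size; no attention is paid to lines or MIME parts.
    blocks += [body[i:i + block_size] for i in range(0, len(body), block_size)]
    return blocks
```

Joining the returned blocks in order gives back the original message byte for byte, which is what reassembly on the output side relies on.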

If GMIME has a callback architecture, we might be able to ask it to parse
messages in blocks of READ_BLOCK_SIZE, so we'd be able to retrieve rows from
the database one at a time and pass it on to GMIME.
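
To illustrate what a push-style parser interface looks like, here is a sketch using Python's standard `email.parser.BytesFeedParser` as a stand-in. Whether GMime offers an equivalent interface is not established in this thread; the function name is hypothetical.

```python
from email.parser import BytesFeedParser


def parse_from_blocks(blocks):
    """Feed blocks to the parser one at a time, as rows would arrive from the database."""
    parser = BytesFeedParser()
    for block in blocks:
        # Blocks may end mid-line or mid-header; the parser buffers partial lines.
        parser.feed(block)
    return parser.close()
```

Note that this particular parser still builds the whole message object in memory, so it removes the need to reassemble the raw text first but does not by itself bound memory use for very large messages.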

I'm not sure what the best way to handle this is, though. My thinking has
always revolved around handling huge messages without causing resource
starvation. So a gigabyte email should be parsed in pieces and not all
allocated into memory at once. But a four megabyte email might as well go into
memory. You'd have to get a heck of a lot of people each reading a four meg
email at once for that to be a major problem. But, since we want DBMail to be
properly scalable to really large installations, it is a distinct possibility
to have that many people each reading an email that large (scenario: the CEO
sounds out his latest crazy plan in a four meg powerpoint, and everyone in the
whole company starts pounding on the mail server to retrieve their copy.)

Aaron


Paul J Stevens [EMAIL PROTECTED] said:

 Hi all,
 
 Ilja, Aaron,
 
 I'm playing around with gmime to see if I can rebuild the message 
 injection and extraction logic around glib/gmime.
 
 What I'd like to know: what would be the logic for splitting a message 
 into messageblks?
 
 I know the first blk is for the email messageheader. Easy. But after 
 that? What criteria are used? Line split? Fixed string sizes? Mimepart 
 boundaries?  I have a hard time understanding the current codebase.
 
 And are the current criteria implicitly required elsewhere in the code...
 
 I guess imap-searching requires splitting on line-boundaries at the 
 minimum, or maybe word-boundaries.
 
 But other than that?
 
 
 -- 

Paul Stevens  mailto:[EMAIL PROTECTED]
NET FACILITIES GROUP PGP: finger [EMAIL PROTECTED]
The Netherlands  http://www.nfg.nl
 ___
 Dbmail-dev mailing list
 Dbmail-dev@dbmail.org
 http://twister.fastxs.net/mailman/listinfo/dbmail-dev
 








Re: [Dbmail-dev] messageblks logic

2004-05-24 Thread Paul J Stevens



Aaron Stone wrote:

It simply fills up a block of size READ_BLOCK_SIZE, inserts it, then starts
filling another one.


Doesn't that break searching messages? Breaking up messages on a fixed 
char width is easiest of all, but then single words in messages could 
span messageblks. But then, READ_BLOCK_SIZE is 0.5 MB, and mails larger 
than that tend to be mime-encoded anyway.


<snip>


If GMIME has a callback architecture, we might be able to ask it to parse
messages in blocks of READ_BLOCK_SIZE, so we'd be able to retrieve rows from
the database one at a time and pass it on to GMIME.


Such callbacks can probably only be implemented if messageblks are 
logical mime units, such as a full message or a mime-part. Or at the 
very least such logical mime parts would have to be reassembled before 
initializing a gmime object.



I'm not sure what the best way to handle this is, though. My thinking has
always revolved around handling huge messages without causing resource
starvation. So a gigabyte email should be parsed in pieces and not all
allocated into memory at once. 


GB sized emails seem to me to be not-of-this-world at present. I'm quite 
certain most if not all ISPs have a cap on the max mail message size 
that's quite a lot smaller than that. Still, a valid guideline though.



But a four megabyte email might as well go into
memory. You'd have to get a heck of a lot of people each reading a four meg
email at once for that to be a major problem. But, since we want DBMail to be
properly scalable to really large installations, it is a distinct possibility
to have that many people each reading an email that large (scenario: the CEO
sounds out his latest crazy plan in a four meg powerpoint, and everyone in the
whole company starts pounding on the mail server to retrieve their copy.)


I guess only a real-world test can expose the actual bottlenecks 
involved. You got me thinking ...



--
  
  Paul Stevens  mailto:[EMAIL PROTECTED]
  NET FACILITIES GROUP PGP: finger [EMAIL PROTECTED]
  The Netherlands  http://www.nfg.nl


Re: [Dbmail-dev] messageblks logic

2004-05-24 Thread Ilja Booij

Ilja Booij wrote:

I'm not sure what the best way to handle this is, though. My thinking has
always revolved around handling huge messages without causing resource
starvation. So a gigabyte email should be parsed in pieces and not all
allocated into memory at once. But a four megabyte email might as well go into
memory. You'd have to get a heck of a lot of people each reading a four meg
email at once for that to be a major problem. But, since we want DBMail to be
properly scalable to really large installations, it is a distinct possibility
to have that many people each reading an email that large (scenario: the CEO
sounds out his latest crazy plan in a four meg powerpoint, and everyone in the
whole company starts pounding on the mail server to retrieve their copy.)



OTOH, a message which consists of multiple blocks will (almost) always be 
fetched from the database completely anyway. With that in mind, it 
wouldn't matter if the message data is in 2 blocks (header and body) 
instead of more than 2 blocks.


I can remember something about the maximum size for a TEXT field in 
MySQL being the original reason for the choice of splitting the message 
into parts. I'll ask Eelco and Roel, they should know this.


I've asked Eelco about this:

MySQL used to have a limit on the client-server communication that 
forced us to limit the size of blocks being transferred. Nowadays that 
limit is much higher.


Setting the max_allowed_packet variable to a high number (a few MB for 
instance) on both client and server will allow for sending big blocks 
between client and server.


The TEXT field itself has no limit.
In PostgreSQL, there's a 1GB limit on the TEXT field.

I think we can safely go to a strategy where we put the message in 2 
blocks, 1 for the header and 1 for the body. However, if we make our 
parsing code handle messages only in that way, we need to produce a 
script for migrating from a database with split messages. This shouldn't 
be too much of a problem though.
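
A rough sketch of what such a migration step could look like, using an in-process SQLite database for illustration. The table `blocks(blk_id, message_id, data)` and the assumption that block ids increase in message order are hypothetical, not the actual DBMail schema.

```python
import sqlite3


def merge_body_blocks(conn: sqlite3.Connection, message_id: int) -> None:
    """Collapse every block after the first (the header) into one body block."""
    rows = conn.execute(
        "SELECT blk_id, data FROM blocks WHERE message_id = ? ORDER BY blk_id",
        (message_id,),
    ).fetchall()
    if len(rows) <= 2:
        return  # already header + body, or header only

    body_id = rows[1][0]
    body = "".join(data for _, data in rows[1:])
    with conn:  # single transaction: the merge happens completely or not at all
        conn.execute("UPDATE blocks SET data = ? WHERE blk_id = ?", (body, body_id))
        conn.execute(
            "DELETE FROM blocks WHERE message_id = ? AND blk_id > ?",
            (message_id, body_id),
        )
```

Running this per message keeps each transaction small; a real script would also need to cope with whatever packet-size limit applies to the merged body.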


Ilja



Re: [Dbmail-dev] messageblks logic

2004-05-24 Thread Aaron Stone
I don't think we can safely cap message sizes at a few MB -- I've certainly
sent myself some hefty sized emails and would be quite frustrated if this
weren't a possibility because of an intrinsic, hard-coded limit.

Yes, it makes the code more complicated, but I think it's only because we
still haven't fully adapted our thinking to our model. It occurred to me today
that if we were MIME parsing at delivery time, we could also store the mime
structure of a message in the database. An IMAP BODY.PEEK for mime structures
would be nearly instantaneous. This is probably the single most common request
from an email client, especially web-based ones that need to refresh their
message list on almost every page hit.
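
As an illustration of the idea, the sketch below walks a parsed message once and produces a small structure summary that could be stored alongside it. It uses Python's standard `email` package; the function names and the JSON storage format are assumptions for the example only.

```python
import json
from email import message_from_bytes
from email.message import Message


def mime_structure(part: Message):
    """Return [content_type, children] for multiparts, [content_type, size] for leaves."""
    if part.is_multipart():
        children = [mime_structure(child) for child in part.get_payload()]
        return [part.get_content_type(), children]
    # decode=True undoes base64 / quoted-printable, giving the decoded size in bytes.
    payload = part.get_payload(decode=True) or b""
    return [part.get_content_type(), len(payload)]


def structure_json(raw: bytes) -> str:
    """Serialise the structure so it can be stored in a single text column."""
    return json.dumps(mime_structure(message_from_bytes(raw)))
```

With the summary computed at delivery time, a structure request could be answered from the stored value without touching the message body at all.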

If MySQL can retrieve huge (eg 1 GB) rows easily, and the limitation is only
when inserting or updating, then we might be able to use string concatenation
to append parts of the message until the body row has the whole thing.
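
A sketch of the append-by-concatenation idea, shown against SQLite (whose concatenation operator is `||`; MySQL would use `CONCAT()` instead). The `blocks(blk_id, data)` table is hypothetical, and whether the server-side limits actually allow this for very large rows is exactly the open question above.

```python
import sqlite3


def append_chunks(conn: sqlite3.Connection, blk_id: int, chunks) -> None:
    """Grow one body row by appending each chunk with a separate small UPDATE."""
    for chunk in chunks:
        with conn:  # each append is its own transaction
            conn.execute(
                "UPDATE blocks SET data = data || ? WHERE blk_id = ?",
                (chunk, blk_id),
            )
```

Each statement only carries one chunk from client to server, so the per-statement size stays bounded even though the stored row keeps growing.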

Aaron



Ilja Booij [EMAIL PROTECTED] said:

 MySQL used to have a limit on the client-server communication that 
 forced us to limit the size of blocks being transferred. Nowadays that 
 limit is much higher.
 
 Setting the max_allowed_packet variable to a high number (a few MB for 
 instance) on both client and server will allow for sending big blocks 
 between client and server.
 
 The TEXT field itself has no limit.
 In PostgreSQL, there's a 1GB limit on the TEXT field.
 
 I think we can safely go to a strategy where we put the message in 2 
 blocks, 1 for the header and 1 for the body. However, if we make our 
 parsing code handle messages only in that way, we need to produce a 
 script for migrating from a database with split messages. This shouldn't 
 be too much of a problem though.
 
 Ilja
 



Re: [Dbmail-dev] messageblks logic

2004-05-24 Thread Ilja Booij

Aaron Stone wrote:

I don't think we can safely cap message sizes at a few mb -- I've certainly
sent myself some hefty sized emails and would be quite frustrated if this
weren't a possibility because of an intrinsic, hard-coded limit.


I wasn't suggesting capping at a few MB, but rather at something like 
128MB.


Yes, it makes the code more complicated, but I think it's only because we
still haven't fully adapted our thinking to our model. It occurred to me today
that if we were MIME parsing at delivery time, we could also store the mime
structure of a message in the database. An IMAP BODY.PEEK for mime structures
would be nearly instantaneous. This is probably the single most common request
from an email client, especially web-based ones that need to refresh their
message list on almost every page hit.


This can be very interesting to do. Of course, we still need to store 
the message in its original format, but to store some information on 
mime-parts would be very beneficial for performance.


If MySQL can retrieve huge (eg 1 GB) rows easily, and the limitation is only
when inserting or updating, then we might be able to use string concatenation
to append parts of the message until the body row has the whole thing.


Setting the max_allowed_packet high enough (128MB, maybe even 256MB) 
would make this unnecessary.


Ilja




Re: [Dbmail-dev] messageblks logic

2004-05-24 Thread Leif Jackson

 Yes, it makes the code more complicated, but I think it's only because we
 still haven't fully adapted our thinking to our model. It occurred to me today
 that if we were MIME parsing at delivery time, we could also store the mime
 structure of a message in the database. An IMAP BODY.PEEK for mime structures
 would be nearly instantaneous. This is probably the single most common request
 from an email client, especially web-based ones that need to refresh their
 message list on almost every page hit.

 This can be very interesting to do. Of course, we still need to store
 the message in its original format, but to store some information on
 mime-parts would be very beneficial for performance.

This would also lend itself to the header caching, or at least let me finish up a
sort and thread implementation that would be very fast.

just my 0.02

-leif



Re: [Dbmail-dev] messageblks logic

2004-05-24 Thread Jesse Norell


 I think we can safely go to a strategy where we put the message in 2 
 blocks, 1 for the header and 1 for the body. However, if we make our 
 parsing code handle messages only in that way, we need to produce a 
 script for migrating from a database with split messages. This shouldn't 
 be too much of a problem though.

  Can a header flag be added to the message blocks table at this time?

--
Jesse Norell

[EMAIL PROTECTED] is not my email address;
change administrator to my first name.
--



Re: [Dbmail-dev] message delivery failure messages when over quotum

2004-05-24 Thread Aaron Stone
I started working on a version of sort_and_deliver that takes a dsnuser struct
as an argument, and moved one of the loops from pipe.c into sort.c. This gives
sort_and_deliver an understanding that there may be several actual deliveries
associated with a particular address. It's just a sketch right now, and
doesn't yet solve any problems (it's just structured better, imho).

The problem always boils down to this, though: how do we assemble the
different codes returned by each of the mini-deliveries into a single code
to return for the address itself? (note that this is a separate, though
related, problem to the SMTP issue of finding one code for multiple addresses
-- IMHO, we should be recommending to people to force the MTA to go one at a
time when using dbmail-smtp).
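
One possible policy, sketched here purely for illustration: rank the per-delivery results by severity and report the most severe one for the address. The `Outcome` names, their ordering, and the reply-code mapping are assumptions made for this example, not DBMail's dsn.c definitions.

```python
from enum import IntEnum


class Outcome(IntEnum):
    """Hypothetical per-delivery results, ordered from least to most severe."""
    OK = 0
    TEMP_FAIL = 1    # temporary failure, e.g. a 450 reply
    OVER_QUOTA = 2   # mailbox over quota, a 552 reply
    PERM_FAIL = 3    # any other permanent failure


# Assumed mapping from the combined outcome to the reply code for the address.
REPLY = {
    Outcome.OK: "250",
    Outcome.TEMP_FAIL: "450",
    Outcome.OVER_QUOTA: "552",
    Outcome.PERM_FAIL: "550",
}


def combine(outcomes):
    """Return the most severe result; raises ValueError if there were no deliveries."""
    return max(outcomes)
```

Under this ordering an address that expands to one successful delivery and one over-quota delivery is reported as 552. Whether a temporary failure should outrank or be outranked by a quota failure is a design decision the ordering makes explicit.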

Aaron


Ilja Booij [EMAIL PROTECTED] said:

 Hi,
 
 I've been looking at the code in lmtp.c, pipe.c, sort/sort.c and dsn.c 
 to see if we can do a bit better when a message can't be inserted 
 because a user is over quota.
 
 We *should* return a 552 message to the MTA, but currently we return a 450.
 
 What to do?
 
 We use db_copymsg() to copy the temporary message to the user's mailbox. 
 db_copymsg signals the calling function when it fails because of an 
 'over-quotum-error'. However, we then return a DSN_CLASS_TEMP anyway.
 
 We should improve the code to be able to report a 552 to the MTA so it can 
 send the appropriate bounce message.
 
 Using LMTP, this should be quite easy, although we still have a problem 
 when an alias expands to two users, one of whom is over quota. In this 
 last case, I'd opt for sending the 552.
 
 Using dbmail-smtp: I wonder what information we can send back to the MTA 
 and how we can do this (in what format).
 
 Ilja
 





Re: [Dbmail-dev] messageblks logic

2004-05-24 Thread Aaron Stone
I agree. We should fill in the wiki some more and get a plan together.

Aaron


Jesse Norell [EMAIL PROTECTED] said:

   It occurred to me today
   that if we were MIME parsing at delivery time, we could also store the mime
   structure of a message in the database. An IMAP BODY.PEEK for mime structures
   would be nearly instantaneous. This is probably the single most common request
   from an email client, especially web-based ones that need to refresh their
   message list on almost every page hit.
  
  This can be very interesting to do. Of course, we still need to store 
  the message in its original format, but to store some information on 
  mime-parts would be very beneficial for performance.
 
   This would/could be another application for making a more generic
 per-message data cache, rather than solely for message headers.
 
 
 
 --
 Jesse Norell
 
 [EMAIL PROTECTED] is not my email address;
 change administrator to my first name.
 --
 