Re: The Future of Email is SQL

2006-06-15 Thread Ramprasad
On Wed, 2006-06-14 at 11:50 -0700, Steve Thomas wrote:
> > So - like I said - this is visionary stuff. Think SQL - think outside
> > the box.
> 
> It's not all that visionary. Microsoft's been working on WinFS - a SQL
> based system for storing files - for years. It's supposed to have been
> released as a part of longhorn (vista), but they're pushing it back.

   Oracle has OCS, which consists of a
mail/calendar/LDAP/fileserver/webserver/... blah blah, all with SQL
storage. And the database is... no points for guessing that.
But OCS is a terrible resource hog (understatement). I don't think there
are many users of OCS.

IMHO SQL storage is definitely going to be there.
The common indexing mechanism is what makes such storage interesting. I
agree it is slow now, but hardware and software will get better, and then
resources will not be an issue.

Ram



Re: The Future of Email is SQL

2006-06-14 Thread Marc Perkel



Kenneth Porter wrote:
On Tuesday, June 13, 2006 8:52 PM -0700 kbaker <[EMAIL PROTECTED]> wrote:


It is visionary in that it is not the "norm", but again DBMail does all of
this very well and has been production quality for quite some time.


I asked on the Dovecot list about how Dovecot compares to DBMail and 
got this reply from Dovecot's author:




I think Timo will eventually add a MySQL backend to Dovecot.


Re: The Future of Email is SQL

2006-06-14 Thread Kenneth Porter

On Tuesday, June 13, 2006 8:52 PM -0700 kbaker <[EMAIL PROTECTED]> wrote:


It is visionary in that it is not the "norm", but again DBMail does all of
this very well and has been production quality for quite some time.


I asked on the Dovecot list about how Dovecot compares to DBMail and got 
this reply from Dovecot's author:


-- Forwarded Message --
Date: Tuesday, June 13, 2006 9:43 AM +0300
From: Timo Sirainen <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: [Dovecot] DBMail versus Dovecot (was: Using MySQL to store 
email?)


On Mon, 2006-06-12 at 18:12 -0700, Kenneth Porter wrote:

On Saturday, June 10, 2006 10:07 AM -0400 Charles Marcus
<[EMAIL PROTECTED]> wrote:

> A reference to DBMail was among the first responses, and there have been
> others.

Has anyone compiled a comparison of Dovecot to DBMail? Why would I choose
one over the other?


I think their goals are quite different. Don't know if any such
comparisons would be all that useful.

Or I guess I can give you one difference: Dovecot tries very hard to be
secure. DBMail, by contrast, seems to keep adding SQL injection security
holes. I told them about this a few years ago and they fixed them, but
when I looked at the code a few months ago they had added more of those.
-- End Forwarded Message --
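
For readers who have not run into the SQL injection problem Timo mentions, here is a minimal sketch of the difference between building a query by string concatenation and binding the value as a parameter. It is not taken from DBMail or Dovecot: the `mailboxes` table, the function names, and the use of Python's built-in sqlite3 module (in place of MySQL) are all assumptions made so the example runs on its own.

```python
import sqlite3


def make_store():
    """Create a throwaway in-memory store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mailboxes (owner TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO mailboxes VALUES (?, ?)",
        [("alice", "INBOX"), ("bob", "INBOX"), ("bob", "Private")],
    )
    return conn


def mailboxes_unsafe(conn, owner):
    # Vulnerable: the caller's string becomes part of the SQL text itself.
    sql = "SELECT name FROM mailboxes WHERE owner = '" + owner + "'"
    return [row[0] for row in conn.execute(sql)]


def mailboxes_safe(conn, owner):
    # The value is bound as a parameter and is never parsed as SQL.
    rows = conn.execute(
        "SELECT name FROM mailboxes WHERE owner = ? ORDER BY name", (owner,)
    )
    return [row[0] for row in rows]
```

Passing a crafted owner string such as `alice' OR '1'='1` to `mailboxes_unsafe` returns every user's mailboxes, while `mailboxes_safe` treats the same string as an ordinary (non-matching) owner name.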


Re: The Future of Email is SQL

2006-06-14 Thread Steve Thomas
> So - like I said - this is visionary stuff. Think SQL - think outside
> the box.

It's not all that visionary. Microsoft's been working on WinFS - a SQL
based system for storing files - for years. It was supposed to be
released as part of Longhorn (Vista), but they keep pushing it back.

I'm still confused as to why this is even being discussed on this list,
though. SA is just a system for identifying and labeling certain types of
messages. It has nothing whatsoever to do with where or how those messages
are stored.

St-




Re: The Future of Email is SQL

2006-06-14 Thread List Mail User
>...
>Well - I'm a member of the Exim cult - but if something better comes 
>along I might convert. :)
>
And you're not even British:)  Actually I count Exim in the short
list of well done and readily usable/useful MTAs (i.e. works as expected,
not "can be made to work").  Still, I'm partial to postfix and use many
sendmail setups (20+ years of experience is hard to ignore).

Paul Shupak
[EMAIL PROTECTED]


Re: The Future of Email is SQL

2006-06-13 Thread kbaker
Thank you for a very well thought out *open* message. I would guess that most of 
these reasons are why DBMail was started 5 years ago ;)


I'm going to respond with some pro-DBMail stuff... just because it's in my head 
and pretty much addresses all of Marc's comments below.


Marc Perkel wrote:
This is still visionary so take it for what it's worth. People are more 
familiar with MAILDIR and MBOX because they are files. You can read them 
with VI and PICO and FGREP and all the stuff that we are familiar with. 
MySQL is also easy but might require new tools and some learning. Once 
you become familiar with it then everything is just as easy.


One could export and import to and from maildir and mbox, so that 
doesn't go away.

DBMail has both a maildir import and export.


With MySQL there are a lot of problems that go away. MySQL is a magic 
port that does everything for you. It doesn't care about what filesystem 
you're using, what OS you are running, what kinds of file locks or NFS 
mounts, or if you're using Reiser for maildir speed or if you have 
enough inodes. All that stuff goes away.
One of the great aspects of DBMail... SQL clustering and replication independent 
of the OS. Cyrus and other great IMAP servers have only recently gotten this 
working in alpha versions. Otherwise it would require very expensive storage 
solutions to get any kind of failover or "realtime" replication. With 
DBMail/MySQL, just set up another cheap server and configure replication, and you're done.



MBOX and MAILDIR have no indexing. You can add indexes externally but 
there are no standards for that. With MySQL you can index anything and 
everything. You can add fields to the message, any fields, as many as you 
want, and they too can be keys and indexes. With maildir and mbox you 
can't really do that.
Many of the filesystem storage solutions do have indexing, but in file hashes. 
Zimbra has gone so far as to have a filesystem store with MySQL indexes for 
speed. As you point out these are not "standard" and don't scale unless they are 
on an expensive storage solution.



With MySQL you can access the data with any MySQL application. And the 
access is consistent no matter what programming language you use, what 
OS you use, anything. It's all SQL. So if you want a web interface you 
just write a PHP app.
There are a number of PHP and other scripts that access SQL directly for 
everything from webmail to administration... works great and very easy to work with.



SpamAssassin for example has migrated from DB files to MySQL for the AWL 
and bayes and we all can see how this has improved performance and ease 
of implementation. Before SQL, having 5 servers share the same bayes was 
difficult. With SQL it's trivial. The SQL engine does it all for you. It does 
the magic so you don't have to.


The indexing is a real key feature. If I have a key based on the sending 
host or index all the received lines, I could delete all messages that 
had an IP in any received line almost instantly. I can do it thousands 
of times faster than mbox or maildir because it's indexed. Indexing 
gives you incredible power and the SQL engine does all that for you. 
Then SA and the IMAP server and the MTA and the web GUI - everything - are all 
talking to a standard database - all integrated - all compatible.


So - like I said - this is visionary stuff. Think SQL - think outside 
the box.


It is visionary in that it is not the "norm", but again DBMail does all of
this very well and has been production quality for quite some time.

This is a great thread, but as far as starting a new project I'd just go with 
what is already working. DBMail is built in C, so very speedy.


Someone mentioned rewriting an IMAP server in perl... not sure if I'd go that 
way from a speed standpoint, but would certainly be interesting.


If you are looking for another approach, Zimbra is written entirely in Java and 
is open source. It uses MySQL only for indexes, but it would be very straightforward 
to replace its existing MailStore class with one that writes to MySQL rather than the 
filesystem.




--
Kevin Baker




Re: The Future of Email is SQL

2006-06-13 Thread Marc Perkel



John Rudd wrote:


On Jun 13, 2006, at 7:52 PM, Marc Perkel wrote:



John Rudd wrote:


and maybe a decent perl MTA to put in front of it too (something 
that will work with sendmail milters...).




I think that a local delivery program could be written fairly easily 
that Exim or any other existing MTA could pipe messages into for 
delivery. So one wouldn't have to rewrite the MTA but just use 
existing MTAs and just change the delivery mechanism.



It's not a matter of have to.  It's a matter of want to.



Well - I'm a member of the Exim cult - but if something better comes 
along I might convert. :)




Re: The Future of Email is SQL

2006-06-13 Thread John Rudd


On Jun 13, 2006, at 7:52 PM, Marc Perkel wrote:



John Rudd wrote:


and maybe a decent perl MTA to put in front of it too (something that 
will work with sendmail milters...).




I think that a local delivery program could be written fairly easily 
that Exim or any other existing MTA could pipe messages into for 
delivery. So one wouldn't have to rewrite the MTA but just use 
existing MTAs and just change the delivery mechanism.



It's not a matter of have to.  It's a matter of want to.




Re: The Future of Email is SQL

2006-06-13 Thread Marc Perkel
This is still visionary so take it for what it's worth. People are more 
familiar with MAILDIR and MBOX because they are files. You can read them 
with VI and PICO and FGREP and all the stuff that we are familiar with. 
MySQL is also easy but might require new tools and some learning. Once 
you become familiar with it then everything is just as easy.


One could export and import to and from maildir and mbox, so that 
doesn't go away.


With MySQL there are a lot of problems that go away. MySQL is a magic 
port that does everything for you. It doesn't care about what filesystem 
you're using, what OS you are running, what kinds of file locks or NFS 
mounts, or if you're using Reiser for maildir speed or if you have 
enough inodes. All that stuff goes away.


MBOX and MAILDIR have no indexing. You can add indexes externally but 
there are no standards for that. With MySQL you can index anything and 
everything. You can add fields to the message, any fields, as many as you 
want, and they too can be keys and indexes. With maildir and mbox you 
can't really do that.


With MySQL you can access the data with any MySQL application. And the 
access is consistent no matter what programming language you use, what 
OS you use, anything. It's all SQL. So if you want a web interface you 
just write a PHP app.


SpamAssassin for example has migrated from DB files to MySQL for the AWL 
and bayes and we all can see how this has improved performance and ease 
of implementation. Before SQL, having 5 servers share the same bayes was 
difficult. With SQL it's trivial. The SQL engine does it all for you. It does 
the magic so you don't have to.


The indexing is a real key feature. If I have a key based on the sending 
host or index all the received lines, I could delete all messages that 
had an IP in any received line almost instantly. I can do it thousands 
of times faster than mbox or maildir because it's indexed. Indexing 
gives you incredible power and the SQL engine does all that for you. 
Then SA and the IMAP server and the MTA and the web GUI - everything - are all 
talking to a standard database - all integrated - all compatible.
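
To make the indexing argument concrete, here is a rough sketch of a store where every IP found in a Received line gets its own indexed row, so that "delete everything that passed through this IP" becomes an index lookup rather than a scan of every message. Nothing here comes from an existing mail server: the `messages` and `received_hops` tables and the helper names are hypothetical, and Python's built-in sqlite3 stands in for MySQL so the example is self-contained.

```python
import sqlite3


def build_store():
    """Create an in-memory store with an index on relay IPs (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT);
        CREATE TABLE received_hops (message_id INTEGER, ip TEXT);
        CREATE INDEX idx_hops_ip ON received_hops (ip);
        """
    )
    return conn


def add_message(conn, subject, hop_ips):
    """Store a message plus one row per IP seen in its Received lines."""
    cur = conn.execute("INSERT INTO messages (subject) VALUES (?)", (subject,))
    message_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO received_hops VALUES (?, ?)",
        [(message_id, ip) for ip in hop_ips],
    )
    return message_id


def delete_by_hop_ip(conn, ip):
    """Delete every message that passed through `ip`; return how many."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT message_id FROM received_hops WHERE ip = ?", (ip,)
        )
    ]
    for message_id in ids:
        conn.execute(
            "DELETE FROM received_hops WHERE message_id = ?", (message_id,)
        )
        conn.execute("DELETE FROM messages WHERE id = ?", (message_id,))
    conn.commit()
    return len(ids)
```

Whether this is "thousands of times faster" than walking a maildir depends on mailbox size and hardware; the sketch only shows that the lookup is driven by the index on `ip`.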


So - like I said - this is visionary stuff. Think SQL - think outside 
the box.




Re: The Future of Email is SQL

2006-06-13 Thread Marc Perkel



John Rudd wrote:


I had been thinking about how feasible it would be to re-implement 
dbmail in perl..


and maybe a decent perl MTA to put in front of it too (something that 
will work with sendmail milters...).


Then you could be pretty database agnostic.  Just whatever perl wants 
to put back there.





I think that a local delivery program could be written fairly easily 
that Exim or any other existing MTA could pipe messages into for 
delivery. So one wouldn't have to rewrite the MTA but just use existing 
MTAs and just change the delivery mechanism. Eventually I think that 
MTAs would integrate MySQL delivery. My guess is that it's easier to 
deliver to MySQL than MBOX or MAILDIR because MySQL does all the work 
for you. You just pass the data and let MySQL do that magic.
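
As a rough illustration of the "local delivery program" idea, the sketch below parses a raw message and inserts it into a table. It is only a sketch under stated assumptions: the `messages` schema, the function names, and the use of Python's built-in sqlite3 instead of MySQL are all hypothetical, and a real delivery agent would also need locking policy, quota handling, and error reporting back to the MTA.

```python
import sqlite3
from email import message_from_bytes
from email.policy import default


def open_store(path=":memory:"):
    """Open (or create) the message store; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               recipient TEXT,
               sender TEXT,
               subject TEXT,
               raw BLOB)"""
    )
    return conn


def deliver(conn, recipient, raw_bytes):
    """Parse one raw RFC 822 message and store it; return the new row id."""
    msg = message_from_bytes(raw_bytes, policy=default)
    cur = conn.execute(
        "INSERT INTO messages (recipient, sender, subject, raw) "
        "VALUES (?, ?, ?, ?)",
        (
            recipient,
            str(msg.get("From", "")),
            str(msg.get("Subject", "")),
            raw_bytes,  # keep the original bytes so nothing is lost
        ),
    )
    conn.commit()
    return cur.lastrowid
```

An MTA pipe transport would call something like `deliver(conn, recipient, sys.stdin.buffer.read())`, with the recipient passed on the command line; that wiring is left out here because it depends on the MTA.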


I'm also thinking that if SQL is used for mail storage that the SQL 
folks will evolve their databases to handle the needs of the email 
community. So those who point to Exchange as a disaster, I look at it as 
a first step. Something to take the good ideas and improve on them.




Re: The Future of Email is SQL

2006-06-13 Thread Marc Perkel



John Rudd wrote:


On Jun 9, 2006, at 1:19 PM, Marc Perkel wrote:


After considerable experimenting and thinking things through I thought
I'd start a thread on the future of email to start planting the seeds of
where MTA development needs to go. I'm convinced that someday soon we
will all realize that MBOX and MAILDIR are obsolete technologies and
that the future is going to be SQL based storage.



Have you looked at dbmail?

IMAP on top of MySQL.


I'm aware of it. What I'm hoping for is a MySQL backend for Dovecot. 
Timo has done some work in that direction. I'm hoping that after the 1.0 
release that he works on it again. I think I can create some sort of 
Exim frontend if Timo adds it to dovecot.




Re: The Future of Email is SQL

2006-06-13 Thread John Rudd


On Jun 9, 2006, at 3:16 PM, Rob McEwen wrote:




MS Exchange... one big Database


Exactly...

And that is one reason why I wouldn't touch this SQL idea with a 10 foot
pole.. the fact that Exchange works this way only proves my point... I hear
all the time about Exchange servers crashing and the administrator having to
rebuild the database while the mail server is down for the next 10 hours.


The bottom line is that using a SQL DB backend as mail storage is putting
all your eggs in one basket.



Not really.  With MySQL you can mirror and stripe data across multiple 
back end servers.  You lose one of the back end storage nodes?  No big 
deal.  Just get a new one up and running before you lose all of its 
mirrors.



Though, I would agree that using MS Exchange would be a bad idea.



Re: The Future of Email is SQL

2006-06-13 Thread John Rudd


On Jun 9, 2006, at 1:19 PM, Marc Perkel wrote:


After considerable experimenting and thinking things through I thought
I'd start a thread on the future of email to start planting the seeds of
where MTA development needs to go. I'm convinced that someday soon we
will all realize that MBOX and MAILDIR are obsolete technologies and
that the future is going to be SQL based storage.



Have you looked at dbmail?

IMAP on top of MySQL.



Re: The Future of Email is SQL

2006-06-13 Thread David Landgren

Jim C. Nasby wrote:

On Sat, Jun 10, 2006 at 01:23:35PM -0600,  wrote:
I would defer to the smart people to figure out the details. However I do 
wonder if the actual body content of the message would be best stored in a 
file and the SQL used to store anything and everything you would want to 
index. That would keep the SQL file size down if that's an issue. However, 
SQL databases might have to be changed to accommodate the needs of storing 
email.
I think this is what I was getting at early in the thread.  I would think 
that a 5 MB body would do better on file but I don't know enough in regards 
to DBs to even make a call.


A good rule of thumb about storing something in the database is: are you
going to search that data? If you're going to search the text of an
email body, that makes it a more likely candidate for storing it in the
database (though there are ways to do this searching while storing the
file externally).


SQL databases suck dead dogs through a straw on full text searches. The 
language specification isn't designed for it. Database vendors offer 
support for it in various mutually-incompatible ways.


It's easy to precalc search indices for maildir and produce fast search 
results; mairix is one such tool that I know of, there are no doubt many 
other solutions available.
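
To show what "precalculating a search index" for a maildir can look like, here is a toy inverted index: it maps each word to the set of message files containing it, so a search is a set intersection instead of a grep over every file. This is not how mairix works internally; the function names are made up for the example, and only the subject and the plain-text body are indexed.

```python
import os
import re
from collections import defaultdict
from email import message_from_binary_file
from email.policy import default

WORD = re.compile(r"[a-z0-9]+")


def build_index(maildir):
    """Map each lowercase word to the set of message paths containing it."""
    index = defaultdict(set)
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                msg = message_from_binary_file(handle, policy=default)
            body = msg.get_body(preferencelist=("plain",))
            text = str(msg.get("Subject", "")) + " "
            text += body.get_content() if body is not None else ""
            for word in set(WORD.findall(text.lower())):
                index[word].add(path)
    return index


def search(index, *words):
    """Return the paths of messages containing all of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()
```

A real tool would persist the index to disk and update it incrementally as mail arrives, rather than rebuilding it on every run.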


David

--
Much of the propaganda that passes for news in our own society is given 
to immobilising and pacifying people and diverting them from the idea 
that they can confront power. -- John Pilger




Re: The Future of Email is SQL

2006-06-12 Thread kbaker

Mike Jackson wrote:
I can't recall seeing any mention in this thread of DBmail (dbmail.org), 
which already exists and is an all-in-one SMTP/POP3/IMAP server with 
MySQL or Postgres message storage (with support for SQLite on the way). 
It's been in development for three or four years, and from what I 
remember is used by the developers on a mail system with 100K+ users. I 
like it as a concept, but haven't been brave enough to put it into 
production.



Yes I mentioned this the other day. Not too much response though.

DBMail is 5-6 years old. Especially active for the past 4 years. There are a 
number of production systems out there; some with thousands of users.


It can be set up all-in-one or integrated with Postfix or another MTA with no problems. 
Features such as LMTP, maildrop, and now Sieve support for message filtering 
put it on par with Cyrus, at least from a feature standpoint.


I haven't seen benchmarking against the more well known servers, but with an 
optimized DB install it performs very well. In addition, the RDBMS back-end 
opens up many great possibilities: clustering, replication, and direct SQL 
access to the data store, which gives a lot of developers a lower 
learning curve for extending the software.


It really is worth taking a look. We are using it on an install with around 300 
users... with no problems. We are using a dbmail-postfix-mysql-amavis setup, 
with mysql replication to a hot backup server for fail-over. Obviously a small 
install, but it has been going very well.


- Kevin




Re: The Future of Email is SQL

2006-06-12 Thread Mike Jackson
I can't recall seeing any mention in this thread of DBmail (dbmail.org), 
which already exists and is an all-in-one SMTP/POP3/IMAP server with MySQL 
or Postgres message storage (with support for SQLite on the way). It's been 
in development for three or four years, and from what I remember is used by 
the developers on a mail system with 100K+ users. I like it as a concept, 
but haven't been brave enough to put it into production. 



RE: The Future of Email is SQL

2006-06-12 Thread Martin Hepworth
Well yes, Exchange does have its problems (it's much better than it used to
be), but ya gotta remember the underlying "DB" is Access.

I think there are moves afoot for the next version of MS-Ex to be able to
run with SQL Server as the backend datastore (2003 may already have this
ability), which is v. useful for large (1000+) user bases.

Kinda proves the point really, you need a proper DB for this sort of thing
not some tin pot 'user' thing.



--
Martin Hepworth 
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300

> -Original Message-
> From: Rob McEwen [mailto:[EMAIL PROTECTED]
> Sent: 09 June 2006 23:16
> To: users@spamassassin.apache.org
> Subject: RE: The Future of Email is SQL
> 
> 
> >>MS Exchange... one big Database
> 
> Exactly...
> 
> And that is one reason why I wouldn't touch this SQL idea with a 10 foot
> pole.. the fact that Exchange works this way only proves my point... I hear
> all the time about Exchange servers crashing and the administrator having to
> rebuild the database while the mail server is down for the next 10 hours.
> 
> The bottom line is that using a SQL DB backend as mail storage is putting
> all your eggs in one basket.
> 
> I have a much simpler solution to the problem that this idea was
> originally attempting to solve... simply place the spams that are caught
> in a folder on the mail server that is accessible via webmail. Then create a
> separate program to periodically enumerate through the spam folder in all
> the accounts on the server to delete spams over X days old.
> 
> If needed, you could still have a database with the basic info about the
> spams (date received, subject line, recipients, from, message file name,
> etc) to use for e-mailing "digests" to the user... and this DB's stability
> wouldn't then have to be tied to the overall reliability/stability of mail
> services.
> 
> Also keep in mind that SQL doesn't always mean better performance... I've
> seen many web sites that deliver content dynamically from a SQL database
> backend where there were noticeably large delays between page loads, for
> example.
> 
> Rob McEwen
> PowerView Systems
> [EMAIL PROTECTED]
> 






Re: The Future of Email is SQL

2006-06-11 Thread Jason Haar
Jim C. Nasby wrote:
> Having said all that; it's nearly impossible to get a general-purpose RDBMS 
> to outperform an optimized storage format 
Indeed. I refer back to the wondrous success Microsoft Exchange has had.

It *isn't* SQL. It's a hand-crafted, JET-backend specifically written to
be optimized for email handling. Even Microsoft (so far) hasn't managed
to get their own MS-SQL product to be usable as a backend. Believe me,
they'd like nothing better from a marketing perspective...

...However, I do believe they are still working towards such an end.

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



Re: The Future of Email is SQL

2006-06-10 Thread kbaker

Jim C. Nasby wrote:

On Sat, Jun 10, 2006 at 01:23:35PM -0600,  wrote:
I would defer to the smart people to figure out the details. However I do 
wonder if the actual body content of the message would be best stored in a 
file and the SQL used to store anything and everything you would want to 
index. That would keep the SQL file size down if that's an issue. However, 
SQL databases might have to be changed to accommodate the needs of storing 
email.

So I missed the beginning of the thread, but thought I'd point out 
www.dbmail.org

This is an open source IMAP server with an RDBMS message store. So yes it has been 
done, yes it works and yes there are a number of production installs that have 
been running great for a number of years now.


It might not be as fast as Cyrus for instance, but we've been running it with 
MySQL replication to a second server for failover for a bit now. Works great 
and was *really* easy to set up.



I think this is what I was getting at early in the thread.  I would think 
that a 5 MB body would do better on file but I don't know enough in regards 
to DBs to even make a call.


A good rule of thumb about storing something in the database is: are you
going to search that data? If you're going to search the text of an
email body, that makes it a more likely candidate for storing it in the
database (though there are ways to do this searching while storing the
file externally).

Another consideration is that storing everything in the database is
substantially easier than splitting between a database and the
filesystem. If you think this is a non-issue, consider how to deal with
all the error conditions where either the database or the filesystem is
updated, but not both.

Of course, storing anything in a database is going to have more
overhead than storing it as raw bytes on the filesystem, and there's
not really a way around that. Different databases will impose different
amounts of overhead.

As for all the arguments about how databases won't scale, or how they're
a single point of failure.. what exactly do you think a single mail
server is? Answer: not scalable and a single point of failure. Of
course there are ways to work around that, and those methods apply just
as well to databases (though the implementation can be different). Most
databases support at least some form of replication, and many support
clustering. And of course you don't have to try and cram all your users
into a single database.

Having said all that; it's nearly impossible to get a general-purpose RDBMS 
to outperform an optimized storage format (if you find an example where
it is possible, I'd wager that's only true because the original format
wasn't very well thought-out). It's essentially a given that a given set
of hardware will be able to handle a higher load of storing and
retrieving emails using maildir rather than a database (unless you get
enough messages in a directory that it starts choking the filesystem).
But if you want to do something like search for specific emails, there's
a much better chance that a database will outperform maildir, especially
if you're searching the message body. And there's other potential
applications where a database would outperform maildir as well.

So, in a nutshell, if you're not going to try doing something more
advanced than just storing and retrieving email, it's unlikely that
you'll be happy with storing that email in a database. The further off
that 'beaten path' you get, the more likely you are to see benefit from
using a database.





Re: The Future of Email is SQL

2006-06-10 Thread Jim C. Nasby
On Sat, Jun 10, 2006 at 01:23:35PM -0600,  wrote:
> >I would defer to the smart people to figure out the details. However I do 
> >wonder if the actual body content of the message would be best stored in a 
> >file and the SQL used to store anything and everything you would want to 
> >index. That would keep the SQL file size down if that's an issue. However, 
> >SQL databases might have to be changed to accommodate the needs of storing 
> >email.
> 
> I think this is what I was getting at early in the thread.  I would think 
> that a 5 MB body would do better on file but I don't know enough in regards 
> to DBs to even make a call.

A good rule of thumb about storing something in the database is: are you
going to search that data? If you're going to search the text of an
email body, that makes it a more likely candidate for storing it in the
database (though there are ways to do this searching while storing the
file externally).

Another consideration is that storing everything in the database is
substantially easier than splitting between a database and the
filesystem. If you think this is a non-issue, consider how to deal with
all the error conditions where either the database or the filesystem is
updated, but not both.
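
To illustrate the "updated, but not both" problem, here is one possible ordering for a split store, sketched under assumptions: write the body to a file first, then insert the metadata row, and remove the file again if the insert fails. The schema, function names, and file naming are hypothetical, and Python's built-in sqlite3 stands in for whatever database is actually used.

```python
import os
import sqlite3
import tempfile
import uuid


def open_index(path=":memory:"):
    """Open the metadata database (assumed schema: subject plus body path)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, "
        "subject TEXT NOT NULL, "
        "body_path TEXT NOT NULL UNIQUE)"
    )
    return conn


def store_message(conn, spool_dir, subject, body):
    """Write the body to a file, then record it; undo the file on DB failure."""
    final_path = os.path.join(spool_dir, uuid.uuid4().hex + ".eml")
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(body)
            handle.flush()
            os.fsync(handle.fileno())
        # Rename only after the data is on disk, so readers never see a
        # half-written file under the final name.
        os.rename(tmp_path, final_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO messages (subject, body_path) VALUES (?, ?)",
                (subject, final_path),
            )
    except sqlite3.Error:
        os.remove(final_path)  # keep the filesystem and database in step
        raise
    return final_path
```

Even this leaves a window: if the process dies between the rename and the commit, an orphaned file remains and some periodic sweep has to find it. That is the kind of extra machinery the paragraph above is warning about.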

Of course, storing anything in a database is going to have more
overhead than storing it as raw bytes on the filesystem, and there's
not really a way around that. Different databases will impose different
amounts of overhead.

As for all the arguments about how databases won't scale, or how they're
a single point of failure.. what exactly do you think a single mail
server is? Answer: not scalable and a single point of failure. Of
course there are ways to work around that, and those methods apply just
as well to databases (though the implementation can be different). Most
databases support at least some form of replication, and many support
clustering. And of course you don't have to try and cram all your users
into a single database.

Having said all that; it's nearly impossible to get a general-purpose RDBMS 
to outperform an optimized storage format (if you find an example where
it is possible, I'd wager that's only true because the original format
wasn't very well thought-out). It's essentially a given that a given set
of hardware will be able to handle a higher load of storing and
retrieving emails using maildir rather than a database (unless you get
enough messages in a directory that it starts choking the filesystem).
But if you want to do something like search for specific emails, there's
a much better chance that a database will outperform maildir, especially
if you're searching the message body. And there's other potential
applications where a database would outperform maildir as well.

So, in a nutshell, if you're not going to try doing something more
advanced than just storing and retrieving email, it's unlikely that
you'll be happy with storing that email in a database. The further off
that 'beaten path' you get, the more likely you are to see benefit from
using a database.
-- 
Jim C. Nasby, Database Architect[EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: The Future of Email is SQL

2006-06-10 Thread jdow

From: "NM Public" <[EMAIL PROTECTED]>

Sur 2006-06-09, Marc Perkel skribis:


Perhaps the headers and other information that you would index 
be kept in the database and the body of the message stored 
somewhere else, perhaps even as files.



It seems that this is what Zimbra does. Check out my blog post 
here:


 For IMAP, "SQL just sucks"
 

especially the comment from KevinH of Zimbra, which includes 
this:


"It's true that as a mailstore SQL sucks. Databases are not 
designed to store large blobs of data. However in Zimbra's case 
messages aren't stored in the database. Only *meta* data is 
stored behind a SQL interface. [...]"


I hope this is useful. Thanks for inspiring interesting 
discussion Marc!

 Nancy
  (sent via gmane.mail.spam.spamassassin.general)


If you only own a hammer and (think) you know how to use a hammer
all jobs seem to turn into jobs that need a hammer to solve them.
SQL is just one more tool. Don't be pathetic and try to use a hammer
as a means to remove a tiny delicate screw. You might as well try
saving email as XML objects or compiled into Objective C source
code modules. This discussion has gone on long enough that it's rather
plain SQL is not a natural fit for email storage, isn't it?

{^_^}


Re: The Future of Email is SQL

2006-06-10 Thread Jay Plesset




"fast enough" is a value judgement.

Fast enough may be ok, if you have a few hundred or even a few thousand
users, saving small mailboxes.

In a large scale system, where you have a million users, each of which
has thousands of messages, I doubt any current database, SQL or other
will have that kind of performance.

I regularly use a mail server capable of handling that kind of load. 
It's free, and will eventually be open sourced.  Sun Java System
Messaging Server.  Runs on Solaris, Solaris x86, Linux.

Uses individual files for each message.

jay plesset
sr. tech support engineer.  

Sun Microsystems.

Marc Perkel wrote:

  
  
  After considerable experimenting and thinking things through I thought 
I'd start a thread on the future of email to start planting the seeds of 
where MTA development needs to go. I'm convinced that someday soon we 
will all realize that MBOX and MAILDIR are obsolete technologies and 
that the future is going to be SQL based storage.

First - before everyone starts screaming about speed comparisons, I'm 
not going to go there. Every storage technology has its advantages and 
disadvantages but I'm just going to say that SQL based mail storage is 
fast enough. The advantages of SQL have to do with power and not with 
speed. Those who would choose it would do so because they want to do new 
things that you can do with a database and can't do without one.

SQL has several advantages. You don't have to deal with the quirks of the 
underlying file system or OS. It takes care of all the locking issues 
and indexing and makes it so that multiple applications can seamlessly 
access the data. With an SQL backend email can be stored by the MTA, 
read by an IMAP client that accesses the same database, and the spam 
filtering engine will have access to the stored email as well.

To give you some examples of what could be done:

Suppose a spammer sends 1000 phishing spams to your users and then you 
figure out that the 1000 messages already delivered are spam. With a 
database you can do a query to retroactively delete spam that was 
already delivered to the mailboxes. This could also be used to 
retroactively delete viruses already delivered.
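
As a rough illustration of the retroactive-delete idea, here is a minimal
sketch using Python's built-in sqlite3 module. The messages table, its
columns, and the deliver/purge_by_sender helpers are hypothetical names
made up for this example; they are not taken from any existing mail store.

```python
import sqlite3


def make_store():
    """Create an in-memory mail store with a hypothetical, simplified schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY,"
        " mailbox TEXT NOT NULL,"
        " sender TEXT NOT NULL,"
        " subject TEXT NOT NULL)"
    )
    return db


def deliver(db, mailbox, sender, subject):
    """Record one delivered message in a user's mailbox."""
    db.execute(
        "INSERT INTO messages (mailbox, sender, subject) VALUES (?, ?, ?)",
        (mailbox, sender, subject),
    )


def purge_by_sender(db, sender):
    """Delete every already-delivered message from one sender; return the count."""
    cur = db.execute("DELETE FROM messages WHERE sender = ?", (sender,))
    db.commit()
    return cur.rowcount


db = make_store()
for user in ("alice", "bob", "carol"):
    deliver(db, user, "phisher@example.com", "Verify your account")
deliver(db, "alice", "friend@example.org", "Lunch?")

removed = purge_by_sender(db, "phisher@example.com")
remaining = db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(removed, remaining)
```

A real system would presumably match on something more reliable than the
sender address (a checksum or a filter verdict, say), but the shape of the
query would be similar.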

Spam filtering programs can look up existing email in existing folders 
and compare it with new email already delivered to help determine more 
accurately if a message is spam or not. For example, if the host server 
has a reputation for 100% ham then it can deliver new email without 
running it through SpamAssassin. If programs like SpamAssassin can 
access existing email in existing folders they can evaluate new email 
using tricks no one has yet considered.

SQL databases allow for multiple masters and slaves and replication that 
lets you create a cluster that never fails under any conditions. It 
would be far easier to create a system that is always on and always 
backed up.

An SQL backend allows you to use a wide variety of tools, programming 
languages, and operating systems, so you can integrate more easily than 
with non-database systems.

And - this is important - once you have a database then new things that 
no one has yet thought of will be possible and new things we've never 
heard of will be developed, because the new power will lend itself to 
the development of more tricks than you can do without database power.

My point here is - think outside the box. I'm going to be lobbying IMAP 
server developers to include SQL backends. exim could pipe data into a 
local delivery agent, or it can have features written to write directly 
to the SQL backend.

Thoughts?




  





Re: The Future of Email is SQL

2006-06-10 Thread qqqq
> I would defer to the smart people to figure out the details. However I do
> wonder if the actual body content of the message would be best stored in a
> file and the SQL used to store anything and everything you would want to
> index. That would keep the SQL file size down if that's an issue. However,
> SQL databases might have to be changed to accommodate the needs of storing
> email.

I think this is what I was getting at early in the thread.  I would think
that a 5 MB body would do better as a file, but I don't know enough about
DBs to even make a call.







Re: The Future of Email is SQL

2006-06-10 Thread Marc Perkel



NM Public wrote:

> On 2006-06-09, Marc Perkel wrote:
>
>> Perhaps the headers and other information that you would index be
>> kept in the database and the body of the message stored somewhere
>> else, perhaps even as files.
>
> It seems that this is what Zimbra does. Check out my blog post here:
>
>  For IMAP, "SQL just sucks"
>
> especially the comment from KevinH of Zimbra, which includes this:
>
> "It's true that as a mailstore SQL sucks. Databases are not designed
> to store large blobs of data. However in Zimbra's case messages aren't
> stored in the database. Only *meta* data is stored behind a SQL
> interface. [...]"
>
> I hope this is useful. Thanks for inspiring interesting discussion Marc!
>  Nancy
>   (sent via gmane.mail.spam.spamassassin.general)



I would defer to the smart people to figure out the details. However I 
do wonder if the actual body content of the message would be best stored 
in a file and the SQL used to store anything and everything you would 
want to index. That would keep the SQL file size down if that's an 
issue. However, SQL databases might have to be changed to accommodate the 
needs of storing email.
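
To make the "index in SQL, body in a file" split concrete, here is a small
sketch, again with Python's sqlite3 module. It is not how Zimbra or any
other product actually does it; the msg_index table, the .eml naming, and
the store_message/load_body helpers are all assumptions for illustration.

```python
import os
import sqlite3
import tempfile


def store_message(db, spool_dir, sender, subject, body):
    """Write the body to its own file and index only the metadata in SQL."""
    cur = db.execute(
        "INSERT INTO msg_index (sender, subject, body_path) VALUES (?, ?, '')",
        (sender, subject),
    )
    msg_id = cur.lastrowid
    path = os.path.join(spool_dir, f"{msg_id}.eml")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(body)
    # Record where the body lives so it can be found from the index row.
    db.execute("UPDATE msg_index SET body_path = ? WHERE id = ?", (path, msg_id))
    db.commit()
    return msg_id


def load_body(db, msg_id):
    """Look up the file path in the index, then read the body from disk."""
    (path,) = db.execute(
        "SELECT body_path FROM msg_index WHERE id = ?", (msg_id,)
    ).fetchone()
    with open(path, encoding="utf-8") as fh:
        return fh.read()


spool = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE msg_index ("
    " id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body_path TEXT)"
)
first_id = store_message(
    db, spool, "nancy@example.org", "Zimbra", "Only metadata lives in SQL.\n"
)
print(load_body(db, first_id))
```

The database stays small because it only holds what you would search or
sort on; the trade-off is that the index and the files can drift apart if
one is changed without the other.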





Re: The Future of Email is SQL

2006-06-10 Thread NM Public

On 2006-06-09, Marc Perkel wrote:

> Perhaps the headers and other information that you would index
> be kept in the database and the body of the message stored
> somewhere else, perhaps even as files.



It seems that this is what Zimbra does. Check out my blog post 
here:


 For IMAP, "SQL just sucks"
 

especially the comment from KevinH of Zimbra, which includes 
this:


"It's true that as a mailstore SQL sucks. Databases are not 
designed to store large blobs of data. However in Zimbra's case 
messages aren't stored in the database. Only *meta* data is 
stored behind a SQL interface. [...]"


I hope this is useful. Thanks for inspiring interesting 
discussion Marc!

 Nancy
  (sent via gmane.mail.spam.spamassassin.general)

--
  Nancy McGough
  Infinite Ink: 
  Bookmarks & Blog:  



Re: The Future of Email is SQL

2006-06-10 Thread Marc Perkel



Gary W. Smith wrote:

> It's getting there, albeit slowly.  I think that if you ruled out every
> up-and-coming application because it's just not there yet, we wouldn't
> have an open source community...
>
> We have a variety of reasons for using MySQL; most of them aren't good
> ones, but it's something we've been able to work with for some time.

  


I'd say start with MySQL because it's so common but it should be able to 
talk to any popular DB.


Re: The Future of Email is SQL

2006-06-10 Thread Marc Perkel



Steve Thomas wrote:

> While this is quite an interesting topic, I have to ask why it's on the
> spamassassin list. Message stores aren't spamassassin specific and this is
> already a pretty high-volume list. Does this discussion really belong
> here?
>
> St-


  


The reason I posted it here as well as in Dovecot and Exim lists is 
because I think the real benefit comes when SA is also integrated into 
the system. With a DB in place SA could do things like delete email 
that's already been delivered. That way something that was thought to 
not be spam but later determined to be spam can be deleted 
retroactively. And - I think with the ability to tie into the DB that 
new spam detection tricks can be used to identify spam based on what the 
user already has.


Picture the MTA, SA, and the IMAP server all integrated into a single 
database. The purpose of starting this thread is to inspire thought. 
Then maybe down the line this will happen.


Re: The Future of Email is SQL

2006-06-10 Thread Marc Perkel






Jim C. Nasby wrote:

> On Fri, Jun 09, 2006 at 06:16:15PM -0400, Rob McEwen wrote:
> >
> > >>MS Exchange... one big Database
> >
> > Exactly...
> >
> > And that is one reason why I wouldn't touch this SQL idea with a 10 foot
> > pole.. the fact that Exchange works this way only proves my point... I hear
> > all the time about Exchange servers crashing and the administrator having to
> > rebuild the database while the mail server is down for the next 10 hours.
>
> Just because MS couldn't figure out how to do this correctly doesn't
> mean it can't be done.
  


Thanks Jim. The idea here is to be forward looking and only do it if
the databases are up to it. My thinking is that DBs will continue to
improve to the point where using them makes more and more sense.
Obviously if the DB isn't up to the task then there's no reason to do
it.

Here's something called DBMail that looks like it's on the right track.
http://www.dbmail.org/dokuwiki/doku.php?id=bigpicture





RE: The Future of Email is SQL

2006-06-09 Thread Gary W. Smith
Don't know...  Been using Oracle and MSSQL for years.  Both of those
work fine.  Don't understand the argument.  Why use Postgres when I can
just piggyback on my replicated Oracle environment?

If we are talking about making a SQL application that is usable by a
multitude of people, then why lock them into something?  That's the
easiest way to drive them away from supporting it.

> -Original Message-
> From: Jim C. Nasby [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 09, 2006 9:21 PM
> To: Gary W. Smith
> Cc: Marc Perkel; users@spamassassin.apache.org
> Subject: Re: The Future of Email is SQL
> 
> On Fri, Jun 09, 2006 at 09:16:10PM -0700, Gary W. Smith wrote:
> > It's getting there, albeit slowly.  I think that if you ruled out every
> > up-and-coming application because it's just not there yet, we wouldn't
> > have an open source community...
> >
> > We have a variety of reasons for using MySQL; most of them aren't good
> > ones, but it's something we've been able to work with for some time.
>
> Why would you deal with the shortcomings when you could just use
> PostgreSQL, SQLite, or even Innobase?
> --
> Jim C. Nasby, Database Architect[EMAIL PROTECTED]
> Give your computer some brain candy! www.distributed.net Team #1828
> 
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"


Re: The Future of Email is SQL

2006-06-09 Thread Jim C. Nasby
On Fri, Jun 09, 2006 at 09:16:10PM -0700, Gary W. Smith wrote:
> It's getting there, albeit slowly.  I think that if you ruled out every
> up-and-coming application because it's just not there yet, we wouldn't
> have an open source community...
> 
> We have a variety of reasons for using MySQL; most of them aren't good
> ones, but it's something we've been able to work with for some time.
 
Why would you deal with the shortcomings when you could just use
PostgreSQL, SQLite, or even Innobase?
-- 
Jim C. Nasby, Database Architect[EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


RE: The Future of Email is SQL

2006-06-09 Thread Gary W. Smith
It's getting there, albeit slowly.  I think that if you ruled out every
up-and-coming application because it's just not there yet, we wouldn't
have an open source community...

We have a variety of reasons for using MySQL; most of them aren't good
ones, but it's something we've been able to work with for some time.


> -Original Message-
> From: Jim C. Nasby [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 09, 2006 9:05 PM
> To: Marc Perkel
> Cc: Gary W. Smith; users@spamassassin.apache.org
> Subject: Re: The Future of Email is SQL
> 
> On Fri, Jun 09, 2006 at 02:50:03PM -0700, Marc Perkel wrote:
> > Gary,
> >
> > I'm trying to introduce the idea of a MySQL backend to Timo over at
> > Dovecot. He has done a little work in that direction already. But - I'm
> > throwing this idea out there right now just to get people thinking. I'm
> > hoping that in the next year as people think this through that some
> > serious development will occur. I think that as people say AH HA that
> > development will progress.
>
> Before you start getting stuck with MySQL you should read
> http://sql-info.de/mysql/gotchas.html. You'd be much better off with a
> database that's actually standards compliant.
>
> Probably the best bet would be to offer support for SQLite and
> PostgreSQL. That allows small users to have the zero maintenance of
> SQLite while big users get the scalability of PostgreSQL.
> --
> Jim C. Nasby, Database Architect[EMAIL PROTECTED]
> Give your computer some brain candy! www.distributed.net Team #1828
> 
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"


Re: The Future of Email is SQL

2006-06-09 Thread Steve Thomas
While this is quite an interesting topic, I have to ask why it's on the
spamassassin list. Message stores aren't spamassassin specific and this is
already a pretty high-volume list. Does this discussion really belong
here?

St-




Re: The Future of Email is SQL

2006-06-09 Thread Jim C. Nasby
On Fri, Jun 09, 2006 at 06:16:15PM -0400, Rob McEwen wrote:
> 
> >>MS Exchange... one big Database
> 
> Exactly...
> 
> And that is one reason why I wouldn't touch this SQL idea with a 10 foot
> pole.. the fact that Exchange works this way only proves my point... I hear
> all the time about Exchange servers crashing and the administrator having to
> rebuild the database while the mail server is down for the next 10 hours.

Just because MS couldn't figure out how to do this correctly doesn't
mean it can't be done.
-- 
Jim C. Nasby, Database Architect[EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: The Future of Email is SQL

2006-06-09 Thread Jim C. Nasby
On Fri, Jun 09, 2006 at 02:50:03PM -0700, Marc Perkel wrote:
> Gary,
> 
> I'm trying to introduce the idea of a MySQL backend to Timo over at 
> Dovecot. He has done a little work in that direction already. But - I'm 
> throwing this idea out there right now just to get people thinking. I'm 
> hoping that in the next year as people think this through that some 
> serious development will occur. I think that as people say AH HA that 
> development will progress.

Before you start getting stuck with MySQL you should read
http://sql-info.de/mysql/gotchas.html. You'd be much better off with a
database that's actually standards compliant.

Probably the best bet would be to offer support for SQLite and
PostgreSQL. That allows small users to have the zero maintenance of SQLite
while big users get the scalability of PostgreSQL.
-- 
Jim C. Nasby, Database Architect[EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


RE: The Future of Email is SQL

2006-06-09 Thread John D. Hardin
On Fri, 9 Jun 2006, Rob McEwen wrote:

> >>MS Exchange... one big Database
> 
> Exactly...
> 
> And that is one reason why I wouldn't touch this SQL idea with a
> 10 foot pole.. the fact that Exchange works this way only proves
> my point... I hear all the time about Exchange servers crashing
> and the administrator having to rebuild the database while the
> mail server is down for the next 10 hours.
> 
> The bottom line is that using a SQL DB backend as mail storage is
> putting all your eggs in one basket.

Not to mention you get the same problem that everyone complains about
with the Windows Registry: everything is buried in this black box
storage format that you can only access with specific tools - you lose
the ability to access and process email messages with the rich suite
of simple text processing tools that are available, and the ability to
read your email with something as simple as a text editor.

Granted, there is less of the "impenetrable black box" situation with
a SQL database than there is with the Registry, but the same concepts
and limitations apply.

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The first time I saw a bagpipe, I thought the player was torturing
  an octopus. I was amazed they could scream so loudly.
-- cat_herder_5263 on Y! SCOX
---
 9 days until SWMBO's Birthday



Re: The Future of Email is SQL

2006-06-09 Thread Marc Perkel






Greg Allen wrote:

> > -Original Message-
> > From: Rob McEwen [mailto:[EMAIL PROTECTED]]
> > Sent: Friday, June 09, 2006 6:16 PM
> > To: users@spamassassin.apache.org
> > Subject: RE: The Future of Email is SQL
> >
> > >>MS Exchange... one big Database
> >
> > Exactly...
> >
> > And that is one reason why I wouldn't touch this SQL idea with a 10 foot
> > pole.. the fact that Exchange works this way only proves my point... I hear
> > all the time about Exchange servers crashing and the administrator having to
> > rebuild the database while the mail server is down for the next 10 hours.
>
> Yup, I have worked on Exchange servers for years. 5.5 blew up all the time.
> 2000 not so much at all. I expect 2003 is fairly stable. But regardless, if
> it does go... that group of users is down all day. I know some orgs are
> using clusters on Exchange to help with that problem... but now you have a
> cluster that only one guy knows how to work on. The guy who set it up. So,
> if the cluster gets screwed somehow, you have to find the guy who set it up,
> you then have to fix the cluster, and then the Exchange. Just shoot yourself
> in the head and save some time.
>
> I would rather use Exchange with separate PST files for each user, and it
> will let you do that. The reason most companies end up on the single HUGE
> database is because Exchange requires that to be done in order to share
> appointments, tasks, etc.

  

What I have in mind would be a far better database than Exchange. I'm
assuming a really good database like MySQL or Oracle, and I'm assuming
that in the future databases will get even better. SpamAssassin has
switched from DB files to MySQL and I think that kind of evolution will
continue. So - I'm trying to plant the idea of a SQL future.






RE: The Future of Email is SQL

2006-06-09 Thread Greg Allen


> -Original Message-
> From: Rob McEwen [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 09, 2006 6:16 PM
> To: users@spamassassin.apache.org
> Subject: RE: The Future of Email is SQL
>
>
>
> >>MS Exchange... one big Database
>
> Exactly...
>
> And that is one reason why I wouldn't touch this SQL idea with a 10 foot
> pole.. the fact that Exchange works this way only proves my point... I hear
> all the time about Exchange servers crashing and the administrator having to
> rebuild the database while the mail server is down for the next 10 hours.

Yup, I have worked on Exchange servers for years. 5.5 blew up all the time.
2000 not so much at all. I expect 2003 is fairly stable. But regardless, if
it does go... that group of users is down all day. I know some orgs are
using clusters on Exchange to help with that problem... but now you have a
cluster that only one guy knows how to work on. The guy who set it up. So,
if the cluster gets screwed somehow, you have to find the guy who set it up,
you then have to fix the cluster, and then the Exchange. Just shoot yourself
in the head and save some time.

I would rather use Exchange with separate PST files for each user, and it
will let you do that. The reason most companies end up on the single HUGE
database is because Exchange requires that to be done in order to share
appointments, tasks, etc.





RE: The Future of Email is SQL

2006-06-09 Thread Rob McEwen

>>MS Exchange... one big Database

Exactly...

And that is one reason why I wouldn't touch this SQL idea with a 10 foot
pole.. the fact that Exchange works this way only proves my point... I hear
all the time about Exchange servers crashing and the administrator having to
rebuild the database while the mail server is down for the next 10 hours.

The bottom line is that using a SQL DB backend as mail storage is putting
all your eggs in one basket.

I have a much simpler solution to the problem that this idea was
originally attempting to solve... simply place the spams that are caught
in a folder on the mail server that is accessible via webmail. Then create a
separate program to periodically enumerate through the spam folder in all
the accounts on the server to delete spams over X days old.

If needed, you could still have a database with the basic info about the
spams (date received, subject line, recipients, from, message file name,
etc) to use for e-mailing "digests" to the user... and this DB's stability
wouldn't then have to be tied to the overall reliability/stability of mail
services.
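
The "separate program" for expiring old spam could be as small as the
sketch below. It assumes one file per message in a per-account spam folder;
the purge_old_spam name, the .eml file names, and the 15-day limit are
illustrative choices, not anything specified in this thread.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86400


def purge_old_spam(spam_dir, max_age_days, now=None):
    """Delete files in spam_dir older than max_age_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    # Snapshot the directory listing first so deletion does not disturb iteration.
    for entry in list(os.scandir(spam_dir)):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


# Demo: one message aged 20 days and one aged 2 days, with a 15-day limit.
spam_dir = tempfile.mkdtemp()
now = time.time()
for name, age_days in (("old.eml", 20), ("new.eml", 2)):
    path = os.path.join(spam_dir, name)
    with open(path, "w") as fh:
        fh.write("spam\n")
    stamp = now - age_days * SECONDS_PER_DAY
    os.utime(path, (stamp, stamp))  # set both access and modification time

removed = purge_old_spam(spam_dir, max_age_days=15, now=now)
print(removed)
```

Run from cron over each account's spam folder, this does the cleanup without
any database being involved in mail delivery.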

Also keep in mind that SQL doesn't always mean better performance... I've
seen many web sites that deliver content dynamically from a SQL database
backend where there were noticeably large delays between page loads, for
example.

Rob McEwen
PowerView Systems
[EMAIL PROTECTED]




Re: The Future of Email is SQL - What drives do you use?

2006-06-09 Thread DAve

 wrote:

| Between two mail gateways and three toasters we have 14 disks that never
| stop seeking, never, 24/7/365. A consumer grade storage device would
| scream "mommy" and wet itself.
|
| DAve

> OK, I'm sorry for changing the subject but I have had good results with
> 18 and 36 GB IBM SCSI drives.
>
> What do you use?



Fujitsu is nice, Seagate is better. WD and Maxtor make poor doorstops as 
they are too light, but they make a funny plunk sound when they hit water.


But it's the model of the drive and its intended purpose that matters, 
not so much the label. Every manufacturer makes consumer grade equipment.


DAve


--
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.


RE: The Future of Email is SQL

2006-06-09 Thread Greg Allen

-Original Message-
From: Marc Perkel [mailto:[EMAIL PROTECTED]
Sent: Friday, June 09, 2006 4:19 PM
To: users@spamassassin.apache.org
Subject: The Future of Email is SQL

Thoughts . ?

-

MS Exchange... one big Database

You can use Exmerge to do some of what you are looking to do (delete one
email to all users, export dates of email, etc.).

If someone sends the same email to 1,000 users in the same organization
(cc, bcc), the message is stored only once, with pointers to it.
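
The "stored only once, with pointers to it" behaviour can be imitated in a
few lines. This is only a sketch of the general idea using a content hash
as the key; it makes no claim about how Exchange implements it, and the
bodies and deliveries tables are invented for the example.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE bodies (
        digest  TEXT PRIMARY KEY,
        content TEXT NOT NULL
    );
    CREATE TABLE deliveries (
        mailbox TEXT NOT NULL,
        digest  TEXT NOT NULL REFERENCES bodies(digest)
    );
    """
)


def deliver(mailbox, content):
    """Store the body once per distinct content; each mailbox gets a pointer."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # INSERT OR IGNORE skips the write when this digest is already stored.
    db.execute(
        "INSERT OR IGNORE INTO bodies (digest, content) VALUES (?, ?)",
        (digest, content),
    )
    db.execute(
        "INSERT INTO deliveries (mailbox, digest) VALUES (?, ?)",
        (mailbox, digest),
    )


for n in range(1000):
    deliver(f"user{n}", "Quarterly all-hands is on Friday.\n")
db.commit()

stored_bodies = db.execute("SELECT COUNT(*) FROM bodies").fetchone()[0]
pointers = db.execute("SELECT COUNT(*) FROM deliveries").fetchone()[0]
print(stored_bodies, pointers)
```

Deleting the message for everyone then means removing one body row and its
pointers, which is the kind of bulk operation the original post was after.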

It may not do everything you are looking to do, not sure.

Just pointing out that Microsoft has already started down that route to some
extent, and they may even end up using SQL.








Re: The Future of Email is SQL

2006-06-09 Thread Marc Perkel






Jim C. Nasby wrote:

> On Fri, Jun 09, 2006 at 02:25:52PM -0600,  wrote:
> > My point here is - think outside the box. I'm going to be lobbying IMAP
> > server developers to include SQL backends. exim could pipe data into a
> > local delivery agent, or it can have features written to write directly
> > to the SQL backend.
> >
> > Thoughts . ?
> >
> > Because I am an SQL dummy, I do have this question.  Would apps like MySQL
> > and Postgres be able to handle 10,000+ users with an average of 50 MB of
> > email?
>
> There are people happily running PostgreSQL with terabyte databases.
> It's really a question of how much concurrency you need.
>
> One nice thing about databases is they make it possible to do things
> like partition your tables by month/week/whatever. You can then move
> older data onto larger partitions that use slower, cheaper drives.
  


Perhaps the headers and other information that you would index be kept
in the database and the body of the message stored somewhere else,
perhaps even as files. 

I'm just trying to inspire thought and creativity here btw.





Re: The Future of Email is SQL

2006-06-09 Thread Marc Perkel




Gary,

I'm trying to introduce the idea of a MySQL backend to Timo over at
Dovecot. He has done a little work in that direction already. But - I'm
throwing this idea out there right now just to get people thinking. I'm
hoping that in the next year as people think this through that some
serious development will occur. I think that as people say AH HA that
development will progress.


Re: The Future of Email is SQL

2006-06-09 Thread Jim C. Nasby
On Fri, Jun 09, 2006 at 02:25:52PM -0600,  wrote:
> 
> My point here is - think outside the box. I'm going to be lobbying IMAP 
> server developers to include SQL backends. exim could pipe data into a 
> local delivery agent, or it can have features written to write directly 
> to the SQL backend.
> 
> Thoughts . ?
> 
> Because I am an SQL dummy, I do have this question.  Would apps like MySQL and 
> Postgres be able to handle 10,000+ users with an average of 50 MB of email?  

There are people happily running PostgreSQL with terabyte databases.
It's really a question of how much concurrency you need.

One nice thing about databases is they make it possible to do things
like partition your tables by month/week/whatever. You can then move
older data onto larger partitions that use slower, cheaper drives.
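
For readers who have not seen partitioning by date, here is a hand-rolled
sketch of the idea. SQLite (used here via Python's sqlite3 module so the
example runs anywhere) has no built-in partitioning, so the per-month
tables are created and routed to manually; the messages_YYYY_MM naming and
the helper functions are assumptions for illustration, and a server
database would use its own partitioning or tablespace features instead.

```python
import sqlite3
from datetime import date

db = sqlite3.connect(":memory:")


def partition_name(received):
    """Name of the (hypothetical) per-month table for a given date."""
    return f"messages_{received:%Y_%m}"


def insert_message(received, subject):
    """Route a message to the table for the month it was received in."""
    table = partition_name(received)
    # Table names cannot be bound as parameters; this one is built only
    # from a date object, never from user input.
    db.execute(f"CREATE TABLE IF NOT EXISTS {table} (received TEXT, subject TEXT)")
    db.execute(
        f"INSERT INTO {table} VALUES (?, ?)", (received.isoformat(), subject)
    )


def drop_partition(year, month):
    """Retire a whole month at once, e.g. after archiving it elsewhere."""
    db.execute(f"DROP TABLE IF EXISTS {partition_name(date(year, month, 1))}")


insert_message(date(2006, 5, 30), "old news")
insert_message(date(2006, 6, 9), "The Future of Email is SQL")
insert_message(date(2006, 6, 10), "Re: The Future of Email is SQL")
drop_partition(2006, 5)

tables = sorted(
    row[0] for row in db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
june_count = db.execute("SELECT COUNT(*) FROM messages_2006_06").fetchone()[0]
print(tables, june_count)
```

The point is that an old month becomes a single unit that can be moved to
cheaper storage or dropped, instead of millions of individual rows.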
-- 
Jim C. Nasby, Database Architect[EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


RE: The Future of Email is SQL

2006-06-09 Thread Gary W. Smith








Marc, 

 

We have had to approach this in a similar
fashion.  We have large volume email accounts under cyrus as well as a
custom spam filtering system (behind SA).  Here is the approach we took.

 

We have cyrus setup on multiple partitions
based upon the directories.  This allows us to upgrade individual sets of
directories based on load.  Though this approach isn’t the best it
works well.  We have over 500gb on a single server.

 

We have had a problem with spam, just like
everyone else.  The spam no longer hits many of our user accounts. 
Instead it is inserted into a database and they are sent a daily digest (or
they can look it up).  We started with a simple set of tables which in
testing grew very large (5gb) with our test set.  In production this would
have been 100gb.  We only retain 15 days…

 

To accomplish this we looked into
splitting up the data just like we did for cyrus.  We broke that single
table down into x tables (x being defined as a tweakable number – for prod
we use 200).  We use random allocation to put an email into one of the tables. 
This matters because the bulk data is separated from the basic information,
which lets us keep these files on x number of spindles or network devices
and manage them in a much simpler fashion.
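
A toy version of the "x tables with random allocation" layout might look
like the following. The table and column names are invented, the shard
count is kept at 8 rather than the 200 mentioned above, and everything
lives in one in-memory SQLite database, whereas the setup described here
spreads the tables across separate disks.

```python
import random
import sqlite3

NUM_SHARDS = 8  # stand-in for the tweakable "x"; the post uses 200 in production

db = sqlite3.connect(":memory:")
# Small index table holding the basic information plus the shard number.
db.execute(
    "CREATE TABLE spam_index ("
    " id INTEGER PRIMARY KEY, recipient TEXT, subject TEXT, shard INTEGER)"
)
# One body table per shard; these hold the bulky data.
for n in range(NUM_SHARDS):
    db.execute(f"CREATE TABLE spam_body_{n} (id INTEGER PRIMARY KEY, body TEXT)")


def quarantine(recipient, subject, body, rng=random):
    """Pick a shard at random, store the body there, and index where it went."""
    shard = rng.randrange(NUM_SHARDS)
    cur = db.execute(
        "INSERT INTO spam_index (recipient, subject, shard) VALUES (?, ?, ?)",
        (recipient, subject, shard),
    )
    db.execute(
        f"INSERT INTO spam_body_{shard} (id, body) VALUES (?, ?)",
        (cur.lastrowid, body),
    )
    return cur.lastrowid


def fetch_body(msg_id):
    """Use the index to find the shard, then read the body from it."""
    (shard,) = db.execute(
        "SELECT shard FROM spam_index WHERE id = ?", (msg_id,)
    ).fetchone()
    return db.execute(
        f"SELECT body FROM spam_body_{shard} WHERE id = ?", (msg_id,)
    ).fetchone()[0]


ids = [quarantine("user@example.com", f"offer {n}", f"body {n}") for n in range(50)]
db.commit()
print(fetch_body(ids[0]))
```

Random allocation keeps the body tables roughly the same size, and the daily
digest only ever needs to read the small index table.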

 

We have been looking at IMAP servers that use databases
as their backends and are still up in the air on them, as they don’t meet all
of our requirements right now (in their stable form), but going forward I think
that SQL emails might become our designed transport.  Our SQL servers for
handling this are clustered machines, each with about 600gb disk space, under
linux-ha and DRBD.  This is also then replicated to a matching offsite
database cluster.

 

I believe that there is a use for a
technology focused more around databases (actually there are some right now
just very specific to themselves and not really configurable) that will replace
existing named systems (such as uw-imap and cyrus).  I would guess that
these tools themselves might start that implementation within themselves (hint
hint) so we don’t have to turn to the alternative imap systems.

 

Anyway, this stuff exists and some of us
use certain concepts already applied.  Implementation is simple in many
cases.

 

 











From: Marc Perkel [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 09, 2006 1:19
PM
To: users@spamassassin.apache.org
Subject: The Future of Email is
SQL



 

After considerable experimenting and thinking things through I thought I'd start a thread on the future of email to start planting the seeds of where MTA development needs to go. I'm convinced that someday soon we will all realize that MBOX and MAILDIR are obsolete technologies and that the future is going to be SQL based storage. First - before everyone starts screaming about speed comparisons, I'm not going to go there. Every storage technology has it's advantages and disadvantages but I'm just going to say that SQL based mail storage is fast enough. The advantages of SQL has to do with power and not with speed. Those who would choose it would do so because they want to do new things that you can do with a database and can't do without one. SQL has several advantages. You don't have t deal with the quirks of the underlying file system or OS. It takes care of all the locking issues and indexing and makes it so that multiple applications can seamlessly access the data. With an SQL backend email can be stored from the MTA, read from and IMAP client that accesses the same database, and the spam filtering engine will have access to the stored email as well. To give you some examples of what could be done . Suppose a spammer sends 1000 phishing spams to your users and then you figure out that the 1000 spams already delivered is spam. With a database you can do a query to retroactively delete spam that was already delivered to the mailboxes. This could also be used to retroactively delete viruses already delivered. Spam filtering programs can lookup existing email in existing folders and compare it with new email already deliverd to help determine more accurately if a message is spam or not. For example, if the host server has a reputation for 100% ham then it can deliver new email without running it through Spam Assassin. If programs like Spamassassin can access existing email in existing folders it can evaluate new email using tricks no one has yet considered. 
SQL databases allow for multiple masters and slaves and replication that lets you create a cluster that never fails under any conditions. It would be far easier to create a system that is always on and always backed up. An SQL backend lets you use a wide variety of tools, programming languages, and operating systems, so integration is easier than with non-database systems. And - this is important - once you have a database then new things that no one has yet thought of will be possible and new things we've never heard of will be developed becaus

Re: The Future of Email is SQL

2006-06-09 Thread Logan Shaw

On Fri, 9 Jun 2006, Marc Perkel wrote:

 wrote:


Because I am an SQL dummy, I do have this question.  Would apps like MySQL 
and Postgres be able to handle 10,000+ users with an average of 50 MB of 
email?  I really don't know.

 Also, does the body just get written to a table?


That would be about 500 gigs of email. Fry's Electronics has drives that size 
on special for $189. So - I'd say yes, should be fairly easy to scale up to 
that size and beyond.


That seems like a red herring considering 500GB of e-mail
still takes about 500GB of disk space whether it's a database
engine or the kernel's filesystem driver writing it to disk.
Yes, there will be differences in storage efficiency, but
they're minor.

And to answer 's question, yeah, the body would be written
to a table.  Because when you store things in a SQL database,
everything you store is written into a table.  For what it's
worth, you can store stuff like e-mail in a BLOB (Binary Large
OBject) or a similar type of field that is specifically meant
to be able to handle data of arbitrary length.
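
As a rough illustration of the BLOB approach, here is a minimal sketch using SQLite. The `mail_bodies` table and its column names are assumptions made up for this example, not the schema of DBMail or any other real product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the raw message is kept verbatim in a BLOB column.
conn.execute(
    "CREATE TABLE mail_bodies (id INTEGER PRIMARY KEY, raw BLOB NOT NULL)"
)

# A message of arbitrary length; the bytes are stored unmodified.
raw_message = b"Subject: hi\r\n\r\n" + b"x" * 100_000

cur = conn.execute("INSERT INTO mail_bodies (raw) VALUES (?)", (raw_message,))
message_id = cur.lastrowid

# Reading it back returns the same bytes that were written.
stored = conn.execute(
    "SELECT raw FROM mail_bodies WHERE id = ?", (message_id,)
).fetchone()[0]
```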

  - Logan


Re: The Future of Email is SQL - What drives do you use?

2006-06-09 Thread Jason Marshall

OK, I'm sorry for changing the subject but I have had good results with 18 and 
36 GB IBM SCSI drives.

What do you use?


I generally use Seagate.  Used to use IBM/Hitachi and Fujitsu.  Still 
would if they were easier to find in stock around here.  Have used 
Quantums, and long long ago Micropolis.  Both should be avoided...


Found a pile of new 4 gig SCSI disks on eBay, and have been using those 
for Linux system disks for the last couple years.  They're SCSI 2, great 
for booting from, they last forever...  Ah, remember the good old days 
when you could buy a disk that was the right size for the job, not 1400x 
bigger than what you need...


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| Jason Marshall, [EMAIL PROTECTED] Spots InterConnect, Inc. Calgary, AB |
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


Re: The Future of Email is SQL - What drives do you use?

2006-06-09 Thread qqqq

| Between two mail gateways and three toasters we have 14 disks that never
| stop seeking, never, 24/7/365. A consumer grade storage device would
| scream "mommy" and wet itself.
|
| DAve

OK, I'm sorry for changing the subject but I have had good results with 18 and 
36 GB IBM SCSI drives.

What do you use?





Re: The Future of Email is SQL

2006-06-09 Thread DAve

Marc Perkel wrote:



 wrote:


My point here is - think outside the box. I'm going to be lobbying IMAP
server developers to include SQL backends. exim could pipe data into a
local delivery agent, or it can have features written to write directly
to the SQL backend.

Thoughts . ?
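
The "pipe into a local delivery agent" idea could be sketched along these lines. Everything here is hypothetical: the `deliver` function, the `inbox` table, and its columns are invented for illustration, and SQLite is used only to keep the example self-contained.

```python
import io
import sqlite3
from email import policy
from email.parser import BytesParser


def deliver(conn, mailbox, stream):
    """Read one message from a byte stream and store it with indexed headers."""
    raw = stream.read()
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    conn.execute(
        "INSERT INTO inbox (mailbox, sender, subject, raw) VALUES (?, ?, ?, ?)",
        (mailbox, str(msg.get("From", "")), str(msg.get("Subject", "")), raw),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inbox (id INTEGER PRIMARY KEY, mailbox TEXT, "
    "sender TEXT, subject TEXT, raw BLOB)"
)

# Stand-in for the MTA's pipe; a real agent would read sys.stdin.buffer.
incoming = io.BytesIO(b"From: sender@example.org\nSubject: Hello\n\nBody text\n")
deliver(conn, "alice", incoming)

row = conn.execute("SELECT mailbox, sender, subject FROM inbox").fetchone()
```

Keeping sender and subject in their own columns is what would let an IMAP server or spam filter query the store without reparsing every message.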


We are looking at using DBMail as a mail archive for clients to pull 
lost and historic mail at a later date. But not for daily use.


Because I am an SQL dummy, I do have this question.  Would apps like 
MySQL and Postgres be able to handle 10,000+ users with an average of 
50 MB of email?  I really don't know.
 
Also, does the body just get written to a table?
 
Enlighten me,
 

That would be about 500 gigs of email. Fry's Electronics has drives that 
size on special for $189. So - I'd say yes, should be fairly easy to 
scale up to that size and beyond.


If you are building a high performance mail server, even just a 
mailstore, you aren't buying hardware at Fry's. Our mail gateways take 
in 2 GB of messages a day for 6k+ accounts. That is after we have 
rejected 70% of the connections.


Between two mail gateways and three toasters we have 14 disks that never 
stop seeking, never, 24/7/365. A consumer grade storage device would 
scream "mommy" and wet itself.


DAve

--
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.


Re: The Future of Email is SQL

2006-06-09 Thread qqqq



>>That would be about 500 gigs of email. Fry's Electronics has drives 
that size on special for $189. So - I'd say yes, should be fairly easy to scale 
up to that size and beyond.
 
I believe it would be approx 200 Gigs
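
For reference, the figure can be checked directly from the numbers in the question (10,000 users, 50 MB average). This counts raw message data only and ignores any index or database overhead, which is an assumption of the sketch.

```python
users = 10_000
avg_mailbox_mb = 50

total_mb = users * avg_mailbox_mb   # 500,000 MB of raw mail
total_gb = total_mb / 1000          # decimal gigabytes, as drive vendors count
total_gib = total_mb / 1024         # binary units, slightly smaller figure
```

Either way the total comes out near 500 gigs rather than 200.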
 



Re: The Future of Email is SQL

2006-06-09 Thread Jason Marshall
That would be about 500 gigs of email. Fry's Electronics has drives that size 
on special for $189. So - I'd say yes, should be fairly easy to scale up to 
that size and beyond.


You really think one 500 gig disk is going to give you anywhere close to 
the performance you need to accommodate 500 active gigs of mailboxes?  If 
you had 30x 18 gig fibre-channel drives spread out over many controllers 
and many machines you might have half a chance of keeping up.


If you put, say, 5 of those disks on each of 6 servers, you'd still need a 
way to (reliably) aggregate those 30 disks into one large storage 
facility, and the database engine on all 6 of those servers would have to 
be aware that it was just 1/6 of the equation at all times.


There would be no redundancy at that point.  You could probably get a 7th 
server with a bunch of large slow disks and use that to back up the data 
on the real cluster.  But you'd never be able to move 'production' over to 
the backup server, there'd just be too little disk throughput for it to 
work.


And by the way, 10k users isn't a lot.  I have 3000 users, and I'd 
consider us to be a minuscule operation compared to many others out there. 
Scale this to 250k users and we're talking...


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| Jason Marshall, [EMAIL PROTECTED] Spots InterConnect, Inc. Calgary, AB |
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


Re: The Future of Email is SQL

2006-06-09 Thread Marc Perkel






 wrote:

My point here is - think outside the box. I'm going to be lobbying IMAP
server developers to include SQL backends. exim could pipe data into a
local delivery agent, or it can have features written to write directly
to the SQL backend.

Thoughts . ?

Because I am an SQL dummy, I do have this question.  Would apps like
MySQL and Postgres be able to handle 10,000+ users with an average of
50 MB of email?

I really don't know.

Also, does the body just get written to a table?

Enlighten me,
   
  

That would be about 500 gigs of email. Fry's Electronics has drives
that size on special for $189. So - I'd say yes, should be fairly easy
to scale up to that size and beyond.






Re: The Future of Email is SQL

2006-06-09 Thread qqqq



My point here is - think outside the box. I'm going to be lobbying IMAP
server developers to include SQL backends. exim could pipe data into a
local delivery agent, or it can have features written to write directly
to the SQL backend.

Thoughts . ?

Because I am an SQL dummy, I do have this question.  Would apps like
MySQL and Postgres be able to handle 10,000+ users with an average of
50 MB of email?
 
I really don't know.
 
Also, does the body just get written to a table?
 
Enlighten me,