Re: [Dbmail] Optimizing Dbmail Database

2009-12-11 Thread Paul J Stevens
Michael Monnerie wrote:
 On Thursday, 10 December 2009 Tomas Kuliavas wrote:
 DBMail might find its niche in some setups, but large mailboxes are
  not in that niche. 750 GB DB proves it. You can't do text search raw
  email sources. There is no point of storing them in DB.
 
 And you believe doing a raw text search on a 750GB flat file mailserver 
 would be fast?

Raw text searches are not your typical usage pattern. Doing so in a
truly high speed fashion is a principal goal for all IMAP
implementations. For dbmail, using an external full-text index such as
Solr/Lucene would be the most logical (and scalable) solution.
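
To illustrate why an external full-text index scales better than a scan, here is a toy inverted index in Python. It is only a sketch of the idea behind engines such as Solr/Lucene, not their API; the function names (tokenize, build_index, search) and the sample messages are made up for illustration.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(messages):
    """Map each word to the set of message ids that contain it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            index[word].add(msg_id)
    return index

def search(index, query):
    """Return the ids of messages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data standing in for message bodies.
messages = {
    1: "Nightly backup finished without errors",
    2: "Backup failed: disk full",
    3: "Lunch on Friday?",
}
index = build_index(messages)
```

A lookup such as search(index, "backup failed") only touches the entries for those two words, instead of reading every stored message.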

 
 dbmail 2.3 is different in that it stores mimeparts separately. Maybe a 
 full text search skips binary attachments there. Paul?

Currently, a full body text search will do a full table scan of the
mimeparts table and pull in all mimeparts that are part of the messages
in the mailbox being searched. If we want to skip all non text/*
mimeparts (as allowed by the imap rfc), we'd have to add some knowledge
of the mimetype contained in the mimepart. Doing so would be trivial.
And so would fixing the query that does the search.
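
As a rough sketch of what such a filtered search could look like, the Python snippet below uses an in-memory SQLite table. The table name parts and its mimetype column are hypothetical; as the paragraph above says, dbmail 2.3 does not yet record the mimetype per mimepart, and a real deployment would run on MySQL or PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: one row per MIME part with its type.
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, mimetype TEXT, data BLOB)")
conn.executemany(
    "INSERT INTO parts (mimetype, data) VALUES (?, ?)",
    [
        ("text/plain", b"please test the new server"),
        ("text/html", b"<p>nothing to see</p>"),
        ("image/png", b"\x89PNG test bytes that should be ignored"),
    ],
)

def search_text_parts(conn, needle):
    """Return ids of text/* parts whose data contains needle (bytes)."""
    rows = conn.execute(
        "SELECT id FROM parts "
        "WHERE mimetype LIKE 'text/%' AND instr(data, ?) > 0 "
        "ORDER BY id",
        (needle,),
    )
    return [row[0] for row in rows]
```

The image part contains the bytes "test" as well, but the mimetype condition excludes it from the result.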

-- 
  
  Paul Stevens  paul at nfg.nl
  NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
  The Netherlands  http://www.nfg.nl
___
DBmail mailing list
DBmail@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail


Re: [Dbmail] Optimizing Dbmail Database

2009-12-11 Thread Michael Monnerie
On Friday, 11 December 2009 Josh Marshall wrote:
 I have found that since linux kernel 2.6 series, LVM snapshots have
 caused system lockups. I used it happily in the 2.4 series. 

That's why LVM snapshots are not used in XenServer 5.x. They also said 
it's unstable, especially under high load.

 Besides
 that, I did mention *impact-free*. Adding a snapshot and reading from
  a snapshot severely impacts the speed of the running system.

I totally agree with your argument. Having everything together is much
easier to administer. Once it's too slow, I'll throw in more hardware.
It's cheaper to throw in a new server than to carry the extra burden of
redundancy, backup/restore, etc...

So far, I haven't hit a limit with dbmail, while we did hit limits with
older POP-only systems before, where users had the "leave mail on
server" setting enabled. The server had to copy the flat file all over
again for each user, and I/O stalled...

BTW: we upgraded from PostgreSQL 8.1 to 8.3, which exactly *doubled* the 
speed of our nightly backups and vacuum/cluster runs. So that was a nice 
step which I can recommend to everybody. I wonder if 8.4 will bring 
another improvement.

Kind regards, zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660 / 415 6531   .network.your.ideas.
//
// We have two houses for sale:
// http://zmi.at/langegg/
// http://willhaben.at/iad/realestate/object?adId=15306857




Re: [Dbmail] Optimizing Dbmail Database

2009-12-11 Thread Michael Monnerie
On Friday, 11 December 2009 Daniel Urstöger wrote:
 Well, one can also do that with a filesystem based storage, you
 just need something similar to the MySQL replication for flat
 files. DRBD for example.
 
DRBD puts a burden on the server all the time. For secure replication
you need to wait until the I/O on the remote server has reached the disk
too. Only if you relax that, and allow buffered I/O to the remote, is
the impact negligible. But then you risk a munged DB in case your first
machine crashes brutally during high I/O, and suddenly you lose some
parts of your transactions, which the DB does not expect. It's not nice,
because the DB claims everything went OK, while some data in some tables
is wrong...



Re: [Dbmail] Optimizing Dbmail Database

2009-12-11 Thread Michael Monnerie
On Friday, 11 December 2009 Michael Monnerie wrote:
 LVM snapshots
 
Another thing to remember: You can only do a snapshot of a single
filesystem at a time. So if you have your DB and attachments in
different volumes, the snapshots are no longer transactionally
consistent with each other. Some people may be happy to live with that,
though.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-11 Thread Michael Monnerie
On Friday, 11 December 2009 Paul J Stevens wrote:
 Currently, a full body text search will do a full table scan of the
 mimeparts table and pull in all mimeparts part of the messages in the
 mailbox being searched. If we want to skip all non text/* mimeparts
  (as allowed by the imap rfc), we'd have to add some knowledge of the
  mimetype contained in the mimepart. Doing so would be trivial. And
  so would fixing the query be that does the search.
 
Sounds like a nice-to-have feature :-)
That would be a great reason to upgrade to 2.3.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-11 Thread Michael Monnerie
On Friday, 11 December 2009 Tomas Kuliavas wrote:
 Emails are not raw text. There are at least two ways to write test
  in email and if you go to 8bit text, number of same text variations
  multiplies. SQL can't search emails stored in DB, because SQL does
  not know about encodings, mime formats and character sets

So where's the difference? You can
SELECT * ... WHERE mailtext LIKE 'test'::utf8 OR mailtext LIKE 
'test'::base64  etc.
and a flat file server would do the same anyway. The e-mail is stored in 
original format, so it would also search for test in all encodings.

The question is anyway: Does an IMAP SEARCH search in several variations 
of test? What if it's base64 encoded?



Re: [Dbmail] Optimizing Dbmail Database

2009-12-11 Thread Tomas Kuliavas
2009.12.11 13:14 Michael Monnerie wrote:
 On Friday, 11 December 2009 Tomas Kuliavas wrote:
 Emails are not raw text. There are at least two ways to write test
  in email and if you go to 8bit text, number of same text variations
  multiplies. SQL can't search emails stored in DB, because SQL does
  not know about encodings, mime formats and character sets

 So where's the difference? You can
 SELECT * ... WHERE mailtext LIKE 'test'::utf8 OR mailtext LIKE
 'test'::base64  etc.
 and a flat file server would do the same anyway. The e-mail is stored in
 original format, so it would also search for test in all encodings.

Are you sure that syntax of your select query is correct?

How complex a SELECT will you have to write in order to cover all
variations? Flowed format, quoted-printable, headers and bodies that
might have text in n different charsets.

SQL is not designed to decode MIME on the fly.

 The question is anyway: Does an IMAP SEARCH search in several variations
 of test? What if it's base64 encoded?

Headers must be decoded if a charset is specified in the search command.
You are free to read all the IMAP standards if you want, as long as you
don't invent new SQL syntax in order to prove your point.
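
To make the header-decoding step concrete, here is a small Python sketch using the standard library's email.header module. The helper names decoded_subject and header_matches and the two sample headers are made up for illustration; this is not how any particular IMAP server implements SEARCH.

```python
from email.header import decode_header, make_header

def decoded_subject(raw_header):
    """Decode RFC 2047 encoded-words into a plain Unicode string."""
    return str(make_header(decode_header(raw_header)))

def header_matches(raw_header, needle):
    """Case-insensitive substring match on the decoded header value."""
    return needle.lower() in decoded_subject(raw_header).lower()

# Sample headers (illustrative only).
raw_b64 = "=?utf-8?b?dGVzdA==?="       # base64 encoded-word for "test"
raw_qp = "=?iso-8859-1?q?H=E4user?="  # quoted-printable encoded-word for "Häuser"
```

A plain substring search for "test" on raw_b64 finds nothing; the match only succeeds after decoding.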

Glad to see that Daniel got suggestions to his problem. Maybe size of
database can be reduced by moving some accounts to other server?


-- 
Tomas



Re: [Dbmail] Optimizing Dbmail Database

2009-12-11 Thread Michael Monnerie
On Friday, 11 December 2009 Tomas Kuliavas wrote:
 Are you sure that syntax of your select query is correct?

No, that was pseudo code to demonstrate you can search for variations 
within one query.
 
 how complex select call you will make in order to cover all
  variations? flowed format, quoted-printable, headers and body that
  might have text in n different charsets.

Exactly as complex as what a server with flat files would have to do.
There's no difference.

 SQL is not designed to decode MIME on the fly.

It all reduces to search for a certain byte combination. You just have 
to encode your search string to all variations you need, and put all 
those in a single SELECT. That was my point.

In case you have to decode the mail, you need to retrieve, decode and
search, and still this is the same work a flat file mailserver would do.
This whole discussion is about the speed of searching, and so far I
haven't seen an example where a flat file server could search faster
than the DB.
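
The retrieve, decode, search sequence can be sketched in Python with the standard email package. The function name body_contains and the sample message are illustrative assumptions. The storage backend only affects the retrieve step; the decoding work is the same either way. Note that matching raw bytes is awkward for base64 in particular, because base64 encodes input in 3-byte groups, so the encoded form of a substring depends on its offset in the body.

```python
import base64
from email import message_from_bytes

def body_contains(raw_message, needle):
    """Decode every text/* part of a raw message and search it for needle."""
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip attachments and multipart containers
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "ascii"
        text = payload.decode(charset, errors="replace")
        if needle.lower() in text.lower():
            return True
    return False

# Hypothetical raw message whose body is base64 encoded.
encoded_body = base64.b64encode("This is a test message".encode("utf-8"))
raw = (
    b"Subject: demo\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + encoded_body + b"\r\n"
)
```

Here the bytes "test" never appear in raw, yet body_contains(raw, "test") finds the word once the part has been decoded.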



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Tomas Kuliavas
The correct solution is not to store email data in the DB. I think any
sane DBA would tell you that. Don't store binary data in a DB.

In large mailbox setups you get best performance by storing emails in
filesystem (one email per file in hashed directory structure) and caching
email headers in DB.

2009.12.10 19:59 Blurry wrote:
 Guys, I really, really need help..

 Are there anyone out there with suggestions or anything at all that might
 help? Please help.

 Thanx.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Reindl Harald

On 10.12.2009 20:23, Tomas Kuliavas wrote:
 Correct solution is not to store email data in DB. I think any sane DBA
 could say you that. Don't store binary data in DB.

What a stupid statement in the context of DBmail






Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Tomas Kuliavas
2009.12.10 21:55 Reindl Harald wrote:

 On 10.12.2009 20:23, Tomas Kuliavas wrote:
 Correct solution is not to store email data in DB. I think any sane DBA
 could say you that. Don't store binary data in DB.

 What stupid statement in context of DBmail

I am not a DBmail developer or user. Maybe I am wrong and raw emails are
not stored by DBmail in the DB. I haven't tested DBmail performance on
larger mailboxes, only because the only DBmail setup I have runs on a
virtual host. A performance test on a virtual host would be unfair,
considering that other servers are tested on real machines.

DBMail might find its niche in some setups, but large mailboxes are not in
that niche. A 750 GB DB proves it. You can't do a text search on raw email
sources. There is no point in storing them in a DB.

-- 
Tomas



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread tabris
Reindl Harald wrote:
 On 10.12.2009 20:23, Tomas Kuliavas wrote:
   
 Correct solution is not to store email data in DB. I think any sane DBA
 could say you that. Don't store binary data in DB.
 

 What stupid statement in context of DBmail
Incidentally, if one were to do as he suggests, it would make replicated
setups more complicated! And replication is why I chose dbmail in the
first place.





Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread tabris
Reindl Harald wrote:
 On 10.12.2009 20:23, Tomas Kuliavas wrote:
   
 Correct solution is not to store email data in DB. I think any sane DBA
 could say you that. Don't store binary data in DB.
 

 What stupid statement in context of DBmail
   
Only because dbmail already does the opposite. Frankly, I don't think it
would be such a _terrible_ idea to design dbmail to keep the
dbmail_mimeparts as files, rather than as part of the database. But I
certainly don't expect that change to come anytime soon, if ever.





Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Alan Hodgson
On Thursday 10 December 2009, Tomas Kuliavas to...@users.sourceforge.net 
wrote:
 DBMail might find its niche in some setups, but large mailboxes are not
 in that niche. 750 GB DB proves it. You can't do text search raw email
 sources. There is no point of storing them in DB.

DBMail does store the email in the database, and it works fine. Some things 
are slower than the alternatives, some things are faster (like backups, 
which happen, you know, a lot).

It certainly has its niche and it does a fine job of it.

-- 
No animals were harmed in the recording of this episode. We tried but that 
damn monkey was just too fast.


Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Josh Marshall
I'd like to point out a few things:

* The added complexity of storing and synchronising files on disk with
records in tables, especially in a load-balanced, highly available
situation, is much more work than any returns you'll ever get.

* Whether the emails are stored in the database or the filesystem (which
is really just a database) is not going to make that much difference.
Databases can be a bit inefficient for space at times but this is
usually to increase speed.

* I'd like to see 3 or 4 mailservers performing imap searches over an
NFS share to get to the mailbox files or messages. Then we can really
compare speed of the database vs filesystem in a networked environment

* I'd like to see a system administrator easily recover all the emails
for a mailbox since the last time cleanup was performed. Hint: 
UPDATE dbmail_messages SET deleted_flag=0, status=0
  WHERE mailbox_idnr IN
    (SELECT mailbox_idnr FROM dbmail_mailboxes WHERE owner_idnr IN
      (SELECT user_idnr FROM dbmail_users
        WHERE userid='mail...@userdomain.com'));

* I'd like to see fine-grained point-in-time recovery for the
filesystem-based (or hybrid - scary) systems. Yes it would take a while
for any system depending on mail size.

* I'd like to see impact-free daily backups for filesystem-based
systems. With dbmail, just have a slave replica you can pause
replication on to get a perfect snapshot, with no impact on the live
database during the backup duration.

* Remember that with any mail system that has a huge amount of data,
things are going to take time. Databases have more records to search
through (although indexes can help speed this up). mbox is basically a
crude database storing all the emails in one file, so large mailboxes
can take a very long time to work with. Maildir is good until the inbox
gets so many small files that just the directory listing takes a long
time. If you're going to have a large mail system, be aware that things
will take time, or use multiple systems and a system like perdition to
split up the mailboxes, or have an archive system for users to place old
emails they want to keep in.

* As for mail delivery speed statistics, take them all with a grain of
salt. Our experience is that the bottleneck for inbound mail is the
antivirus and antispam stage, and with the huge amount of spam hitting
our servers (90+% of all connections) it is actually faster to detect
and reject spam than to deliver the mail into the mailboxes.

Finally:

* Mail systems that don't require high-availability, failover or
networked environments for load balancing would probably be better to
just use mbox or maildir, for simplicity. For mission-critical systems
there are more items to consider before deciding which option to take.
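
The recovery hint in the list above can be tried out against a throwaway database. The Python sketch below runs the same nested UPDATE on a heavily simplified, in-memory SQLite schema. Only the table and column names used in the hint come from this thread; the message_idnr column, the sample rows, the placeholder address alice@example.com and the use of status 2 for deleted messages are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbmail_users (user_idnr INTEGER PRIMARY KEY, userid TEXT);
CREATE TABLE dbmail_mailboxes (mailbox_idnr INTEGER PRIMARY KEY, owner_idnr INTEGER);
CREATE TABLE dbmail_messages (
    message_idnr INTEGER PRIMARY KEY, mailbox_idnr INTEGER,
    deleted_flag INTEGER, status INTEGER);
INSERT INTO dbmail_users VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
INSERT INTO dbmail_mailboxes VALUES (10, 1), (20, 2);
-- status 2 stands in for "marked as deleted"; the real value is an assumption
INSERT INTO dbmail_messages VALUES (100, 10, 1, 2), (101, 10, 0, 0), (200, 20, 1, 2);
""")

def undelete_all(conn, userid):
    """Clear the deleted markers on every message owned by userid."""
    cur = conn.execute(
        """UPDATE dbmail_messages SET deleted_flag = 0, status = 0
           WHERE mailbox_idnr IN (
               SELECT mailbox_idnr FROM dbmail_mailboxes WHERE owner_idnr IN (
                   SELECT user_idnr FROM dbmail_users WHERE userid = ?))""",
        (userid,),
    )
    conn.commit()
    return cur.rowcount  # number of rows the UPDATE touched

restored = undelete_all(conn, "alice@example.com")
```

Only the messages in the first user's mailbox are touched; the second user's deleted message stays deleted.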

Josh



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread tabris
For what it's worth, I set up dbmail for my employer, and the only
reason I chose it was that it was able to handle replication. Admittedly
I abuse the system a little to do a master-master replication, and
that over the Atlantic Ocean. Current database size is 100G, 95G of
which is the dbmail_messageblks table.

So yes, I know that there are advantages to this, and that there are
major upsides to a database. And I certainly wasn't suggesting NFS
(nightmare). But it would be interesting if we could have a replication
agent that could push mimeparts to disc. And fwiw, I didn't want to put
30,000 files into one folder, and was not recommending the use of
maildir. I more expected something like what Squid does, with 256x256
folders with mimeparts inside, indexed from their md5 or sha256 hashes.
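
A minimal Python sketch of such a 256x256 layout, assuming SHA-256 as the hash: the first two hex characters of the digest name the top-level folder and the next two the second level. The function name part_path and the root directory /var/spool/parts are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

def part_path(root, data):
    """Return root/aa/bb/<sha256-hex> for the given part content."""
    digest = hashlib.sha256(data).hexdigest()
    # Two hex characters give 256 possible names per directory level.
    return PurePosixPath(root) / digest[:2] / digest[2:4] / digest

# Hypothetical root directory and content, for illustration only.
path = part_path("/var/spool/parts", b"hello")
```

Because the path is derived from the content, identical mimeparts map to the same file, so they only need to be stored once.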

At the same time, I'm not sure that that kind of replication is
practical. Maybe we need to instead make a MySQL engine that puts blobs
into files. But that's rather offtopic for the dbmail list.





Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Michael Monnerie
On Thursday, 10 December 2009 Tomas Kuliavas wrote:
 DBMail might find its niche in some setups, but large mailboxes are
  not in that niche. 750 GB DB proves it. You can't do text search raw
  email sources. There is no point of storing them in DB.

And you believe doing a raw text search on a 750GB flat file mailserver 
would be fast?

dbmail 2.3 is different in that it stores mimeparts separately. Maybe a 
full text search skips binary attachments there. Paul?

What I'd like to know from Daniel: Do you have 750G of real data, or is
that just your DB size? It seems your setup is not optimized at all; a
lost connection shows your server can't keep up with the load. Maybe all
the MySQL parameters need tuning anyway.

dbmail heavily depends on a good DBA to give good performance. Once you 
have more than 10GB and 100+ users you see the difference.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Daniel Urstöger
I do not want to add to this quite hot situation, but there are two things 
worth mentioning:

 * I'd like to see impact-free daily backups for filesystem-based
 systems. With dbmail, just have a slave replica you can pause
 replication on to get a perfect snapshot, with no impact on the live
 database during the backup duration.

That is actually possible, not with the same features, but one could use
the snapshot features of LVM to achieve that. Create and mount that
snapshot on your backup box and, well, do with it whatever you like.

The other thing I think is worth mentioning is specific to MySQL: the
Full Text Index (FTI) is quite bad for searches once you reach a certain
amount of data, and looking through all the records without any index is
quite slow too. I have no comparison of flat file storage with
database-stored messages, but for MySQL there will soon be a new
search/index technology available, called Sphinx search, which will
hopefully also get implemented in dbmail (?). I have used it lately
(beta version) in a project, and the speed compared to MySQL with FTI
was quite remarkable.

Kind regards,
Daniel


Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Michael Monnerie
On Friday, 11 December 2009 Daniel Urstöger wrote:
 the Full Text Index ( FTI ) is quite bad for searches
 
dbmail doesn't use FTI.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Josh Marshall
Hi,

I didn't think it was hot. I have seen this argument a few times
before, comparing apples and oranges and suggesting that dbmail does it
the wrong way and should change. Yes dbmail has a niche and it excels in
situations where filesystem-based mail systems can't cut it, and that's
why I'm using it. For those who think filesystem-based mail systems are
better for them, I say go for it! Not everyone has the same
requirements.

To address your two points though:

I have found that since linux kernel 2.6 series, LVM snapshots have
caused system lockups. I used it happily in the 2.4 series. Besides
that, I did mention *impact-free*. Adding a snapshot and reading from a
snapshot severely impacts the speed of the running system. Yes you get a
clean backup, but the hard disk is being placed under a huge read
strain, not to mention the extra COW load for every write to the
filesystem.

I believe that dbmail on MySQL requires the use of InnoDB, which I
believe (or has this recently changed?) does not support Full Text
Index. Maybe using something like Sphinx as a bolt-on would be handy for
doing IMAP searches. I generally sync all my emails to my desktop
machine and do any searches on the local copies. Then searches don't
impact the servers :)



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Daniel Urstöger
Never claimed it does ;)

Just saying that for bigger datasets it becomes more and more useless  
but with Sphinx it remains blazing fast. So, for dbmail that would be  
a nice to have feature ...

On 11.12.2009 at 01:07, Michael Monnerie
michael.monne...@is.it-management.at wrote:

 On Friday, 11 December 2009 Daniel Urstöger wrote:
 the Full Text Index ( FTI ) is quite bad for searches

 dbmail doesn't use FTI.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Daniel Urstöger
Oranges and apples, I agree with that. I happily use dbmail as well as
my qmail/vpopmail setup. Every system has its quirks and shortcomings.


 To address your two points though:

 I have found that since linux kernel 2.6 series, LVM snapshots have
 caused system lockups. I used it happily in the 2.4 series. Besides
 that, I did mention *impact-free*. Adding a snapshot and reading from a
 snapshot severely impacts the speed of the running system. Yes you get a
 clean backup, but the hard disk is being placed under a huge read
 strain, not to mention the extra COW load for every write to the
 filesystem.

Yes, but after backing up the snapshot to some other place, one can
remove it and the speed will be back to normal. So, running a db slave
and using mysqldump for backups is not much different.

 I believe that dbmail on MySQL requires the use of InnoDB, which I
 believe (or has this recently changed?) does not support Full Text
 Index. Maybe using something like Sphinx as a bolt-on would be handy for
 doing IMAP searches. I generally sync all my emails to my desktop
 machine and do any searches on the local copies. Then searches don't
 impact the servers :)

 On Fri, 2009-12-11 at 01:00 +0100, Daniel Urstöger wrote:
 I do not want to add to this quite hot situation, but there are  
 two things worth mentioning:

 * I'd like to see impact-free daily backups for filesystem-based
 systems. With dbmail, just have a slave replica you can pause
 replication on to get a perfect snapshot, with no impact on the live
 database during the backup duration.

 That is actually possible, not with the same features, but one  
 could use the snapshot features from LVM to achieve that.
 Create and mount that snapshot on your backup box and well, do with  
 it whatever you like.

 The other thing I think is worth mentioning is especially about  
 MySQL: the Full Text Index ( FTI ) is quite bad for searches,
 if you reach a certain amount of data, also looking through all the  
 records without any index is quite slow.
 I have no comparison of flat file storage compared with database  
 stored messages, but for MySQL there is soon to be a new search /  
 index technology available,
 which hopefully will also get implemented in dbmail (?), called  
 sphinx search.
 I have used it lately (beta version) in a project and the speed  
 compared to MySQL with FTI was quite remarkable.




Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Josh Marshall
On Fri, 2009-12-11 at 01:38 +0100, Daniel Urstöger wrote:
 Yes, but after backing up the snapshot to some other place, one can remove it
 and the speed will be back to normal. So running a db slave and using
 mysqldump for backups is not much different.

Not quite. Having a separate slave database server to do the heavy work
of backups has no impact on the master database during the backup
period. Therefore the master database is always at normal speed.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Reindl Harald
On 11.12.2009 01:38, Daniel Urstöger dan...@gosi.at wrote:

 Yes, but after backing up the snapshot to some other place, one can remove it
 and the speed will be back to normal. So running a db slave and using
 mysqldump for backups is not much different.

a) the slave can run on one or more other physical machines
b) shut down the slave, run rsync, start the slave

At no point is there any extra load on the master.
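
A minimal sketch of step b), assuming the slave's data directory is copied
as plain files while the slave is stopped. The function names and the
stop/start commands passed in are hypothetical and depend on the distro;
only the rsync options -a and --delete are standard.

```python
import subprocess


def backup_steps(datadir, target, stop_cmd, start_cmd):
    """Return the three commands in order: stop slave, copy files, start slave."""
    # Trailing slash so rsync copies the directory contents, not the directory itself.
    source = datadir.rstrip("/") + "/"
    return [
        list(stop_cmd),
        ["rsync", "-a", "--delete", source, target],
        list(start_cmd),
    ]


def run_backup(steps, runner=subprocess.run):
    """Run the steps on the slave host; the master is never touched."""
    stop, copy, start = steps
    runner(stop, check=True)
    try:
        runner(copy, check=True)
    finally:
        # Restart the slave even if the copy failed, so replication resumes.
        runner(start, check=True)
```

Once restarted, the slave catches up from the master's binary log, so the
only cost is that the slave lags behind for the duration of the copy.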





Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Daniel Urstöger
My message was not quite finished, sadly there isn't an app for that ;)

 I have found that since linux kernel 2.6 series, LVM snapshots have
 caused system lockups. I used it happily in the 2.4 series. Besides
 that, I did mention *impact-free*. Adding a snapshot and reading from a
 snapshot severely impacts the speed of the running system. Yes you get a
 clean backup, but the hard disk is being placed under a huge read
 strain, not to mention the extra COW load for every write to the
 filesystem.

Which distro are you using? I haven't had any snapshot-related problems
yet, but maybe that is related to how much data you have? The
snapshots I create rarely have more than 20 GB of data.
I don't want to push a better distro on you, but I see this
mailing list as quite a nice source of knowledge and exchange,
so I am just asking in order to learn :)

 I believe that dbmail on MySQL requires the use of InnoDB, which I
 believe (or has this recently changed?) does not support Full Text
 Index. Maybe using something like Sphinx as a bolt-on would be handy for
 doing IMAP searches. I generally sync all my emails to my desktop
 machine and do any searches on the local copies. Then searches don't
 impact the servers :)

It does require transactions, so for MySQL you are pretty much tied to
InnoDB, unless you want to try something more experimental.

It would be nice if customers acted like you, but usually they don't.
Though I really like that Thunderbird 3 now strongly suggests
syncing to local disk.

Sphinx for searches would be awesome! The new release is even
compatible with the MySQL client / libraries. So there is no API to fiddle
with, but it would still need quite some work, I guess ...






Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Daniel Urstöger

 Not quite. Having a separate slave database server to do the heavy work
 of backups has no impact on the master database during the backup
 period. Therefore the master database is always at normal speed.

Well, one can also do that with filesystem-based storage; you just
need something similar to MySQL replication for flat files. DRBD,
for example.


Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Josh Marshall
On Fri, 2009-12-11 at 02:01 +0100, Daniel Urstöger wrote:
  Not quite. Having a separate slave database server to do the heavy work
  of backups has no impact on the master database during the backup
  period. Therefore the master database is always at normal speed.
 
 Well, one can also do that with filesystem-based storage; you just
 need something similar to MySQL replication for flat files. DRBD,
 for example.

I have also had DRBD in production, sharing in a similar way. However,
when DRBD reconnects it needs to scan through all the changes on the
master disk to find and copy across all the changed sectors. So there is
a performance hit when the backup copy completes and DRBD reconnects.
How much of a hit depends on the number of changed sectors. In most
cases it would be minor, so I am being picky here. One problem I did
have with DRBD (I used the 0.7 series) is that the system would lock me
out if only one side came up, so I was completely without service until
the timeout expired or I intervened to manually switch to the
appropriate master. They may have fixed this in the 0.8 series.

You could argue that with the mysql binary log there is a performance
hit when the copy completes and the slave reconnects, but if the binary
logs are on a separate disk spindle, they don't affect the performance of
the main database files.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-10 Thread Tomas Kuliavas
2009.12.11 01:33 Michael Monnerie wrote:
 On Thursday, 10 December 2009 Tomas Kuliavas wrote:
 DBMail might find its niche in some setups, but large mailboxes are
  not in that niche. 750 GB DB proves it. You can't do text search raw
  email sources. There is no point of storing them in DB.

 And you believe doing a raw text search on a 750GB flat file mailserver
 would be fast?

Emails are not raw text. There are at least two ways to write text in an
email, and once you go to 8-bit text, the number of variations of the same
text multiplies. SQL can't search emails stored in the DB, because SQL does
not know about encodings, MIME formats and character sets.
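
A small sketch of the point, using Python's standard email package rather
than anything from dbmail. The phrase and the helper names (make_message,
body_text) are made up for illustration. The same text is stored once as
base64 and once as quoted-printable; a byte-level search of the raw source
finds it in neither, while a search after MIME decoding finds it in both.

```python
import base64
import quopri
from email import message_from_bytes, policy

PHRASE = "Grüße aus Wien"  # illustrative 8-bit text


def make_message(encoding: str) -> bytes:
    """Build a raw text/plain message with the given Content-Transfer-Encoding."""
    payload = PHRASE.encode("utf-8")
    if encoding == "base64":
        body = base64.b64encode(payload)
    else:  # quoted-printable
        body = quopri.encodestring(payload)
    headers = (
        b"Content-Type: text/plain; charset=utf-8\r\n"
        b"Content-Transfer-Encoding: " + encoding.encode("ascii") + b"\r\n\r\n"
    )
    return headers + body


def body_text(raw: bytes) -> str:
    """Decode the plain-text body, undoing transfer encoding and charset."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


for enc in ("base64", "quoted-printable"):
    raw = make_message(enc)
    raw_hit = PHRASE.encode("utf-8") in raw   # what a byte-level match sees
    decoded_hit = PHRASE in body_text(raw)    # what a MIME-aware search sees
    print(enc, raw_hit, decoded_hit)
```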

-- 
Tomas




[Dbmail] Optimizing Dbmail Database

2009-12-06 Thread Daniel Mejia

I was thinking of running the OPTIMIZE TABLE command on the dbmail database,
because my organization's dbmail database is currently around 750 GB.

Another reason is that my dbmail-util -ay command (which runs at 2am and
6am every day) keeps giving me this error:

[code]
Dec 07 03:00:37 mailadmin.mpob.g lt-dbmail-util[29046]: Error:[sql]
dbmysql.c,db_query(+290): [Lost connection to MySQL server during query]
[SELECT MIN(messageblk_idnr),MAX(is_header) FROM dbmail_messageblks GROUP BY
physmessage_id HAVING MAX(is_header)=0]
Dec 07 03:00:37 mailadmin.mpob.g lt-dbmail-util[29046]: Error:[db]
db.c,db_icheck_isheader(+1788): could not access messageblks table
Failed. An error occured. Please check log.

Maintenance done. Errors were found but not fixed due to failures.
Please check the logs for further details, turning up the trace level as
needed.
[/code]

I am running dbmail 2.2.10, mysql 5.0.45.

My question is: is it a good idea to run the OPTIMIZE TABLE command? If
so, why?

Thanks a lot, guys.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-06 Thread Josh Marshall
On Sun, 2009-12-06 at 20:13 -0800, Daniel Mejia wrote:
 I was thinking of running the OPTIMIZE TABLE command on the dbmail database,
 because my organization's dbmail database is currently around 750 GB.

Is there much free space in that table? If not, optimize table won't free
any space and probably won't improve access speed etc.

Remember that you need as much free disk space to optimise as the size of
the table. I have a mysql server that has 20GB free in the innodb but only
5GB free on the disk. Since the messageblk table is 80GB I can't reclaim
that space, but it doesn't make a difference as the innodb engine reuses
the space fairly well.

Note also that to optimise a table means to read it and write it out
with the table locked. A 750GB database will take a long time and you
won't be able to write to it for that time.
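
A rough sketch of that reasoning as a check. The field names mirror the
Data_length, Index_length and Data_free columns of SHOW TABLE STATUS; the
function name and the 10% threshold are arbitrary assumptions for
illustration, not a dbmail or MySQL rule.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class TableStats:
    data_length: int   # bytes, Data_length from SHOW TABLE STATUS
    index_length: int  # bytes, Index_length
    data_free: int     # bytes, Data_free (allocated but unused)


def optimize_verdict(stats: TableStats, disk_free: int,
                     min_reclaim_ratio: float = 0.10) -> str:
    """Apply the two checks above: room for the rebuild, and something to reclaim."""
    table_size = stats.data_length + stats.index_length
    # The rebuild writes a full copy, so the disk needs room for it.
    if disk_free < table_size:
        return "not enough disk space for the rebuild copy"
    # With little free space inside the table there is little to win back.
    if stats.data_free < min_reclaim_ratio * table_size:
        return "little to reclaim; probably not worth the lock time"
    return "may be worth running in a maintenance window"


# The 80GB messageblk example: 20GB free inside innodb, 5GB free on disk.
print(optimize_verdict(TableStats(80 * GIB, 0, 20 * GIB), disk_free=5 * GIB))
```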

Josh.



Re: [Dbmail] Optimizing Dbmail Database

2009-12-06 Thread Daniel Mejia

 Is there much free space in that table? If not, optimize table won't free
 any space and probably won't improve access speed etc.
With the SHOW TABLE STATUS command, I can see that the free space is not
adequate.
I have around 2 TB of free space on the hard disk, but I'm not so sure about
the free space in the table.

 Note also that to optimise a table means to read it and write it out
 with the table locked. A 750GB database will take a long time and you
 won't be able to write to it for that time.
We are willing to shut down the email server for that maintenance if it
gives the intended result,
which is to free up as much space as possible and, if possible, improve
access speed at least a little.

If OPTIMIZE TABLE is not the ideal solution, what would you recommend I
do?
The error that pops up in the dbmail-util -ay command really bugs us.
