Re: [Dbmail] Optimizing Dbmail Database
Michael Monnerie wrote:
> On Thursday, 10 December 2009 Tomas Kuliavas wrote:
>> DBMail might find its niche in some setups, but large mailboxes are not in that niche. A 750 GB DB proves it. You can't do a text search on raw email sources. There is no point in storing them in a DB.
>
> And you believe doing a raw text search on a 750GB flat file mailserver would be fast?

Raw text searches are not your typical usage pattern. Doing so in a truly high speed fashion is a principal goal for all IMAP implementations. For dbmail, using an external full-text index such as solr/lucene would be the most logical (and scalable) solution.

> dbmail 2.3 is different in that it stores mimeparts separately. Maybe a full text search skips binary attachments there. Paul?

Currently, a full body text search will do a full table scan of the mimeparts table and pull in all mimeparts that are part of the messages in the mailbox being searched. If we want to skip all non-text/* mimeparts (as allowed by the IMAP RFC), we'd have to add some knowledge of the mimetype contained in the mimepart. Doing so would be trivial. And so would fixing the query that does the search.

-- 
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands http://www.nfg.nl
___
DBmail mailing list
DBmail@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail
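To make the idea of skipping non-text/* parts concrete, here is a minimal sketch in Python with an in-memory SQLite database. The `parts` table and its `mimetype` column are assumptions made up for illustration; they are not the actual dbmail 2.3 schema, and this is not how dbmail implements its search.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table standing in for a mimeparts store.

    The table layout (id, message_id, mimetype, data) is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parts ("
        "id INTEGER PRIMARY KEY, message_id INTEGER, mimetype TEXT, data BLOB)"
    )
    rows = [
        (1, 10, "text/plain", b"quarterly report attached"),
        (2, 10, "application/pdf", b"%PDF-1.4 binary payload"),
        (3, 11, "text/html", b"<p>lunch on friday?</p>"),
    ]
    conn.executemany("INSERT INTO parts VALUES (?, ?, ?, ?)", rows)
    return conn


def search_text_parts(conn, term):
    """Return message ids whose text/* parts contain term (case-insensitive).

    Binary parts are filtered out in SQL, so they are never fetched or scanned.
    """
    cur = conn.execute(
        "SELECT message_id, data FROM parts "
        "WHERE mimetype LIKE 'text/%' ORDER BY id"
    )
    needle = term.lower().encode("utf-8")
    return sorted({mid for mid, data in cur if needle in bytes(data).lower()})
```

With the stored mimetype available, the PDF part is excluded before any byte comparison happens, which is the saving described above.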
Re: [Dbmail] Optimizing Dbmail Database
On Friday, 11 December 2009 Josh Marshall wrote:
> I have found that since the Linux kernel 2.6 series, LVM snapshots have caused system lockups. I used them happily in the 2.4 series.

That's why LVM snapshots are not used in XenServer 5.x. They also said it's unstable, especially under high load.

> Besides that, I did mention *impact-free*. Adding a snapshot and reading from a snapshot severely impacts the speed of the running system.

I totally agree with your argument. Having everything together is much easier to administer. Once it's too slow, I'll throw in more hardware. It's cheaper to throw in a new server than to carry the extra burden of redundancy, backup/restore, etc. So far I haven't hit a limit with dbmail, while we did hit limits with older POP-only systems, where users had the "leave mail on server" setting. The server had to copy the flat file all over again for each user, and I/O stalled.

BTW: we upgraded from PostgreSQL 8.1 to 8.3, which exactly *doubled* the speed of our nightly backups and vacuum/cluster runs. That was a nice step, which I can recommend to everybody. I wonder if 8.4 will bring another improvement.

Regards,
zmi
-- 
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660 / 415 6531 .network.your.ideas.
//
// We have two houses for sale:
// http://zmi.at/langegg/
// http://willhaben.at/iad/realestate/object?adId=15306857
Re: [Dbmail] Optimizing Dbmail Database
On Friday, 11 December 2009 Daniel Urstöger wrote:
> Well, one can also do that with a filesystem-based storage, you just need something similar to the MySQL replication for flat files. DRBD for example.

DRBD puts a burden on the server all the time. For secure replication you need to wait until the I/O on the remote server has reached the disk too. Only if you relax that, and allow buffered I/O to the remote, is the impact negligible. But then you risk a munged DB if your first machine crashes hard during high I/O and you suddenly lose parts of your transactions, which the DB does not expect. It's not nice, because the DB claims everything went OK, while some data in some tables is wrong.

Regards,
zmi
-- 
Michael Monnerie
Re: [Dbmail] Optimizing Dbmail Database
On Friday, 11 December 2009 Michael Monnerie wrote:
> LVM snapshots

Another thing to remember: you can only take a snapshot of a single filesystem at a time. So if you have your DB and attachments on different volumes, the snapshots are not transactional anymore. Some people may be happy to live with that, though.

Regards,
zmi
-- 
Michael Monnerie
Re: [Dbmail] Optimizing Dbmail Database
On Friday, 11 December 2009 Paul J Stevens wrote:
> Currently, a full body text search will do a full table scan of the mimeparts table and pull in all mimeparts that are part of the messages in the mailbox being searched. If we want to skip all non-text/* mimeparts (as allowed by the IMAP RFC), we'd have to add some knowledge of the mimetype contained in the mimepart. Doing so would be trivial. And so would fixing the query that does the search.

Sounds like a nice-to-have feature :-) That would be a great reason to upgrade to 2.3.

Regards,
zmi
-- 
Michael Monnerie
Re: [Dbmail] Optimizing Dbmail Database
On Friday, 11 December 2009 Tomas Kuliavas wrote:
> Emails are not raw text. There are at least two ways to write "test" in an email, and if you go to 8-bit text, the number of variations of the same text multiplies. SQL can't search emails stored in a DB, because SQL does not know about encodings, MIME formats and character sets.

So where's the difference? You can SELECT * ... WHERE mailtext LIKE 'test'::utf8 OR mailtext LIKE 'test'::base64 etc., and a flat file server would do the same anyway. The e-mail is stored in its original format, so it would also search for "test" in all encodings.

The question is anyway: does an IMAP SEARCH search for several variations of "test"? What if it's base64 encoded?

Regards,
zmi
-- 
Michael Monnerie
Re: [Dbmail] Optimizing Dbmail Database
2009.12.11 13:14 Michael Monnerie wrote:
> On Friday, 11 December 2009 Tomas Kuliavas wrote:
>> Emails are not raw text. There are at least two ways to write "test" in an email, and if you go to 8-bit text, the number of variations of the same text multiplies. SQL can't search emails stored in a DB, because SQL does not know about encodings, MIME formats and character sets.
>
> So where's the difference? You can SELECT * ... WHERE mailtext LIKE 'test'::utf8 OR mailtext LIKE 'test'::base64 etc., and a flat file server would do the same anyway. The e-mail is stored in its original format, so it would also search for "test" in all encodings.

Are you sure that the syntax of your SELECT query is correct? How complex a SELECT will you have to write in order to cover all variations: flowed format, quoted-printable, headers and bodies that might have text in n different charsets? SQL is not designed to decode MIME on the fly.

> The question is anyway: does an IMAP SEARCH search for several variations of "test"? What if it's base64 encoded?

Headers must be decoded if a charset is specified in the search command. You are free to read all the IMAP standards if you want, as long as you don't invent new SQL syntax in order to prove your point.

Glad to see that Daniel got suggestions for his problem. Maybe the size of the database can be reduced by moving some accounts to another server?

-- 
Tomas
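The point about one word having many on-the-wire forms can be illustrated with a small Python sketch using only the standard library. The function name `encoded_variants` and the choice of two charsets are assumptions for illustration; real mail can use many more charsets and also format=flowed line wrapping.

```python
import base64
import quopri
from email.header import Header


def encoded_variants(word):
    """Return several byte sequences that all represent the same word.

    Covers two charsets, each as raw 8-bit text, quoted-printable, base64,
    and an RFC 2047 encoded header word. This is only a sample of the
    possible variations, not an exhaustive list.
    """
    variants = {}
    for charset in ("utf-8", "iso-8859-1"):
        raw = word.encode(charset)
        variants[charset + " 8bit"] = raw
        variants[charset + " quoted-printable"] = quopri.encodestring(raw)
        variants[charset + " base64"] = base64.b64encode(raw)
        variants[charset + " header"] = Header(word, charset).encode().encode("ascii")
    return variants


if __name__ == "__main__":
    for label, data in encoded_variants("t\u00e9st").items():
        print(label, data)
```

Note that a base64 variant produced this way only matches when the word starts on a 3-byte boundary of the encoded body, so searching for pre-encoded byte strings is less straightforward than it first looks.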
Re: [Dbmail] Optimizing Dbmail Database
On Friday, 11 December 2009 Tomas Kuliavas wrote:
> Are you sure that the syntax of your SELECT query is correct?

No, that was pseudo code to demonstrate that you can search for variations within one query.

> How complex a SELECT will you have to write in order to cover all variations: flowed format, quoted-printable, headers and bodies that might have text in n different charsets?

Exactly as complex as what a server with flat files would have to do. There's no difference.

> SQL is not designed to decode MIME on the fly.

It all reduces to searching for a certain byte combination. You just have to encode your search string into all the variations you need, and put all of those in a single SELECT. That was my point. In case you have to decode the mail, you need to retrieve, decode and search, and this is still the same work a flat file mailserver would do.

All this discussion is about the speed of searching, and so far I haven't seen an example where a flat file server could search faster than the DB.

Regards,
zmi
-- 
Michael Monnerie
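The "retrieve, decode, search" path mentioned above can be sketched with Python's standard `email` package. The function name `body_contains` and the sample messages are made up for illustration, and this says nothing about how dbmail or any flat file server actually implements IMAP SEARCH. The raw bytes could equally come from a database row or from a file on disk.

```python
import base64
from email import message_from_bytes
from email.policy import default


def body_contains(raw_message, term):
    """Decode every text/* part of a raw message and look for term.

    Transfer encodings (base64, quoted-printable) and charsets are undone
    by the email package before the comparison is made.
    """
    msg = message_from_bytes(raw_message, policy=default)
    needle = term.casefold()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and binary attachments
        if needle in part.get_content().casefold():
            return True
    return False


# Two sample messages carrying the word "test" in different encodings.
RAW_BASE64 = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n\r\n"
    + base64.b64encode(b"this is a test") + b"\r\n"
)
RAW_QP = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n\r\n"
    b"t=65st\r\n"
)
```

A plain byte search for `b"test"` finds nothing in either sample, while `body_contains` matches both, which is why the decode step costs the same regardless of where the raw message is stored.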
Re: [Dbmail] Optimizing Dbmail Database
The correct solution is not to store email data in the DB. I think any sane DBA would tell you that: don't store binary data in a DB. In large mailbox setups you get the best performance by storing emails in the filesystem (one email per file in a hashed directory structure) and caching email headers in the DB.

2009.12.10 19:59 Blurry wrote:
> Guys, I really, really need help. Is there anyone out there with suggestions or anything at all that might help? Please help. Thanks.
>
> -----Original Message-----
> From: dbmail-requ...@dbmail.org
> Date: Mon, 07 Dec 2009 12:00:01
> To: dbmail@dbmail.org
> Subject: DBmail Digest, Vol 69, Issue 3
>
> Today's Topics:
> 1. Optimizing Dbmail Database (Daniel Mejia)
> 2. Re: Optimizing Dbmail Database (Daniel Mejia)
> 3. Re: Optimizing Dbmail Database (Josh Marshall)
> 4. Re: Optimizing Dbmail Database (Daniel Mejia)
Re: [Dbmail] Optimizing Dbmail Database
On 10.12.2009 at 20:23, Tomas Kuliavas wrote:
> The correct solution is not to store email data in the DB. I think any sane DBA would tell you that: don't store binary data in a DB.

What a stupid statement in the context of DBmail.
Re: [Dbmail] Optimizing Dbmail Database
2009.12.10 21:55 Reindl Harald wrote:
> On 10.12.2009 at 20:23, Tomas Kuliavas wrote:
>> The correct solution is not to store email data in the DB. I think any sane DBA would tell you that: don't store binary data in a DB.
>
> What a stupid statement in the context of DBmail.

I am not a DBmail developer or user. Maybe I am wrong and raw emails are not stored by DBmail in the DB. I haven't tested DBmail performance on larger mailboxes, only because the only DBmail setup I have runs on a virtual host. A performance test on a virtual host would be unfair, considering that other servers are tested on real machines.

DBMail might find its niche in some setups, but large mailboxes are not in that niche. A 750 GB DB proves it. You can't do a text search on raw email sources. There is no point in storing them in a DB.

-- 
Tomas
Re: [Dbmail] Optimizing Dbmail Database
Reindl Harald wrote:
> On 10.12.2009 at 20:23, Tomas Kuliavas wrote:
>> The correct solution is not to store email data in the DB. I think any sane DBA would tell you that: don't store binary data in a DB.
>
> What a stupid statement in the context of DBmail.

Incidentally, if one were to do as he suggests, it would make replicated setups more complicated! And replication is, incidentally, why I chose dbmail in the first place.
Re: [Dbmail] Optimizing Dbmail Database
Reindl Harald wrote:
> On 10.12.2009 at 20:23, Tomas Kuliavas wrote:
>> The correct solution is not to store email data in the DB. I think any sane DBA would tell you that: don't store binary data in a DB.
>
> What a stupid statement in the context of DBmail.

Only because dbmail already does the opposite. Frankly, I don't think it would be such a _terrible_ idea to design dbmail to keep the dbmail_mimeparts as files, rather than as part of the database. But I certainly don't expect that change to come anytime soon, if ever.
Re: [Dbmail] Optimizing Dbmail Database
On Thursday 10 December 2009, Tomas Kuliavas to...@users.sourceforge.net wrote:
> DBMail might find its niche in some setups, but large mailboxes are not in that niche. A 750 GB DB proves it. You can't do a text search on raw email sources. There is no point in storing them in a DB.

DBMail does store the email in the database, and it works fine. Some things are slower than the alternatives, some things are faster (like backups, which happen, you know, a lot). It certainly has its niche and it does a fine job of it.

-- 
No animals were harmed in the recording of this episode. We tried but that damn monkey was just too fast.
Re: [Dbmail] Optimizing Dbmail Database
I'd like to point out a few things:

* The added complexity of storing and synchronising files on disk with records in tables, especially in a load-balanced, highly available situation, is much more work than any returns you'll ever get.
* Whether the emails are stored in the database or the filesystem (which is really just a database) is not going to make that much difference. Databases can be a bit inefficient for space at times, but this is usually to increase speed.
* I'd like to see 3 or 4 mailservers performing IMAP searches over an NFS share to get to the mailbox files or messages. Then we can really compare the speed of the database vs the filesystem in a networked environment.
* I'd like to see a system administrator easily recover all the emails for a mailbox since the last time cleanup was performed. Hint:

      update dbmail_messages set deleted_flag=0,status=0
      where mailbox_idnr in (select mailbox_idnr from dbmail_mailboxes
      where owner_idnr IN (SELECT user_idnr from dbmail_users
      where userid='mail...@userdomain.com'));

* I'd like to see fine-grained point-in-time recovery for the filesystem-based (or hybrid - scary) systems. Yes, it would take a while for any system, depending on mail size.
* I'd like to see impact-free daily backups for filesystem-based systems. With dbmail, just have a slave replica you can pause replication on to get a perfect snapshot, with no impact on the live database for the duration of the backup.
* Remember that with any mail system that has a huge amount of data, things are going to take time. Databases have more records to search through (although indexes can help speed this up). An mbox is basically a crude database storing all the emails in one file, so large mailboxes can take a very long time to work with. Maildir is good until the inbox gets so many small files that just the directory listing takes a long time. If you're going to have a large mail system, be aware that things will take time, or use multiple systems and something like perdition to split up the mailboxes, or have an archive system where users can place old emails they want to keep.
* As for mail delivery speed statistics, take them all with a grain of salt. Our experience is that the bottleneck for inbound mail is the antivirus and antispam stage, and with the huge amount of spam hitting our servers (90+% of all connections) it is actually faster to detect and reject spam than to have the mail delivered into the mailboxes.

Finally:

* Mail systems that don't require high availability, failover or networked environments for load balancing would probably be better off just using mbox or maildir, for simplicity. For mission-critical systems there are more items to consider before deciding which option to take.

Josh
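The recovery hint in the list above can be tried out against a throwaway SQLite database. The table and column names come from the query in the message; everything else here (the trimmed-down table layouts, the sample rows, the value 2 for a deleted message's status, the example.com addresses) is an assumption for illustration and not the real dbmail schema.

```python
import sqlite3

# The UPDATE from the message above, with the user id as a bound parameter.
RESTORE_SQL = """
UPDATE dbmail_messages
   SET deleted_flag = 0, status = 0
 WHERE mailbox_idnr IN (
       SELECT mailbox_idnr FROM dbmail_mailboxes
        WHERE owner_idnr IN (
              SELECT user_idnr FROM dbmail_users WHERE userid = ?))
"""


def build_demo_db():
    """Create minimal stand-ins for the three tables the query touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dbmail_users (user_idnr INTEGER, userid TEXT);
        CREATE TABLE dbmail_mailboxes (mailbox_idnr INTEGER, owner_idnr INTEGER);
        CREATE TABLE dbmail_messages (
            message_idnr INTEGER, mailbox_idnr INTEGER,
            deleted_flag INTEGER, status INTEGER);
        INSERT INTO dbmail_users VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
        INSERT INTO dbmail_mailboxes VALUES (100, 1), (200, 2);
        -- status 2 is an assumed marker for a deleted message
        INSERT INTO dbmail_messages VALUES (1, 100, 1, 2), (2, 100, 0, 0), (3, 200, 1, 2);
    """)
    return conn


def restore_deleted(conn, userid):
    """Clear the deleted markers on every message owned by userid."""
    conn.execute(RESTORE_SQL, (userid,))
    conn.commit()


def message_states(conn, mailbox_idnr):
    """Return (deleted_flag, status) pairs for one mailbox, ordered by id."""
    cur = conn.execute(
        "SELECT deleted_flag, status FROM dbmail_messages "
        "WHERE mailbox_idnr = ? ORDER BY message_idnr", (mailbox_idnr,))
    return cur.fetchall()
```

Binding the address as a parameter instead of pasting it into the statement is a design choice of this sketch; the effect on the rows is the same as the one-line hint.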
Re: [Dbmail] Optimizing Dbmail Database
Josh Marshall wrote:
> I'd like to point out a few things: [...]

For what it's worth, I set up dbmail for my employer, and the only reason I chose it was that it was able to handle replication. Admittedly I abuse the system a little to do master-master replication, and that over the Atlantic Ocean. The current database size is 100G, 95G of which is the dbmail_messageblks table.

So yes, I know that there are advantages to this, and that there are major upsides to a database. And I certainly wasn't suggesting NFS (nightmare). But it would be interesting if we could have a replication agent that could push mimeparts to disc.

And fwiw, I didn't want to put 30,000 files into one folder, and was not recommending the use of maildir. I was expecting something more like what Squid does, with 256x256 folders with mimeparts inside, indexed by their md5 or sha256 hashes.

At the same time, I'm not sure that that kind of replication is practical. Maybe we instead need to make a MySQL engine that puts blobs into files. But that's rather off-topic for the dbmail list.
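The 256x256 folder layout described above can be sketched in a few lines of Python. The function name `part_path`, the root directory and the choice of sha256 over md5 are assumptions for illustration; neither dbmail nor Squid is claimed to use exactly this layout.

```python
import hashlib
from pathlib import PurePosixPath


def part_path(root, payload):
    """Map a mimepart's bytes to root/xx/yy/<sha256 hex digest>.

    Two hex characters give 256 possible names per level, so two levels
    spread the files over 256 x 256 = 65536 directories.
    """
    digest = hashlib.sha256(payload).hexdigest()
    return PurePosixPath(root) / digest[:2] / digest[2:4] / digest


if __name__ == "__main__":
    # "/var/spool/mimeparts" is a made-up location.
    print(part_path("/var/spool/mimeparts", b"example attachment bytes"))
```

Because the path is derived from the content, identical parts land on the same file, which also gives deduplication of repeated attachments for free.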
Re: [Dbmail] Optimizing Dbmail Database
On Thursday, 10 December 2009 Tomas Kuliavas wrote:
> DBMail might find its niche in some setups, but large mailboxes are not in that niche. A 750 GB DB proves it. You can't do a text search on raw email sources. There is no point in storing them in a DB.

And you believe doing a raw text search on a 750GB flat file mailserver would be fast?

dbmail 2.3 is different in that it stores mimeparts separately. Maybe a full text search skips binary attachments there. Paul?

What I'd like to know from Daniel: do you have 750G of real data, or is that just your DB size? It seems your setup is not optimized at all; a lost connection shows your server can't keep up with the load. Maybe all the MySQL parameters need tuning anyway. dbmail depends heavily on a good DBA to give good performance. Once you have more than 10GB and 100+ users you see the difference.

Regards,
zmi
-- 
Michael Monnerie
Re: [Dbmail] Optimizing Dbmail Database
I do not want to add to this quite heated discussion, but there are two things worth mentioning:

> * I'd like to see impact-free daily backups for filesystem-based systems. With dbmail, just have a slave replica you can pause replication on to get a perfect snapshot, with no impact on the live database for the duration of the backup.

That is actually possible, not with the same features, but one could use the snapshot features of LVM to achieve it. Create and mount the snapshot on your backup box and, well, do with it whatever you like.

The other thing I think is worth mentioning is specific to MySQL: the Full Text Index (FTI) is quite bad for searches once you reach a certain amount of data, and looking through all the records without any index is quite slow as well. I have no comparison of flat file storage against messages stored in a database, but for MySQL there will soon be a new search/index technology available, called Sphinx search, which will hopefully also get implemented in dbmail (?). I have used it lately (beta version) in a project, and the speed compared to MySQL with FTI was quite remarkable.

Kind regards,
Daniel
Re: [Dbmail] Optimizing Dbmail Database
On Friday, 11 December 2009 Daniel Urstöger wrote:
> the Full Text Index (FTI) is quite bad for searches

dbmail doesn't use FTI.

Regards,
zmi
-- 
Michael Monnerie
Re: [Dbmail] Optimizing Dbmail Database
Hi,

I didn't think it was heated. I have seen this argument a few times before, comparing apples and oranges and suggesting that dbmail does it the wrong way and should change. Yes, dbmail has a niche, and it excels in situations where filesystem-based mail systems can't cut it, and that's why I'm using it. For those who think filesystem-based mail systems are better for them, I say go for it! Not everyone has the same requirements.

To address your two points though:

I have found that since the Linux kernel 2.6 series, LVM snapshots have caused system lockups. I used them happily in the 2.4 series. Besides that, I did mention *impact-free*. Adding a snapshot and reading from a snapshot severely impacts the speed of the running system. Yes, you get a clean backup, but the hard disk is placed under a huge read strain, not to mention the extra COW load for every write to the filesystem.

I believe that dbmail on MySQL requires the use of InnoDB, which I believe (or has this recently changed?) does not support Full Text Index. Maybe using something like Sphinx as a bolt-on would be handy for doing IMAP searches. I generally sync all my emails to my desktop machine and do any searches on the local copies. Then searches don't impact the servers :)

On Fri, 2009-12-11 at 01:00 +0100, Daniel Urstöger wrote:
> I do not want to add to this quite hot situation, but there are two things worth mentioning: [...]
Re: [Dbmail] Optimizing Dbmail Database
Never claimed it does ;) Just saying that for bigger datasets it becomes more and more useless, while Sphinx remains blazing fast. So for dbmail that would be a nice-to-have feature.

On 11.12.2009 at 01:07, Michael Monnerie michael.monne...@is.it-management.at wrote:
> On Friday, 11 December 2009 Daniel Urstöger wrote:
>> the Full Text Index (FTI) is quite bad for searches
>
> dbmail doesn't use FTI.
Re: [Dbmail] Optimizing Dbmail Database
Apples and oranges, I agree with that. I happily use dbmail as well as my qmail/vpopmail setup. Every system has its quirks and shortcomings.

> To address your two points though: I have found that since the Linux kernel 2.6 series, LVM snapshots have caused system lockups. I used them happily in the 2.4 series. Besides that, I did mention *impact-free*. Adding a snapshot and reading from a snapshot severely impacts the speed of the running system. Yes, you get a clean backup, but the hard disk is placed under a huge read strain, not to mention the extra COW load for every write to the filesystem.

Yes, but after backing up the snapshot to some other place one can remove it, and the speed will be back to normal. So running a DB slave and using mysqldump for backups is not much different.
Re: [Dbmail] Optimizing Dbmail Database
On Fri, 2009-12-11 at 01:38 +0100, Daniel Urstöger wrote:
> Yes, but after backing up the snapshot to some other place one can remove it, and the speed will be back to normal. So running a DB slave and using mysqldump for backups is not much different.

Not quite. Having a separate slave database server do the heavy work of backups has no impact on the master database during the backup period. Therefore the master database is always at normal speed.
Re: [Dbmail] Optimizing Dbmail Database
On 11.12.2009 at 01:38, Daniel Urstöger dan...@gosi.at wrote:
> Yes, but after backing up the snapshot to some other place one can remove it, and the speed will be back to normal. So running a DB slave and using mysqldump for backups is not much different.

a) the slave can run on one or more other physical machines
b) shut down the slave, run rsync, start the slave

At no moment do you have more load on the master.
Re: [Dbmail] Optimizing Dbmail Database
My message was not quite finished, sadly there isn't an App for that ;) I have found that since the Linux kernel 2.6 series, LVM snapshots have caused system lockups. I used it happily in the 2.4 series. Besides that, I did mention *impact-free*. Adding a snapshot and reading from a snapshot severely impacts the speed of the running system. Yes you get a clean backup, but the hard disk is being placed under a huge read strain, not to mention the extra COW load for every write to the filesystem. Which distro are you using? I haven't had any snapshot-related problems yet, but maybe that is related to how much data you have? The snapshots I create rarely have more than 20GB of data. And I don't want to point out a better distro for you, but I see this mailing list as quite a nice source of knowledge and of exchange ... So I just want to know so that I can learn from it :) I believe that dbmail on MySQL requires the use of InnoDB, which I believe (or has this recently changed?) does not support Full Text Index. Maybe using something like Sphinx as a bolt-on would be handy for doing IMAP searches. I generally sync all my emails to my desktop machine and do any searches on the local copies. Then searches don't impact the servers :) It does require transactions, so for MySQL you are quite tied to InnoDB, unless you want to try something more experimental. It would be nice if customers acted like you, but usually they don't. Though I really liked that Thunderbird 3 now even heavily suggests syncing to local disk. Sphinx for searches would be awesome! The new release is even compatible with the MySQL client / libraries. So no API to fiddle with, but it still needs quite some work, I guess ...
Re: [Dbmail] Optimizing Dbmail Database
Not quite. Having a separate slave database server to do the heavy work of backups has no impact on the master database during the backup period. Therefore the master database is always at normal speed. Well, one can also do that with a filesystem based storage, you just need something similar to the MySQL replication for flat files. DRBD for example.
Re: [Dbmail] Optimizing Dbmail Database
On Fri, 2009-12-11 at 02:01 +0100, Daniel Urstöger wrote: Not quite. Having a separate slave database server to do the heavy work of backups has no impact on the master database during the backup period. Therefore the master database is always at normal speed. Well, one can also do that with a filesystem based storage, you just need something similar to the MySQL replication for flat files. DRBD for example. I have also had DRBD in production, sharing in a similar way. However when the DRBD reconnects it needs to scan through all the changes in the master disk to find and copy across all the changed sectors. So there is a performance hit when the copy completes. How much of a hit depends on the number of changed sectors. In most cases it would be minor, so I am being picky here. One problem I did have with DRBD (used 0.7 series) is that the system would lock me out if only one side came up, so I was completely without service until the timeout, or until I intervened to manually switch to the appropriate master. They may have fixed this with the 0.8 series. You could argue that with the MySQL binary log there is a performance hit when the copy completes and the slave reconnects, but if the binary logs are on a separate disk spindle, they don't affect the performance of the main database files.
Re: [Dbmail] Optimizing Dbmail Database
2009.12.11 01:33 Michael Monnerie wrote: On Thursday, 10 December 2009 Tomas Kuliavas wrote: DBMail might find its niche in some setups, but large mailboxes are not in that niche. 750 GB DB proves it. You can't do a text search on raw email sources. There is no point in storing them in DB. And you believe doing a raw text search on a 750GB flat file mailserver would be fast? Emails are not raw text. There are at least two ways to write text in email, and if you go to 8bit text, the number of variations of the same text multiplies. SQL can't search emails stored in DB, because SQL does not know about encodings, MIME formats and character sets. -- Tomas
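The encoding point can be made concrete with a small sketch using Python's standard `email` package. It is not how dbmail searches; the function names `text_parts_contain` and `raw_with_encoding` and the sample phrase are made up for this illustration. The same text is stored once as base64 and once as quoted-printable: a byte-level substring match on the raw source misses it in both cases, while decoding each text/* part first finds it. Non-text parts are skipped, as the IMAP RFC allows.

```python
import base64
import email
import quopri
from email import policy

PHRASE = "Besprechung in Zürich"  # sample text, chosen only for the demo


def raw_with_encoding(cte: str) -> bytes:
    """Build a minimal raw message carrying PHRASE in the given transfer encoding."""
    body = PHRASE.encode("utf-8")
    if cte == "base64":
        encoded = base64.encodebytes(body)
    else:
        encoded = quopri.encodestring(body)
    headers = (
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Transfer-Encoding: {cte}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + encoded


def text_parts_contain(raw_message: bytes, needle: str) -> bool:
    """Case-insensitive search over the decoded text/* parts of a raw message."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    wanted = needle.casefold()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and binary attachments
        try:
            text = part.get_content()  # undoes the transfer encoding and charset
        except (LookupError, UnicodeError):
            continue  # unknown or broken charset: ignore this part
        if wanted in text.casefold():
            return True
    return False
```

Decoding every part at query time is exactly the expensive step, which is why an external index built from already-decoded text keeps coming up in this thread.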
[Dbmail] Optimizing Dbmail Database
I was thinking of running the OPTIMIZE TABLE command on the dbmail database, because my organization's dbmail database is currently around 750 GB. Another reason is that my dbmail-util -ay command (which runs at 2am and 6am every day) keeps giving me this error:
[code]
Dec 07 03:00:37 mailadmin.mpob.g lt-dbmail-util[29046]: Error:[sql] dbmysql.c,db_query(+290): [Lost connection to MySQL server during query] [SELECT MIN(messageblk_idnr),MAX(is_header) FROM dbmail_messageblks GROUP BY physmessage_id HAVING MAX(is_header)=0]
Dec 07 03:00:37 mailadmin.mpob.g lt-dbmail-util[29046]: Error:[db] db.c,db_icheck_isheader(+1788): could not access messageblks table
Failed. An error occured. Please check log.
Maintenance done. Errors were found but not fixed due to failures.
Please check the logs for further details, turning up the trace level as needed.
[/code]
I am running dbmail 2.2.10, mysql 5.0.45. My question is: is it a good idea to run the OPTIMIZE TABLE command? If so, why? Thanks a lot guys. -- View this message in context: http://old.nabble.com/Optimizing-Dbmail-Database-tp26672088p26672088.html Sent from the dbmail users mailing list archive at Nabble.com.
Re: [Dbmail] Optimizing Dbmail Database
On Sun, 2009-12-06 at 20:13 -0800, Daniel Mejia wrote: I was thinking to run the OPTIMIZE TABLE command in the dbmail database, bcoz right now the database size of my organization dbmail database is currently around 750 GB. Is there much free space in that table? If not, optimize table won't free any space and probably won't improve access speed etc. Remember that to optimise you need as much free disk space as the size of the table. I have a MySQL server that has 20GB free in the InnoDB tablespace but only 5GB free on the disk. Since the messageblk table is 80GB I can't reclaim that space, but it doesn't make a difference as the InnoDB engine reuses the space fairly well. Note also that to optimise a table means to read it and write it out with the table locked. A 750GB database will take a long time and you won't be able to write to it for that time. Josh.
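The rule of thumb above (free disk space at least the size of the table, and enough free space inside the table to be worth reclaiming) can be written down as a quick check. This is a sketch under stated assumptions: the names `TableStats` and `assess_optimize` are invented here, the inputs are meant to be the Data_length, Index_length and Data_free values from SHOW TABLE STATUS converted to bytes, and the 10% threshold is an arbitrary illustration, not a MySQL recommendation. How free space is reported varies with storage engine and MySQL version, so the numbers need to be read with care.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class TableStats:
    data_length: int   # bytes used by row data (Data_length)
    index_length: int  # bytes used by indexes (Index_length)
    data_free: int     # bytes allocated but unused (Data_free)


@dataclass
class Assessment:
    table_size: int
    enough_disk: bool            # is there room to write out a full copy?
    reclaimable_fraction: float  # share of the table that is free space
    worthwhile: bool             # enough_disk and above the chosen threshold


def assess_optimize(
    stats: TableStats, disk_free: int, min_fraction: float = 0.10
) -> Assessment:
    """Apply the 'need as much free disk as the table size' rule of thumb."""
    table_size = stats.data_length + stats.index_length
    enough_disk = disk_free >= table_size
    fraction = stats.data_free / table_size if table_size else 0.0
    return Assessment(
        table_size=table_size,
        enough_disk=enough_disk,
        reclaimable_fraction=fraction,
        worthwhile=enough_disk and fraction >= min_fraction,
    )


# The 80GB messageblk table with 20GB free inside and 5GB free on disk:
example = assess_optimize(
    TableStats(data_length=80 * GB, index_length=0, data_free=20 * GB),
    disk_free=5 * GB,
)
```

For that example the check reports that a quarter of the table is reclaimable but the rebuild cannot run, because the disk cannot hold a second copy of the table.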
Re: [Dbmail] Optimizing Dbmail Database
Is there much free space in that table? If not, optimize table won't free any space and probably won't improve access speed etc. With the SHOW TABLE STATUS command, I can see that the free space is not adequate. I have around 2 TB of free space on the hard disk, but I'm not so sure about the table free space. Note also that to optimise a table means to read it and write it out with the table locked. A 750GB database will take a long time and you won't be able to write to it for that time. We are willing to shut down the email server for that maintenance if it gives the intended result, which is to free up as much space as possible and, if possible, improve access speed at least a little. If OPTIMIZE TABLE is not the ideal solution, what would you recommend I do? The error that pops up in the dbmail-util -ay command really bugs us. -- View this message in context: http://old.nabble.com/Optimizing-Dbmail-Database-tp26672088p26672745.html Sent from the dbmail users mailing list archive at Nabble.com.