Re: [PERFORM] Benchmark shows very slow bulk delete
James Mansion wrote: Ivan Voras wrote: I wish that, when people got the idea to run a simplistic benchmark like this, they would at least have the common sense to put the database on a RAM drive to avoid problems with different cylinder speeds of rotational media and fragmentation from multiple runs. Huh? It's tough to benchmark anything involving rotational drives :) But - how the database organises its IO to maximise the available bandwidth, limit avaiodable seeks, and limit avoidable flushes is absolutely key to realistic performance, especially on modest everyday hardware. Not everyone has a usage that justifies 'enterprise' kit - but plenty of people can benefit from something a step up from SQLite. If you just want to benchmark query processor efficiency then that's one scenario where taking physical IO out of the picture might be justified, but I don't see a good reason to suggest that it is 'common sense' to do so for all testing, and while the hardware involved is pretty low end, its still a valid data point. . You are right, of course, for common benchmarking to see what performance can be expected from some setup in some circumstances, but not where the intention is to compare products. You don't have to go the memory drive / SSD route - just make sure the databases always use the same (small) area of the disk drive. signature.asc Description: OpenPGP digital signature
[PERFORM] Benchmark shows very slow bulk delete
Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? Thom
Re: [PERFORM] Benchmark shows very slow bulk delete
On 01/27/10 14:28, Thom Brown wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html I wish that, when people got the idea to run a simplistic benchmark like this, they would at least have the common sense to put the database on a RAM drive to avoid problems with different cylinder speeds of rotational media and fragmentation from multiple runs. Here are some typical results from a desktop SATA drive: ada0 512 # sectorsize 500107862016# mediasize in bytes (466G) 976773168 # mediasize in sectors 969021 # Cylinders according to firmware. 16 # Heads according to firmware. 63 # Sectors according to firmware. 6QG3Z026# Disk ident. Seek times: Full stroke: 250 iter in 5.676993 sec = 22.708 msec Half stroke: 250 iter in 4.284583 sec = 17.138 msec Quarter stroke: 500 iter in 6.805539 sec = 13.611 msec Short forward:400 iter in 2.678447 sec =6.696 msec Short backward: 400 iter in 2.318637 sec =5.797 msec Seq outer: 2048 iter in 0.214292 sec =0.105 msec Seq inner: 2048 iter in 0.203929 sec =0.100 msec Transfer rates: outside: 102400 kbytes in 1.229694 sec =83273 kbytes/sec middle:102400 kbytes in 1.446570 sec =70788 kbytes/sec inside:102400 kbytes in 2.446670 sec =41853 kbytes/sec This doesn't explain the 4-orders-of-magnitude difference between MySQL and PostgreSQL in bulk_delete() (0.02 vs 577) but it does suggest that some other results where the performance is close, might be bogus. It's tough to benchmark anything involving rotational drives :) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
On Wed, 27 Jan 2010, Thom Brown wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? On the contrary, TRUNCATE TABLE is really rather fast. Seriously, the Postgres developers, when designing the system, decided on a database layout that was optimised for the most common cases. Bulk deletion of data is not really that common an operation, unless you are deleting whole categories of data, where setting up partitioning and deleting whole partitions would be sensible. Other complications are that the server has to maintain concurrent integrity - that is, another transaction must be able to see either none of the changes or all of them. As a consequence of this, Postgres needs to do a sequential scan through the table and mark the rows for deletion in the transaction, before flipping the transaction committed status and cleaning up afterwards. I'd be interested in how mysql manages to delete a whole load of rows in 0.02 seconds. How many rows is that? (Reading in the comments, I saw this: The slow times for Postgresql Bulk Modify/Bulk Delete can be explained by foreign key references to the updates table. I'm not sure that fully explains it though, unless there are basically zero rows being deleted - it's hardly bulk then, is it?) Matthew -- People who love sausages, respect the law, and work with IT standards shouldn't watch any of them being made. -- Peter Gutmann -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
Thom Brown thombr...@gmail.com wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? It is if you don't have an index on the table which has a foreign key defined which references the table in which you're doing deletes. The author of the benchmark apparently didn't realize that MySQL automatically adds such an index to the dependent table, while PostgreSQL leaves it to you to decide whether to add such an index. For insert-only tables, it isn't always worth the cost of maintaining it. Also, I see that the database was small enough to be fully cached, yet the costs weren't adjusted to the recommended values for such an environment, so PostgreSQL should *really* have beaten MySQL by more than it did. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
On Wed, Jan 27, 2010 at 9:54 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: It is if you don't have an index on the table which has a foreign key defined which references the table in which you're doing deletes. The author of the benchmark apparently didn't realize that MySQL automatically adds such an index to the dependent table, while PostgreSQL leaves it to you to decide whether to add such an index. For insert-only tables, it isn't always worth the cost of maintaining it. It really gets to me that I have to not use some foreign keys in MySQL because I can't afford to maintain the index. I have to write super fun check constraints that look like DELIMITER \\ CREATE TRIGGER Location_Pre_Delete BEFORE DELETE ON Locations FOR EACH ROW BEGIN DECLARE _id INT; SELECT id INTO _id FROM BigHistoryTable WHERE locationId = OLD.id LIMIT 1; IF _id IS NOT NULL THEN INSERT INTO BigHistoryTable (column_that_does_not_exist_but_says_that_you_violated_my_hacked_foreign_key) VALUES ('fail'); END IF; END\\ Sometimes I can't sleep at night for having written that code.
Re: [PERFORM] Benchmark shows very slow bulk delete
On Wednesday 27 January 2010 15:49:06 Matthew Wakeling wrote: On Wed, 27 Jan 2010, Thom Brown wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? On the contrary, TRUNCATE TABLE is really rather fast. Seriously, the Postgres developers, when designing the system, decided on a database layout that was optimised for the most common cases. Bulk deletion of data is not really that common an operation, unless you are deleting whole categories of data, where setting up partitioning and deleting whole partitions would be sensible. Other complications are that the server has to maintain concurrent integrity - that is, another transaction must be able to see either none of the changes or all of them. As a consequence of this, Postgres needs to do a sequential scan through the table and mark the rows for deletion in the transaction, before flipping the transaction committed status and cleaning up afterwards. I'd be interested in how mysql manages to delete a whole load of rows in 0.02 seconds. How many rows is that? Afair mysql detects that case and converts it into some truncate equivalent. Andres -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
Ivan Voras wrote: I wish that, when people got the idea to run a simplistic benchmark like this, they would at least have the common sense to put the database on a RAM drive to avoid problems with different cylinder speeds of rotational media and fragmentation from multiple runs. Huh? It's tough to benchmark anything involving rotational drives :) But - how the database organises its IO to maximise the available bandwidth, limit avaiodable seeks, and limit avoidable flushes is absolutely key to realistic performance, especially on modest everyday hardware. Not everyone has a usage that justifies 'enterprise' kit - but plenty of people can benefit from something a step up from SQLite. If you just want to benchmark query processor efficiency then that's one scenario where taking physical IO out of the picture might be justified, but I don't see a good reason to suggest that it is 'common sense' to do so for all testing, and while the hardware involved is pretty low end, its still a valid data point. . -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
Kevin Grittner wrote: It is if you don't have an index on the table which has a foreign key defined which references the table in which you're doing deletes. The author of the benchmark apparently didn't realize that MySQL automatically adds such an index to the dependent table, while PostgreSQL leaves it to you to decide whether to add such an index. The author there didn't write the PostgreSQL schema; he's just using the osdb test kit: http://osdb.sourceforge.net/ Given that both Peter and Neil Conway have thrown work their way, I know there's been some PG specific work done on that project by people who know what's going on, but I'm not sure if that included a performance check. A quick glance at http://osdb.cvs.sourceforge.net/viewvc/osdb/osdb/src/callable-sql/postgres-ui/osdb-pg-ui.m4?revision=1.4view=markup finds this: 222 createIndexForeign(char* tName, char* keyName, char* keyCol, 223 char* fTable, char* fFields) { 224 snprintf(cmd, CMDBUFLEN, 225 alter table %s add constraint %s foreign key (%s) references %s (%s), 226 tName, keyName, keyCol, fTable, fFields); But I don't see any obvious spot where the matching index that should go along with that is created at. The code is just convoluted enough (due to how they abstract away support for multiple databases) that I'm not sure yet--maybe they call their createIndexBtree function and fix this in a later step. But the way the function is named--createIndexForeign--seems to suggest they believe that this operation will create the index, too, which as you point out is just not true. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance