[PERFORM] Benchmark shows very slow bulk delete
Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? Thom
Re: [PERFORM] Benchmark shows very slow bulk delete
On 01/27/10 14:28, Thom Brown wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html I wish that, when people got the idea to run a simplistic benchmark like this, they would at least have the common sense to put the database on a RAM drive to avoid problems with different cylinder speeds of rotational media and fragmentation from multiple runs. Here are some typical results from a desktop SATA drive: ada0 512 # sectorsize 500107862016# mediasize in bytes (466G) 976773168 # mediasize in sectors 969021 # Cylinders according to firmware. 16 # Heads according to firmware. 63 # Sectors according to firmware. 6QG3Z026# Disk ident. Seek times: Full stroke: 250 iter in 5.676993 sec = 22.708 msec Half stroke: 250 iter in 4.284583 sec = 17.138 msec Quarter stroke: 500 iter in 6.805539 sec = 13.611 msec Short forward:400 iter in 2.678447 sec =6.696 msec Short backward: 400 iter in 2.318637 sec =5.797 msec Seq outer: 2048 iter in 0.214292 sec =0.105 msec Seq inner: 2048 iter in 0.203929 sec =0.100 msec Transfer rates: outside: 102400 kbytes in 1.229694 sec =83273 kbytes/sec middle:102400 kbytes in 1.446570 sec =70788 kbytes/sec inside:102400 kbytes in 2.446670 sec =41853 kbytes/sec This doesn't explain the 4-orders-of-magnitude difference between MySQL and PostgreSQL in bulk_delete() (0.02 vs 577) but it does suggest that some other results where the performance is close, might be bogus. It's tough to benchmark anything involving rotational drives :) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
On Wed, 27 Jan 2010, Thom Brown wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? On the contrary, TRUNCATE TABLE is really rather fast. Seriously, the Postgres developers, when designing the system, decided on a database layout that was optimised for the most common cases. Bulk deletion of data is not really that common an operation, unless you are deleting whole categories of data, where setting up partitioning and deleting whole partitions would be sensible. Other complications are that the server has to maintain concurrent integrity - that is, another transaction must be able to see either none of the changes or all of them. As a consequence of this, Postgres needs to do a sequential scan through the table and mark the rows for deletion in the transaction, before flipping the transaction committed status and cleaning up afterwards. I'd be interested in how mysql manages to delete a whole load of rows in 0.02 seconds. How many rows is that? (Reading in the comments, I saw this: The slow times for Postgresql Bulk Modify/Bulk Delete can be explained by foreign key references to the updates table. I'm not sure that fully explains it though, unless there are basically zero rows being deleted - it's hardly bulk then, is it?) Matthew -- People who love sausages, respect the law, and work with IT standards shouldn't watch any of them being made. -- Peter Gutmann -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
Thom Brown thombr...@gmail.com wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? It is if you don't have an index on the table which has a foreign key defined which references the table in which you're doing deletes. The author of the benchmark apparently didn't realize that MySQL automatically adds such an index to the dependent table, while PostgreSQL leaves it to you to decide whether to add such an index. For insert-only tables, it isn't always worth the cost of maintaining it. Also, I see that the database was small enough to be fully cached, yet the costs weren't adjusted to the recommended values for such an environment, so PostgreSQL should *really* have beaten MySQL by more than it did. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] test send (recommended by Dave Page)
Hi all - sorry to create additional email 'noise' But I've been trying to post a rather long query to The pgsql-performance user list. Dave thought That it might have been bounced due to the length And suggested I send a short 'blast' If this works I'll send a shortened version of my query later. Thank you, Mark Steben | Database Administrator http://www.autorevenue.com @utoRevenueR - Keeping Customers Close 95D Ashley Ave, West Springfield, MA 01089 413.243.4800 x1512 (Phone) |413.732-1824 (Fax) http://www.dominionenterprises.com @utoRevenue is a registered trademark and a division of Dominion Enterprises
Re: [PERFORM] test send (recommended by Dave Page)
On Wed, 27 Jan 2010, Mark Steben wrote: Subject: [PERFORM] test send (recommended by Dave Page) Hi all - sorry to create additional email 'noise' But I've been trying to post a rather long query to The pgsql-performance user list. Dave thought That it might have been bounced due to the length And suggested I send a short 'blast' If this works I'll send a shortened version of my query later. Whatever you do, don't try to send an email to the list with the word help in the subject. The mailing list software will silently throw away your email. Helpful, for a help mailing list. Matthew -- The early bird gets the worm, but the second mouse gets the cheese. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
On Wed, Jan 27, 2010 at 9:54 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: It is if you don't have an index on the table which has a foreign key defined which references the table in which you're doing deletes. The author of the benchmark apparently didn't realize that MySQL automatically adds such an index to the dependent table, while PostgreSQL leaves it to you to decide whether to add such an index. For insert-only tables, it isn't always worth the cost of maintaining it. It really gets to me that I have to not use some foreign keys in MySQL because I can't afford to maintain the index. I have to write super fun check constraints that look like DELIMITER \\ CREATE TRIGGER Location_Pre_Delete BEFORE DELETE ON Locations FOR EACH ROW BEGIN DECLARE _id INT; SELECT id INTO _id FROM BigHistoryTable WHERE locationId = OLD.id LIMIT 1; IF _id IS NOT NULL THEN INSERT INTO BigHistoryTable (column_that_does_not_exist_but_says_that_you_violated_my_hacked_foreign_key) VALUES ('fail'); END IF; END\\ Sometimes I can't sleep at night for having written that code.
Re: [PERFORM] test send (recommended by Dave Page)
On Wed, Jan 27, 2010 at 3:33 PM, Mark Steben mste...@autorevenue.com wrote: Hi all – sorry to create additional email ‘noise’ But I’ve been trying to post a rather long query to The pgsql-performance user list. Dave thought That it might have been bounced due to the length And suggested I send a short ‘blast’ If this works I’ll send a shortened version of my query later. I got it. Try posting large query plans etc. to pastebin or a similar service to keep the mail size down. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
On Wednesday 27 January 2010 15:49:06 Matthew Wakeling wrote: On Wed, 27 Jan 2010, Thom Brown wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? On the contrary, TRUNCATE TABLE is really rather fast. Seriously, the Postgres developers, when designing the system, decided on a database layout that was optimised for the most common cases. Bulk deletion of data is not really that common an operation, unless you are deleting whole categories of data, where setting up partitioning and deleting whole partitions would be sensible. Other complications are that the server has to maintain concurrent integrity - that is, another transaction must be able to see either none of the changes or all of them. As a consequence of this, Postgres needs to do a sequential scan through the table and mark the rows for deletion in the transaction, before flipping the transaction committed status and cleaning up afterwards. I'd be interested in how mysql manages to delete a whole load of rows in 0.02 seconds. How many rows is that? Afair mysql detects that case and converts it into some truncate equivalent. Andres -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?
2010/1/26 Matthew Wakeling matt...@flymine.org On Tue, 26 Jan 2010, Richard Neill wrote: SELECT SUM (case when id 120 and id 121 then 1 else 0 end) from tbl_tracker; Explain shows that this does a sequential scan. I'd defer to Tom on this one, but really, for Postgres to work this out, it would have to peer deep into the mysterious SUM function, and realise that the number zero is a noop. I suppose it would be possible, but you'd have to define noops for each of the different possible functions, *and* make the planner clever enough to spot the noop-matching number in the else and convert the WHEN into a WHERE. Hello. How about SELECT SUM (case when id 120 and id 121 then 1 end) from tbl_tracker; It gives same result (may be unless there are no records at all) and optimizer already knows it need not to call function for null input. Such an optimization would cover much more cases. It would look like: * Check only for aggregate subselects * All the functions should be noop for null input * Add ORed constraint for every function input is not null (in this example (case when id A1 and id B1 then 1 end is not null) or (case when id A2 and id B2 then 1 end is not null) or ... or (case when id An and id Bn then 1 end is not null) * Know special case (case when id A1 and id B1 then 1 end is not null) = (id A1 and id B1) by ORing all the when conditions case when C1 then D1 when C2 then D2 ... when Cm then Dm end is not null = C1 or C2 or ... or Cm. Event without last part it may give bonuses even for select count(field) from table transformed into select count(field) from table where field is not null and using [partial] indexes. As of last *, replacing COUNT with SUM(CASE()) is used often enough when multiple count calculations are needed. Best regards, Vitalii Tymchyshyn
Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?
On Wed, 27 Jan 2010, Віталій Тимчишин wrote: How about SELECT SUM (case when id 120 and id 121 then 1 end) from tbl_tracker; That is very interesting. * All the functions should be noop for null input Alas, not true for COUNT(*), AVG(), etc. Matthew -- An optimist sees the glass as half full, a pessimist as half empty, and an engineer as having redundant storage capacity. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?
27 січня 2010 р. 19:01 Matthew Wakeling matt...@flymine.org написав: On Wed, 27 Jan 2010, Віталій Тимчишин wrote: How about SELECT SUM (case when id 120 and id 121 then 1 end) from tbl_tracker; That is very interesting. * All the functions should be noop for null input Alas, not true for COUNT(*), AVG(), etc. select avg(b), count(b), count(*) from (values (2),(null))a(b) gives (2.0, 1, 2) for me, so AVG is in game. Sure, it won't work for count(*), but optimizer already knows which aggregates are strict and which are not, so no new information is needed. Best regards, Vitalii Tymchyshyn
Re: [PERFORM] Benchmark shows very slow bulk delete
Ivan Voras wrote: I wish that, when people got the idea to run a simplistic benchmark like this, they would at least have the common sense to put the database on a RAM drive to avoid problems with different cylinder speeds of rotational media and fragmentation from multiple runs. Huh? It's tough to benchmark anything involving rotational drives :) But - how the database organises its IO to maximise the available bandwidth, limit avaiodable seeks, and limit avoidable flushes is absolutely key to realistic performance, especially on modest everyday hardware. Not everyone has a usage that justifies 'enterprise' kit - but plenty of people can benefit from something a step up from SQLite. If you just want to benchmark query processor efficiency then that's one scenario where taking physical IO out of the picture might be justified, but I don't see a good reason to suggest that it is 'common sense' to do so for all testing, and while the hardware involved is pretty low end, its still a valid data point. . -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark shows very slow bulk delete
Kevin Grittner wrote: It is if you don't have an index on the table which has a foreign key defined which references the table in which you're doing deletes. The author of the benchmark apparently didn't realize that MySQL automatically adds such an index to the dependent table, while PostgreSQL leaves it to you to decide whether to add such an index. The author there didn't write the PostgreSQL schema; he's just using the osdb test kit: http://osdb.sourceforge.net/ Given that both Peter and Neil Conway have thrown work their way, I know there's been some PG specific work done on that project by people who know what's going on, but I'm not sure if that included a performance check. A quick glance at http://osdb.cvs.sourceforge.net/viewvc/osdb/osdb/src/callable-sql/postgres-ui/osdb-pg-ui.m4?revision=1.4view=markup finds this: 222 createIndexForeign(char* tName, char* keyName, char* keyCol, 223 char* fTable, char* fFields) { 224 snprintf(cmd, CMDBUFLEN, 225 alter table %s add constraint %s foreign key (%s) references %s (%s), 226 tName, keyName, keyCol, fTable, fFields); But I don't see any obvious spot where the matching index that should go along with that is created at. The code is just convoluted enough (due to how they abstract away support for multiple databases) that I'm not sure yet--maybe they call their createIndexBtree function and fix this in a later step. But the way the function is named--createIndexForeign--seems to suggest they believe that this operation will create the index, too, which as you point out is just not true. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance