[PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Thom Brown
Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL,
and while PostgreSQL is generally faster, I noticed the bulk delete was very
slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html

Is this normal?

Thom


Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Ivan Voras

On 01/27/10 14:28, Thom Brown wrote:

Had a quick look at a benchmark someone put together of MySQL vs
PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
delete was very slow:
http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html


I wish that, when people get the idea to run a simplistic benchmark like 
this, they would at least have the common sense to put the database on a 
RAM drive to avoid problems with different cylinder speeds of rotational 
media and fragmentation from multiple runs.


Here are some typical results from a desktop SATA drive:

ada0
        512             # sectorsize
        500107862016    # mediasize in bytes (466G)
        976773168       # mediasize in sectors
        969021          # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        6QG3Z026        # Disk ident.

Seek times:
        Full stroke:      250 iter in   5.676993 sec =   22.708 msec
        Half stroke:      250 iter in   4.284583 sec =   17.138 msec
        Quarter stroke:   500 iter in   6.805539 sec =   13.611 msec
        Short forward:    400 iter in   2.678447 sec =    6.696 msec
        Short backward:   400 iter in   2.318637 sec =    5.797 msec
        Seq outer:       2048 iter in   0.214292 sec =    0.105 msec
        Seq inner:       2048 iter in   0.203929 sec =    0.100 msec
Transfer rates:
        outside:       102400 kbytes in   1.229694 sec =   83273 kbytes/sec
        middle:        102400 kbytes in   1.446570 sec =   70788 kbytes/sec
        inside:        102400 kbytes in   2.446670 sec =   41853 kbytes/sec

This doesn't explain the four-orders-of-magnitude difference between MySQL 
and PostgreSQL in bulk_delete() (0.02 vs 577), but it does suggest that 
some other results, where the performance is close, might be bogus.


It's tough to benchmark anything involving rotational drives :)




Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Matthew Wakeling

On Wed, 27 Jan 2010, Thom Brown wrote:

Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL,
and while PostgreSQL is generally faster, I noticed the bulk delete was very
slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html

Is this normal?


On the contrary, TRUNCATE TABLE is really rather fast.

Seriously, the Postgres developers, when designing the system, decided on 
a database layout that was optimised for the most common cases. Bulk 
deletion of data is not really that common an operation, unless you are 
deleting whole categories of data, where setting up partitioning and 
deleting whole partitions would be sensible.
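
For instance, a sketch using the inheritance-based partitioning PostgreSQL 
offers (table and column names are made up for illustration):

  -- One child partition per month, constrained by CHECK:
  CREATE TABLE events (id int, logged date);
  CREATE TABLE events_2010_01
      (CHECK (logged >= DATE '2010-01-01' AND logged < DATE '2010-02-01'))
      INHERITS (events);

  -- Bulk deletion of the whole month is then a cheap metadata operation:
  DROP TABLE events_2010_01;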


Other complications are that the server has to maintain concurrent 
integrity - that is, another transaction must be able to see either none 
of the changes or all of them. As a consequence of this, Postgres needs to 
do a sequential scan through the table and mark the rows for deletion in 
the transaction, before flipping the transaction committed status and 
cleaning up afterwards.
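
In sketch form (the table name is made up; this shows the visible 
behaviour, not the internals verbatim):

  BEGIN;
  DELETE FROM big_table WHERE category = 42;  -- rows marked deleted by this
                                              -- transaction, not removed
  COMMIT;   -- the commit status flips; other sessions now see the rows as gone
  VACUUM big_table;  -- dead row versions are physically reclaimed later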


I'd be interested in how mysql manages to delete a whole load of rows in 
0.02 seconds. How many rows is that?


(Reading the comments, I saw this: "The slow times for Postgresql Bulk 
Modify/Bulk Delete can be explained by foreign key references to the 
updates table." I'm not sure that fully explains it though, unless there 
are basically zero rows being deleted - it's hardly bulk then, is it?)


Matthew

--
People who love sausages, respect the law, and work with IT standards 
shouldn't watch any of them being made.  -- Peter Gutmann




Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Kevin Grittner
Thom Brown thombr...@gmail.com wrote:
 
 Had a quick look at a benchmark someone put together of MySQL vs
 PostgreSQL, and while PostgreSQL is generally faster, I noticed
 the bulk delete was very slow:
 http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
 
 Is this normal?
 
It is if the table that has a foreign key referencing the table you're
deleting from has no index on that key column.  The author of the
benchmark apparently didn't realize that MySQL automatically adds such
an index to the dependent table, while PostgreSQL leaves it to you to
decide whether to add one.  For insert-only tables, it isn't always
worth the cost of maintaining it.
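
A minimal sketch of what's missing (schema names are made up for
illustration):

  -- A dependent table referencing the one being deleted from:
  CREATE TABLE parent (id int PRIMARY KEY);
  CREATE TABLE child  (parent_id int REFERENCES parent (id), payload text);

  -- MySQL/InnoDB indexes child.parent_id automatically; in PostgreSQL you
  -- must add it yourself, or each DELETE on parent scans all of child to
  -- check the constraint:
  CREATE INDEX child_parent_id_idx ON child (parent_id);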
 
Also, I see that the database was small enough to be fully cached,
yet the costs weren't adjusted to the recommended values for such an
environment, so PostgreSQL should *really* have beaten MySQL by more
than it did.
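
For reference, a sketch of the cost settings usually recommended for a
fully cached database (the values are illustrative, not taken from the
benchmark):

  -- Random I/O costs no more than sequential I/O when served from RAM:
  SET random_page_cost = 1.0;
  SET seq_page_cost = 1.0;
  -- Give the planner a realistic estimate of the available cache:
  SET effective_cache_size = '1GB';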
 
-Kevin



[PERFORM] test send (recommended by Dave Page)

2010-01-27 Thread Mark Steben
Hi all - sorry to create additional email 'noise', but I've been trying to 
post a rather long query to the pgsql-performance user list.  Dave thought 
that it might have been bounced due to the length and suggested I send a 
short 'blast'.

If this works I'll send a shortened version of my query later.

Thank you,

 

Mark Steben | Database Administrator 
@utoRevenue® - Keeping Customers Close | http://www.autorevenue.com 
95D Ashley Ave, West Springfield, MA 01089 
413.243.4800 x1512 (Phone) | 413.732.1824 (Fax) 
@utoRevenue is a registered trademark and a division of Dominion 
Enterprises | http://www.dominionenterprises.com 



Re: [PERFORM] test send (recommended by Dave Page)

2010-01-27 Thread Matthew Wakeling

On Wed, 27 Jan 2010, Mark Steben wrote:

Subject: [PERFORM] test send (recommended by Dave Page)

Hi all - sorry to create additional email 'noise', but I've been trying to 
post a rather long query to the pgsql-performance user list.  Dave thought 
that it might have been bounced due to the length and suggested I send a 
short 'blast'.

If this works I'll send a shortened version of my query later.


Whatever you do, don't try to send an email to the list with the word 
"help" in the subject. The mailing list software will silently throw away 
your email. Helpful, for a help mailing list.


Matthew

--
The early bird gets the worm, but the second mouse gets the cheese.



Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Nikolas Everett
On Wed, Jan 27, 2010 at 9:54 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 It is if the table that has a foreign key referencing the table you're
 deleting from has no index on that key column.  The author of the
 benchmark apparently didn't realize that MySQL automatically adds such
 an index to the dependent table, while PostgreSQL leaves it to you to
 decide whether to add one.  For insert-only tables, it isn't always
 worth the cost of maintaining it.


It really gets to me that I have to avoid some foreign keys in MySQL
because I can't afford to maintain the index.  Instead I have to write
super fun trigger-based check constraints that look like:

DELIMITER \\
CREATE TRIGGER Location_Pre_Delete BEFORE DELETE ON Locations FOR EACH ROW
BEGIN
  DECLARE _id INT;
  -- Probe for any row that still references the location being deleted
  -- (LIMIT 1 keeps the check cheap).
  SELECT id INTO _id FROM BigHistoryTable WHERE locationId = OLD.id LIMIT 1;
  IF _id IS NOT NULL THEN
    -- No way to raise an error directly here, so abort the DELETE by
    -- inserting into a column that does not exist.
    INSERT INTO BigHistoryTable
    (column_that_does_not_exist_but_says_that_you_violated_my_hacked_foreign_key)
    VALUES ('fail');
  END IF;
END\\

Sometimes I can't sleep at night for having written that code.


Re: [PERFORM] test send (recommended by Dave Page)

2010-01-27 Thread Dave Page
On Wed, Jan 27, 2010 at 3:33 PM, Mark Steben mste...@autorevenue.com wrote:
 Hi all – sorry to create additional email ‘noise’, but I’ve been trying
 to post a rather long query to the pgsql-performance user list.  Dave
 thought that it might have been bounced due to the length and suggested
 I send a short ‘blast’.

 If this works I’ll send a shortened version of my query later.

I got it. Try posting large query plans etc. to pastebin or a similar
service to keep the mail size down.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com



Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Andres Freund
On Wednesday 27 January 2010 15:49:06 Matthew Wakeling wrote:
 On Wed, 27 Jan 2010, Thom Brown wrote:
  Had a quick look at a benchmark someone put together of MySQL vs
  PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk
  delete was very slow:
  http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
  
  Is this normal?
 
 On the contrary, TRUNCATE TABLE is really rather fast.
 
 Seriously, the Postgres developers, when designing the system, decided on
 a database layout that was optimised for the most common cases. Bulk
 deletion of data is not really that common an operation, unless you are
 deleting whole categories of data, where setting up partitioning and
 deleting whole partitions would be sensible.
 
 Other complications are that the server has to maintain concurrent
 integrity - that is, another transaction must be able to see either none
 of the changes or all of them. As a consequence of this, Postgres needs to
 do a sequential scan through the table and mark the rows for deletion in
 the transaction, before flipping the transaction committed status and
 cleaning up afterwards.
 
 I'd be interested in how mysql manages to delete a whole load of rows in
 0.02 seconds. How many rows is that?
AFAIR MySQL detects that case and converts it into a TRUNCATE equivalent.
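
If that is right, the comparison amounts to the following (the table name 
is made up; the MySQL-side rewrite is an assumption based on the above):

  DELETE FROM big_table;    -- row by row: marks every tuple dead under MVCC
  TRUNCATE TABLE big_table; -- metadata only: swaps in empty files and skips
                            -- the per-row work entirely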

Andres



Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-27 Thread Віталій Тимчишин
2010/1/26 Matthew Wakeling matt...@flymine.org

 On Tue, 26 Jan 2010, Richard Neill wrote:

 SELECT SUM (case when id > 120 and id < 121 then 1 else 0 end)
 from tbl_tracker;

 Explain shows that this does a sequential scan.


 I'd defer to Tom on this one, but really, for Postgres to work this out, it
 would have to peer deep into the mysterious SUM function, and realise that
 the number zero is a noop. I suppose it would be possible, but you'd have to
 define noops for each of the different possible functions, *and* make the
 planner clever enough to spot the noop-matching number in the else and
 convert the WHEN into a WHERE.

Hello.

How about SELECT SUM (case when id > 120 and id < 121 then 1 end)
from tbl_tracker;

It gives the same result (except that it yields NULL rather than 0 when no 
rows match), and the optimizer already knows it need not call a strict 
function for null input. Such an optimization would cover many more cases. 
It would work like this (see the sketch below):
 * Check only aggregate subselects.
 * Require every aggregate function involved to be strict (a no-op for 
null input).
 * Add an ORed constraint saying that at least one function input is not 
null - in this example: (case when id > A1 and id < B1 then 1 end is not 
null) or (case when id > A2 and id < B2 then 1 end is not null) or ... or 
(case when id > An and id < Bn then 1 end is not null).
 * Recognize the special case (case when id > A1 and id < B1 then 1 end is 
not null) = (id > A1 and id < B1), by ORing all the WHEN conditions: case 
when C1 then D1 when C2 then D2 ... when Cm then Dm end is not null = C1 
or C2 or ... or Cm.

Even without the last step this may pay off: select count(field) from 
table can be transformed into select count(field) from table where field 
is not null, which can use [partial] indexes. As for the last point, 
replacing COUNT with SUM(CASE ...) is used often enough when multiple 
counts are needed in one query.
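
A sketch of the proposed rewrite (the planner does not do this today; the 
query is Richard's example from above):

  -- original form, currently planned as a sequential scan:
  SELECT SUM(CASE WHEN id > 120 AND id < 121 THEN 1 END) FROM tbl_tracker;
  -- the form the optimization could produce, able to use an index on id:
  SELECT SUM(1) FROM tbl_tracker WHERE id > 120 AND id < 121;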

Best regards, Vitalii Tymchyshyn


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-27 Thread Matthew Wakeling

On Wed, 27 Jan 2010, Віталій Тимчишин wrote:

How about SELECT SUM (case when id > 120 and id < 121 then 1 end)
from tbl_tracker;


That is very interesting.


* Require every aggregate function involved to be strict (a no-op for null input)


Alas, not true for COUNT(*), AVG(), etc.

Matthew

--
An optimist sees the glass as half full, a pessimist as half empty,
and an engineer as having redundant storage capacity.


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-27 Thread Віталій Тимчишин
On 27 January 2010 at 19:01, Matthew Wakeling matt...@flymine.org wrote:

 On Wed, 27 Jan 2010, Віталій Тимчишин wrote:

 How about SELECT SUM (case when id > 120 and id < 121 then 1 end)
 from tbl_tracker;


 That is very interesting.


  * Require every aggregate function involved to be strict (a no-op for null input)


 Alas, not true for COUNT(*), AVG(), etc.

 select avg(b), count(b), count(*) from (values (2),(null)) a(b)
gives (2.0, 1, 2) for me, so AVG is in the game. Sure, it won't work for
count(*), but the optimizer already knows which aggregates are strict and
which are not, so no new information is needed.

Best regards, Vitalii Tymchyshyn


Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread James Mansion

Ivan Voras wrote:
 I wish that, when people get the idea to run a simplistic benchmark
 like this, they would at least have the common sense to put the
 database on a RAM drive to avoid problems with different cylinder
 speeds of rotational media and fragmentation from multiple runs.

Huh?

 It's tough to benchmark anything involving rotational drives :)

But - how the database organises its IO to maximise the available 
bandwidth, limit avoidable seeks, and limit avoidable flushes is 
absolutely key to realistic performance, especially on modest everyday 
hardware. Not everyone has a usage that justifies 'enterprise' kit - but 
plenty of people can benefit from something a step up from SQLite.

If you just want to benchmark query processor efficiency then that's one 
scenario where taking physical IO out of the picture might be justified, 
but I don't see a good reason to suggest that it is 'common sense' to do 
so for all testing, and while the hardware involved is pretty low end, 
it's still a valid data point.


Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Greg Smith

Kevin Grittner wrote:

It is if you don't have an index on the table which has a foreign
key defined which references the table in which you're doing
deletes.  The author of the benchmark apparently didn't realize that
MySQL automatically adds such an index to the dependent table, while
PostgreSQL leaves it to you to decide whether to add such an index. 
  


The author there didn't write the PostgreSQL schema; he's just using the 
osdb test kit:  http://osdb.sourceforge.net/


Given that both Peter and Neil Conway have thrown work their way, I know 
there's been some PG-specific work done on that project by people who 
know what's going on, but I'm not sure if that included a performance 
check.  A quick glance at 
http://osdb.cvs.sourceforge.net/viewvc/osdb/osdb/src/callable-sql/postgres-ui/osdb-pg-ui.m4?revision=1.4&view=markup 
finds this:


createIndexForeign(char* tName, char* keyName, char* keyCol,
                   char* fTable, char* fFields) {
    snprintf(cmd, CMDBUFLEN,
        "alter table %s add constraint %s foreign key (%s) references %s (%s)",
        tName, keyName, keyCol, fTable, fFields);
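
For illustration, the SQL that template expands to is only a constraint 
definition (the identifiers here are hypothetical):

  ALTER TABLE updates ADD CONSTRAINT updates_fk
      FOREIGN KEY (ref_col) REFERENCES source (id);
  -- In PostgreSQL this creates the constraint alone; no index appears on
  -- updates.ref_col unless one is created explicitly.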


But I don't see any obvious spot where the matching index that should go 
along with that is created.  The code is just convoluted enough (due 
to how they abstract away support for multiple databases) that I'm not 
sure yet--maybe they call their createIndexBtree function and fix this 
in a later step.  But the way the function is 
named--createIndexForeign--seems to suggest they believe that this 
operation will create the index too, which as you point out is just not 
true.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com

