Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++

2011-08-17 Thread Gary Doades

On 17/08/2011 7:26 PM, Ogden wrote:
I am using bonnie++ to benchmark our current Postgres system (on RAID 
5) with the new one we have, which I have configured with RAID 10. The 
drives are the same (SAS 15K). I tried the new system with ext3 and 
then XFS but the results seem really outrageous as compared to the 
current system, or am I reading things wrong?


The benchmark results are here:

http://malekkoheavyindustry.com/benchmark.html

The results are not completely outrageous, however you don't say what 
drives, how many and what RAID controller you have in the current and 
new systems. You might expect that performance from 10/12 disks in RAID 
10 with a good controller. I would say that your current system is 
outrageous in that it is so slow!


Cheers,
Gary.



Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++

2011-08-17 Thread Gary Doades

On 17/08/2011 7:56 PM, Ogden wrote:

On Aug 17, 2011, at 1:33 PM, Gary Doades wrote:


On 17/08/2011 7:26 PM, Ogden wrote:

I am using bonnie++ to benchmark our current Postgres system (on RAID 5) with 
the new one we have, which I have configured with RAID 10. The drives are the 
same (SAS 15K). I tried the new system with ext3 and then XFS but the results 
seem really outrageous as compared to the current system, or am I reading 
things wrong?

The benchmark results are here:

http://malekkoheavyindustry.com/benchmark.html


The results are not completely outrageous, however you don't say what drives, 
how many and what RAID controller you have in the current and new systems. You 
might expect that performance from 10/12 disks in RAID 10 with a good 
controller. I would say that your current system is outrageous in that it is so 
slow!

Cheers,
Gary.


Yes, under heavy writes the load would shoot right up, which is what caused us 
to look at upgrading. If it is the RAID 5, it is mind boggling that it could be 
that much of a difference. I expected a difference, but not that much.

The new system has 6 drives, 300GB 15K SAS, and I've put them into a RAID 10 
configuration. The current system is ext3 with RAID 5 over 4 disks on a Perc/5i 
controller which has half the write cache of the new one (256MB vs 512MB).
Hmm... for only 6 disks in RAID 10 I would say that the figures are a 
bit higher than I would expect. The PERC 5 controller is pretty poor in 
my opinion, the PERC 6 a lot better and the new H700s pretty good. I'm 
guessing you have an H700 in your new system.


I've just got a Dell 515 with an H700 and 8 SAS drives in RAID 10 and I only get 
around 600 MB/s read using ext4 and Ubuntu 10.04 server.


Like I say, your figures are not outrageous, just unexpectedly good :)

Cheers,
Gary.



Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Gary Doades

On 07/12/2010 7:43 PM, Andy Colson wrote:

On 12/7/2010 1:22 PM, Justin Pitts wrote:


Also, as a fair warning: mssql doesn't really care about transactions, but 
PG really does.  Make sure all your code is properly starting and committing 
transactions.

-Andy


I do not understand that statement. Can you explain it a bit better?


In mssql you can write code that connects to the db, fires off updates 
and inserts, and then disconnects.  I believe mssql will keep all your 
changes, and the transaction stuff is done for you.


In PG the first statement you fire off (like an insert, for 
example) will start a transaction.  If you don't commit before you 
disconnect, that transaction will be rolled back.  Even worse, if your 
program does not commit, but keeps the connection to the db open, the 
transaction will stay open too.
As far as I know both MS SQL and Postgres work just the same as 
regards explicit and implicit (autocommit) transactions; only the 
underlying storage/logging mechanisms are different.
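
For what it's worth, a minimal sketch of the explicit style in plain SQL (the 
table and values here are made up purely for illustration):

BEGIN;
INSERT INTO orders (customer_id, total) VALUES (42, 99.50);
UPDATE customers SET last_order = now() WHERE id = 42;
COMMIT;  -- without this, closing the connection rolls the work back

With autocommit on (the default in psql and most client drivers) each 
statement runs in its own implicit transaction and is committed immediately, 
which is the behaviour both servers share.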


Transactions shouldn't make any real difference to the select/join 
performance being complained about though. It's already stated that the 
insert performance of postgres far exceeds SQL Server, which is my 
experience also.


As already suggested, until we see the exact table definitions including 
indexes etc. there's no real way to tell what the problem is. How many 
rows are in the second table? It really shouldn't take that much time to 
read 1000 rows unless you have a bizarrely slow hard disk.


It would be nice to eliminate any programmatic or driver influence too. 
How does the SQL select execute in enterprise manager for mssql and psql 
or pgadmin for postgres?


Cheers,
Gary.




Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-07 Thread Gary Doades

On 07/12/2010 9:29 PM, Tom Polak wrote:


 From EXPLAIN ANALYZE I can see the query ran much faster.
Nested Loop Left Join  (cost=0.00..138.04 rows=1001 width=1298) (actual time=0.036..4.679 rows=1001 loops=1)
  Join Filter: (pgtemp1.state = pgtemp2.stateid)
  ->  Seq Scan on pgtemp1  (cost=0.00..122.01 rows=1001 width=788) (actual time=0.010..0.764 rows=1001 loops=1)
  ->  Materialize  (cost=0.00..1.01 rows=1 width=510) (actual time=0.000..0.001 rows=1 loops=1001)
        ->  Seq Scan on pgtemp2  (cost=0.00..1.01 rows=1 width=510) (actual time=0.006..0.008 rows=1 loops=1)
Total runtime: 5.128 ms

The general question comes down to: can I expect decent performance from
Postgresql compared to MSSQL?  I was hoping that Postgresql 9.0 would beat MSSQL
2000 since MSSQL 2000 is over 10 years old.

So postgres actually executed the select in around 5 milliseconds. 
Pretty good I would say. The problem therefore lies not with postgres 
itself, but with what is done with the results afterwards. Assuming that this 
is purely local and therefore there are no network issues, perhaps there is a 
performance issue in this case with the Npgsql driver? Someone who knows 
more about this driver could perhaps shed some light on this.
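
One quick way to take the driver out of the picture (just a sketch; pgtemp1 
and pgtemp2 are the tables from the plan above, the column list is assumed) 
is to time the same statement directly in psql:

\timing on
EXPLAIN ANALYZE SELECT * FROM pgtemp1 LEFT JOIN pgtemp2 ON pgtemp1.state = pgtemp2.stateid;
SELECT * FROM pgtemp1 LEFT JOIN pgtemp2 ON pgtemp1.state = pgtemp2.stateid;

If the EXPLAIN ANALYZE runtime and the \timing figure are both a few 
milliseconds while the .NET application takes much longer, the time is going 
into the driver or the result handling, not the server.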


I have used .NET (C#) with postgres before, but only using the odbc 
driver. Perhaps you could try that instead (using OdbcCommand, 
OdbcDataReader etc.).


I mainly use ruby (jruby) with postgres both under linux and Windows, 
but I can certainly process 1000 records of similar structure in well 
under 1 second.


Cheers,
Gary.




Re: [PERFORM] Optimizer showing wrong rows in plan

2010-03-28 Thread Gary Doades

On 28/03/2010 10:07 AM, Tadipathri Raghu wrote:

Hi All,
I want to shed some more light on this by analysing it like this:
1. In my example I have created a table with one column of type INT (which 
occupies 4 bytes).

2. Initially it occupies one page of space in the file, that is 8kB.
So, is it assuming how many rows may fit in this page? Please clarify 
this for me.


Like I said, it's just a guess. With no statistics all postgres can do 
is guess, or in this case use the in-built default for a newly created 
table. It could guess 1 or it could guess 10,000,000. What it does is 
produce a reasonable guess in the absence of any other information.


You should read the postgres documentation for further information about 
statistics and how the optimizer uses them.
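
As a minimal sketch (the table and column names are only placeholders), 
gathering the statistics is a one-liner, after which the row estimates come 
from real data rather than the built-in defaults:

ANALYZE my_new_table;
EXPLAIN SELECT * FROM my_new_table WHERE some_col = 1;

Until ANALYZE (or autovacuum) has run against a newly created table, the 
planner has to fall back on those default assumptions, which is why the 
estimated row count can look nothing like the real one.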



Regards,
Gary.





Re: [PERFORM] DB is slow until DB is reloaded

2010-01-04 Thread Gary Doades

On 04/01/2010 7:10 PM, Madison Kelly wrote:

Hi all,

  I've got a fairly small DB (~850MB when pg_dump'ed) running on PgSQL
v8.1.11 on a CentOS 5.3 x86_64 Xen-based virtual machine. The filesystem is
ext3 on LVM with 32MB extents. It's about the only real resource-hungry
VM on the server.

  It slows down over time and I can't seem to find a way to get the
performance to return without doing a dump and reload of the database.
I've tried manually running 'VACUUM FULL' and restarting the postgresql
daemon without success.

For example, here is an actual query before the dump and again after the
dump (sorry for the large query):

-=] Before the dump/reload [=-
ser...@iwt= EXPLAIN ANALYZE SELECT lor_order_type, lor_order_date,
lor_order_time, lor_isp_agent_id, lor_last_modified_date,
lor_isp_order_number, lor_instr_for_bell_rep, lor_type_of_service,
lor_local_voice_provider, lor_dry_loop_instr, lor_termination_location,
lor_req_line_speed, lor_server_from, lor_rate_band,
lor_related_order_nums, lor_related_order_types, lor_activation_date,
lor_cust_first_name, lor_cust_last_name, lor_req_activation_date,
lor_street_number, lor_street_number_suffix, lor_street_name,
lor_street_type, lor_street_direction, lor_location_type_1,
lor_location_number_1, lor_location_type_2, lor_location_number_2,
lor_postal_code, lor_municipality, lor_province, lor_customer_group,
lor_daytime_phone, lor_daytime_phone_ext, lod_value AS circuit_number
FROM line_owner_report LEFT JOIN line_owner_data ON (lor_id=lod_lo_id
AND lod_variable='ISPCircuitNumber1') WHERE lor_lo_id=514;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=2115.43..112756.81 rows=8198 width=1152) (actual time=1463.311..1463.380 rows=1 loops=1)
   Hash Cond: ("outer".lor_id = "inner".lod_lo_id)
   ->  Seq Scan on line_owner_report  (cost=0.00..108509.85 rows=8198 width=1124) (actual time=1462.810..1462.872 rows=1 loops=1)
         Filter: (lor_lo_id = 514)
   ->  Hash  (cost=2112.85..2112.85 rows=1033 width=36) (actual time=0.421..0.421 rows=5 loops=1)
         ->  Bitmap Heap Scan on line_owner_data  (cost=9.61..2112.85 rows=1033 width=36) (actual time=0.274..0.378 rows=5 loops=1)
               Recheck Cond: (lod_variable = 'ISPCircuitNumber1'::text)
               ->  Bitmap Index Scan on lod_variable_index  (cost=0.00..9.61 rows=1033 width=0) (actual time=0.218..0.218 rows=5 loops=1)
                     Index Cond: (lod_variable = 'ISPCircuitNumber1'::text)
 Total runtime: 1463.679 ms
(10 rows)

-=] After the dump/reload [=-
ser...@iwt= EXPLAIN ANALYZE SELECT lor_order_type, lor_order_date,
lor_order_time, lor_isp_agent_id, lor_last_modified_date,
lor_isp_order_number, lor_instr_for_bell_rep, lor_type_of_service,
lor_local_voice_provider, lor_dry_loop_instr, lor_termination_location,
lor_req_line_speed, lor_server_from, lor_rate_band,
lor_related_order_nums, lor_related_order_types, lor_activation_date,
lor_cust_first_name, lor_cust_last_name, lor_req_activation_date,
lor_street_number, lor_street_number_suffix, lor_street_name,
lor_street_type, lor_street_direction, lor_location_type_1,
lor_location_number_1, lor_location_type_2, lor_location_number_2,
lor_postal_code, lor_municipality, lor_province, lor_customer_group,
lor_daytime_phone, lor_daytime_phone_ext, lod_value AS circuit_number
FROM line_owner_report LEFT JOIN line_owner_data ON (lor_id=lod_lo_id
AND lod_variable='ISPCircuitNumber1') WHERE lor_lo_id=514;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=10.84..182.57 rows=5 width=1152) (actual time=1.980..2.083 rows=1 loops=1)
   ->  Seq Scan on line_owner_report  (cost=0.00..70.05 rows=5 width=1124) (actual time=1.388..1.485 rows=1 loops=1)
         Filter: (lor_lo_id = 514)
   ->  Bitmap Heap Scan on line_owner_data  (cost=10.84..22.47 rows=3 width=36) (actual time=0.562..0.562 rows=0 loops=1)
         Recheck Cond: (("outer".lor_id = line_owner_data.lod_lo_id) AND (line_owner_data.lod_variable = 'ISPCircuitNumber1'::text))
         ->  BitmapAnd  (cost=10.84..10.84 rows=3 width=0) (actual time=0.552..0.552 rows=0 loops=1)
               ->  Bitmap Index Scan on lod_id_index  (cost=0.00..4.80 rows=514 width=0) (actual time=0.250..0.250 rows=126 loops=1)
                     Index Cond: ("outer".lor_id = line_owner_data.lod_lo_id)
               ->  Bitmap Index Scan on lod_variable_index  (cost=0.00..5.80 rows=514 width=0) (actual time=0.262..0.262 rows=5 loops=1)
                     Index Cond: (lod_variable = 'ISPCircuitNumber1'::text)

 Total runtime: 2.576 ms
(11 rows)

  Any idea on what might be causing the slowdown? Is it likely
filesystem related or am I missing for 

Re: [PERFORM] DB is slow until DB is reloaded

2010-01-04 Thread Gary Doades



On 04/01/2010 8:30 PM, Madison Kelly wrote:

Steve Crawford wrote:

Madison Kelly wrote:

Hi all,

  I've got a fairly small DB...

  It slows down over time and I can't seem to find a way to get the
performance to return without doing a dump and reload of the 
database...


Some questions:

Is autovacuum running? This is the most likely suspect. If not, 
things will bloat and you won't be getting appropriate analyze 
runs. Speaking of which, what happens if you just run analyze?


And as long as you are dumping and reloading anyway, how about 
version upgrading for bug reduction, performance improvement, and 
cool new features.


Cheers,
Steve



Yup, I even tried manually running 'VACUUM FULL' and it didn't help. 
As for upgrading;


VACUUM FULL is not the same as VACUUM ANALYZE FULL. You shouldn't need 
the FULL option anyway.
a) I am trying to find a way around the dump/reload. I am doing it as 
a last resort only.

b) I want to keep the version in CentOS' repo.

I'd not tried simply updating the stats via ANALYZE... I'll keep an 
eye on performance and if it starts to slip again, I will run ANALYZE 
and see if that helps. If there is a way to run ANALYZE against a 
query that I am missing, please let me know.


From your queries it definitely looks like it's your stats that are the 
problem. When the stats get well out of date the planner chooses a 
hash join because it thinks thousands of rows are involved, whereas only 
a few actually are. That's why, with better stats, the second 
query is using a loop join over very few rows and running much quicker.


Therefore it's ANALYZE you need to run as well as regular VACUUMing. 
There should be no need to VACUUM FULL at all as long as you VACUUM and 
ANALYZE regularly. Once a day may be enough, but you don't say how long 
it takes your database to become slow.


You can VACUUM either the whole database (often easiest) or individual 
tables if you know in more detail what the problem is and that only 
certain tables need it.
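
As a rough sketch (using the table names from your query), the routine 
maintenance can be as simple as:

VACUUM ANALYZE;                    -- whole database: reclaim space and refresh stats
VACUUM ANALYZE line_owner_report;  -- or target an individual table
ANALYZE line_owner_data;           -- statistics only, no vacuum pass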


Setting up autovacuum may well be sufficient.

Cheers,
Gary.







Madi





Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller

2009-02-04 Thread Gary Doades

Rajesh Kumar Mallah wrote:

Hi,

I am going to get a Dell 2950 with PERC6i with
8 * 73 15K SAS drives +
300 GB EMC SATA SAN STORAGE,

I seek suggestions from users sharing their experience with
similar hardware if any. I have following specific concerns.

1. On the list I read that the RAID10 function in PERC5 is not really
   striping but spanning, and does not give a performance boost;
   is this still true in the case of the PERC6i?


It's long been our policy to buy Dell servers and I agree with most 
people here that the performance of the PERCs (5 and earlier) has been 
generally pretty poor.


However, they seem to have listened and got it right, or at least a lot 
better, with the PERC6.


I have recently installed Ubuntu server on 2 Dell 2950s with 8GB RAM and 
six 2.5 inch 15K rpm SAS disks in a single RAID10.


I only got the chance to run bonnie++ on them a few times, but I was 
consistently getting around 200MB/sec for both sequential read and write 
(16GB file).


A similar setup with the older Dell 2850 (PERC5, 6 x 15K rpm 3.5 inch 
SCSI) gave only around 120MB/sec whatever I did.


Hope this helps.

Cheers,
Gary.



Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Gary Doades
Tom Lane wrote:
 I increased the size of the test case by 10x (basically s/10/100/)
 which is enough to push it into the external-sort regime.  I get
 amazingly stable runtimes now --- I didn't have the patience to run 100
 trials, but in 30 trials I have slowest 11538 msec and fastest 11144 msec.
 So this code path is definitely not very sensitive to this data
 distribution.

 While these numbers aren't glittering in comparison to the best-case
 qsort times (~450 msec to sort 10% as much data), they are sure a lot
 better than the worst-case times.  So maybe a workaround for you is
 to decrease maintenance_work_mem, counterintuitive though that be.
 (Now, if you *weren't* using maintenance_work_mem of 100MB or more
 for your problem restore, then I'm not sure I know what's going on...)


Good call. I basically reversed your test by keeping the number of rows
the same (20), but reducing maintenance_work_mem. Reducing to 8192
made no real difference. Reducing to 4096 flattened out all the times
nicely. Slower overall, but at least predictable. Hopefully only a
temporary solution until qsort is fixed.

My restore now takes 22 minutes :)

I think the reason I wasn't seeing performance issues with normal sort
operations is because they use work_mem not maintenance_work_mem which was
only set to 2048 anyway. Does that sound right?
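
For reference, a quick sketch of which setting governs what (the values are 
only examples, sized like the ones above, and the table/index names are made 
up):

-- sorts in ordinary queries use work_mem (plain integers are KB)
SET work_mem = 2048;
SELECT col FROM big_table ORDER BY col;

-- CREATE INDEX and other maintenance commands use maintenance_work_mem
SET maintenance_work_mem = 4096;
CREATE INDEX big_table_col_idx ON big_table (col);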

Regards,
Gary.





Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Gary Doades
 Gary Doades [EMAIL PROTECTED] writes:
 I think the reason I wasn't seeing performance issues with normal sort
 operations is because they use work_mem not maintenance_work_mem which
 was
 only set to 2048 anyway. Does that sound right?

 Very probable.  Do you want to test the theory by jacking that up?  ;-)

Hmm, played around a bit. I have managed to get it to do a sort on one of
the bad columns using a select of two whole tables that results in a
sequential scan, sort and merge join. I also tried a simple select column
order by column for a bad column.

I tried varying maintenance_work_mem and work_mem up and down between 2048
and 65536 but I always get similar results. The sort phase always takes 4
to 5 seconds which seems about right for 900,000 rows.

This was on a column that took 12 minutes to create an index on.

I've no idea why it should behave this way, but probably explains why I
(and others) may not have noticed it before.

Regards,
Gary.





Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Gary Doades

Tom Lane wrote:

Interesting.  I tried your test script and got fairly close times
for all the cases on two different machines:
old HPUX machine: shortest 5800 msec, longest 7960 msec
new Fedora 4 machine: shortest 461 msec, longest 608 msec
(the HPUX machine was doing other stuff at the same time, so some
of its variation is probably only noise).

So what this looks like to me is a corner case that FreeBSD's qsort
fails to handle well.

You might try forcing Postgres to use our private copy of qsort, as we
do on Solaris for similar reasons.  (The easy way to do this by hand
is to configure as normal, then alter the LIBOBJS setting in
src/Makefile.global to add qsort.o, then proceed with normal build.)
However, I think that our private copy is descended from *BSD sources,
so it might have the same failure mode.  It'd be worth finding out.

The final interesting thing is that as I increase shared buffers to 2000 
or 3000 the problem gets *worse*


shared_buffers is unlikely to impact index build time noticeably in
recent PG releases.  maintenance_work_mem would affect it a lot, though.
What setting were you using for that?

Can anyone else try these test cases on other platforms?



Thanks for that.

I've since tried it on Windows (pg 8.1.2) and the times were all 
similar, around 1200ms so it might just be BSD.


I'll have to wait until tomorrow to get back to my BSD box. FreeBSD 
ports makes it easy to install, so I'll have to figure out how to get in 
and change things manually. I guess the appropriate files are still left 
around after the ports make command finishes, so I just edit the file 
and make again?


If it can't be fixed though I guess we may have a problem using BSD. I'm 
surprised this hasn't been brought up before, the case doesn't seem 
*that* rare. Maybe not that many using FreeBSD?


I'd certainly be interested if anyone else can repro it on FreeBSD though.

Regards,
Gary.




Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Gary Doades

Tom Lane wrote:

shared_buffers is unlikely to impact index build time noticeably in
recent PG releases.  maintenance_work_mem would affect it a lot, though.
What setting were you using for that?



Also, I tried upping maintenance_work_mem to 65536 and it didn't make 
much difference (maybe 10% faster for the normal cases). Upping the 
shared_buffers *definitely* makes the bad cases worse though, but I 
agree I don't see why...


Regards,
Gary.



Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Gary Doades

Tom Lane wrote:
  I tried forcing PG to use src/port/qsort.c on the Fedora machine,

and lo and behold:
new Fedora 4 machine: shortest 434 msec, longest 8530 msec

So it sure looks like this script does expose a problem on BSD-derived
qsorts.  Curiously, the case that's much the worst for me is the third
in the script, while the shortest time is the first case, which was slow
for Gary.  So I'd venture that the *BSD code has been tweaked somewhere
along the way, in a manner that moves the problem around without really
fixing it.  (Anyone want to compare the actual FreeBSD source to what
we have?)



If I run the script again, it is not always the first case that is slow, 
it varies from run to run, which is why I repeated it quite a few times 
for the test.


Interestingly, if I don't delete the table after a run, but just drop 
and re-create the index repeatedly it stays a pretty consistent time, 
either repeatedly good or repeatedly bad!


Regards,
Gary.



Re: [PERFORM] Strange Create Index behaviour

2006-02-15 Thread Gary Doades

Tom Lane wrote:


So it sure looks like this script does expose a problem on BSD-derived
qsorts.  Curiously, the case that's much the worst for me is the third
in the script, while the shortest time is the first case, which was slow
for Gary.  So I'd venture that the *BSD code has been tweaked somewhere
along the way, in a manner that moves the problem around without really
fixing it.  (Anyone want to compare the actual FreeBSD source to what
we have?)

It's really interesting to see a case where port/qsort is radically
worse than other qsorts ... unless we figure that out and fix it,
I think the idea of using port/qsort everywhere has just taken a
major hit.



More specifically to BSD, is there any way I can use a non-BSD qsort for 
building the Postgresql server?


Regards,
Gary.



Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-11 Thread Gary Doades
Dave Cramer wrote:
I understand that but I have seen VM's crash.
This does bring up another point. Since postgresql is not threaded a 
.NET pl would require a separate VM for each connection (unless you can 
share the vm ?). One of the java pl's (pl-j)  for postgres has dealt 
with this issue.
For a hundred connections that's a hundred .NET vm's or java vm's.

Is the .NET VM shareable ?
In Windows, most certainly. Not sure about mono.
Cheers,
Gary.


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-10 Thread Gary Doades
Pierre-Frédéric Caillaud wrote:
On Mon, 10 Jan 2005 12:46:01 -0500, Alex Turner [EMAIL PROTECTED] wrote:
You sir are correct!  You can't use perl in MS-SQL or Oracle ;).

Can you benefit from the luminous power of Visual Basic as a pl in 
MSSQL ?

The .NET Runtime will be a part of the next MS SQLServer engine. You 
will be able to have C# as a pl in the database engine with the next 
version of MSSQL. That certainly will be something to think about.

Cheers,
Gary.



Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-10 Thread Gary Doades
Rosser Schwarz wrote:
while you weren't looking, Gary Doades wrote:

The .NET Runtime will be a part of the next MS SQLServer engine.

It won't be long before someone writes a procedural language binding
to PostgreSQL for Parrot [1].  That should offer us a handful or six
more languages that can be used, including BASIC, Ruby and Scheme,
Perl (5 and 6), Python and TCL for more or less free, and ... wait for
it, BrainF***.
IIRC, people have talked about porting C# to Parrot, as well.
Or perhaps get the mono engine in there somewhere to pick up another 
dozen or so languages supported by .NET and mono..

Cheers,
Gary.


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-10 Thread Gary Doades
Dave Cramer wrote:
I'm curious, why do you think that's serious ? What do you really expect 
to do in the stored procedure ? Anything of consequence will seriously 
degrade performance if you select it in say a million rows.

I'm not sure what you mean by select it in a million rows. I would 
expect to write a procedure within the database engine to select a 
million rows, process them and return the result to the client. Very 
efficient.

Cheers,
Gary.


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-10 Thread Gary Doades
Dave Cramer wrote:
Ok, so one use case is to select a large number of rows and do some 
non-trivial operation on them.
I can see where getting the rows inside the server process ( ie some 
procedural language ) thereby reducing the round trip overhead would be 
beneficial. However how do you deal with the lack of control ? For 
instance what happens if you run out of memory while doing this ? I'm 
not sure about other DB'S but if you crash the procedural language 
inside postgres you will bring the server down.

It would seem to me that any non-trivial operation would be better 
handled outside the server process, even if it costs you the round trip.
Since a .NET language is operating effectively inside a VM it is pretty 
much impossible to bring down the server that way. Only a bug in the 
.NET runtime itself will do that. The C# try/catch/finally with .NET 
global exception last-chance handlers will ensure the server and your 
code are well protected.

Cheers,
Gary.


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-09 Thread Gary Doades
Randolf Richardson wrote:
	I'm looking for recent performance statistics on PostgreSQL vs. Oracle 
vs. Microsoft SQL Server.  Recently someone has been trying to convince my 
client to switch from SyBASE to Microsoft SQL Server (they originally wanted 
to go with Oracle but have since fallen in love with Microsoft).  All this 
time I've been recommending PostgreSQL for cost and stability (my own testing 
has shown it to be better at handling abnormal shutdowns and using fewer 
system resources) in addition to true cross-platform compatibility.

I'm not sure that you are going to get a simple answer to this one. It
really depends on what you are trying to do. The only way you will know
for sure what the performance of PostgreSQL is is to try it with samples
of your common queries, updates etc.
I have recently ported a moderately complex database from MS SQLServer
to Postgres with reasonable success. 70% selects, 20% updates, 10%
insert/deletes. I had to do a fair bit of work to get the best
performance out of Postgres, but most of the SQL has as good or better
performance then SQLServer. There are still areas where SQLServer
outperforms Postgres. For me these tend to be the larger SQL Statements
with correlated subqueries. SQLServer tends to optimise them better a
lot of the time. Updates tend to be a fair bit faster on SQLServer too,
this may be MS taking advantage of Windows specific optimisations in the
filesystem.
I did give Oracle a try out of curiosity. I never considered it
seriously because of the cost. The majority of my SQL was *slower* under
Oracle than SQLServer. I spent some time with it and did get good
performance, but it took a *lot* of work tuning to Oracle specific ways
of doing things.
My Summary:
SQLServer: A good all round database, fast, stable. Moderately expensive
to buy, cheap and easy to work with and program for (on Windows)
PostgreSQL: A good all rounder, fast most of the time, stable. Free to
acquire, more expensive to work with and program for. Client drivers may
be problematic depending on platform and programming language. Needs
more work than SQLServer to get the best out of it. Improving all the
time and worth serious consideration.
Oracle: A bit of a monstrosity. Can be very fast with a lot of work,
can't comment on stability but I guess it's pretty good. Very expensive
to acquire and work with. Well supported server and clients.
Cheers,
Gary.


Re: [PERFORM] Postgres backend using huge amounts of ram

2004-11-26 Thread Gary Doades
Tom Lane wrote:
It's also worth noting that work_mem is temporarily set to
maintenance_work_mem, which you didn't tell us the value of:
It's left at the default. (16384).
This would be OK if that is all it used for this type of thing.

My recollection is that hash join chooses hash table partitions partly
on the basis of the estimated number of input rows.  Since the estimate
was way off, the actual table size got out of hand a bit :-(
A bit!!
The really worrying bit is that a normal (ish) query also exhibited the 
same behaviour. I'm a bit worried that if the stats get a bit out of 
date so that the estimate is off, as in this case, a few backends trying 
to get this much RAM will see the server grind to a halt.

Is this a fixable bug? It seems a fairly high-priority, "makes the server 
go away" type of bug to me.

If you need the test data, I could zip the two tables up and send them 
somewhere

Thanks,
Gary.


[PERFORM] Postgres backend using huge amounts of ram

2004-11-25 Thread Gary Doades
How much RAM can a single postgres backend use?
I've just loaded a moderately sized dataset into postgres and was
applying RI constraints to the tables (using pgadmin on windows). Part
way through I noticed the (single) postgres backend had shot up to using
300+ MB of my RAM!
The two tables are:
create table reqt_dates
(
  reqt_date_id    serial,
  reqt_id         integer not null,
  reqt_date       date not null,
  primary key (reqt_date_id)
) without oids;
and
create table booking_plan
(
  booking_plan_id serial,
  reqt_date_id    integer not null,
  booking_id      integer not null,
  booking_date    date not null,
  datetime_from   timestamp not null,
  datetime_to     timestamp not null,
  primary key (booking_plan_id)
) without oids;
and I was trying to do:
alter table booking_plan add
 foreign key
(
reqt_date_id
) references reqt_dates (
reqt_date_id
) on delete cascade;
Since I can't get an explain of what the alter table was doing I used this:
select count(*) from booking_plan,reqt_dates where
booking_plan.reqt_date_id = reqt_dates.reqt_date_id
and sure enough this query caused the backend to use 300M RAM. The plan
for this was:
QUERY PLAN
Aggregate  (cost=37.00..37.00 rows=1 width=0) (actual time=123968.000..123968.000 rows=1 loops=1)
  ->  Hash Join  (cost=15.50..36.50 rows=1000 width=0) (actual time=10205.000..120683.000 rows=1657709 loops=1)
        Hash Cond: ("outer".reqt_date_id = "inner".reqt_date_id)
        ->  Seq Scan on booking_plan  (cost=0.00..15.00 rows=1000 width=4) (actual time=10.000..4264.000 rows=1657709 loops=1)
        ->  Hash  (cost=15.00..15.00 rows=1000 width=4) (actual time=10195.000..10195.000 rows=0 loops=1)
              ->  Seq Scan on reqt_dates  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.000..6607.000 rows=2142184 loops=1)
Total runtime: 124068.000 ms
I then analysed the database. Note, there are no indexes at this stage
except the primary keys.
the same query then gave:
QUERY PLAN
Aggregate  (cost=107213.17..107213.17 rows=1 width=0) (actual time=57002.000..57002.000 rows=1 loops=1)
  ->  Hash Join  (cost=35887.01..106384.32 rows=1657709 width=0) (actual time=9774.000..54046.000 rows=1657709 loops=1)
        Hash Cond: ("outer".reqt_date_id = "inner".reqt_date_id)
        ->  Seq Scan on booking_plan  (cost=0.00..22103.55 rows=1657709 width=4) (actual time=10.000..19648.000 rows=1657709 loops=1)
        ->  Hash  (cost=24355.92..24355.92 rows=2142184 width=4) (actual time=9674.000..9674.000 rows=0 loops=1)
              ->  Seq Scan on reqt_dates  (cost=0.00..24355.92 rows=2142184 width=4) (actual time=0.000..4699.000 rows=2142184 loops=1)
Total runtime: 57002.000 ms
This is the same set of hash joins, BUT the backend only used 30M of
private RAM.
Platform is Windows XP, Postgres 8.0 beta 5
shared_buffers = 4000
work_mem = 8192
Any explanations?
Thanks,
Gary.


Re: [PERFORM] Performance Anomalies in 7.4.5

2004-10-21 Thread Gary Doades
On 21 Oct 2004 at 15:50, Thomas F.O'Connell wrote:

 If not, should I be REINDEXing manually, as well as VACUUMing manually  
 after large data imports (whether via COPY or INSERT)? Or will a VACUUM  
 FULL ANALYZE be enough?
 

It's not the vacuuming that's important here, just the analyze. If you import any
data into a table, Postgres often does not *know* that until you gather the
statistics on the table. You are simply running into the problem of the planner not
knowing how much data there is, or how the data is distributed, in your tables.

If you have large imports it may be faster overall to drop the indexes first, then
insert the data, then put the indexes back on, then analyze, as sketched below.
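
A rough outline of that sequence (the index, table and file names are only 
placeholders):

DROP INDEX big_table_col_idx;
COPY big_table FROM '/path/to/import.dat';
CREATE INDEX big_table_col_idx ON big_table (col);
ANALYZE big_table;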

Cheers,
Gary.




Re: [PERFORM] Odd planner choice?

2004-10-08 Thread Gary Doades
Oops, forgot to mention:

PostgreSQL 8.0 beta 2 Windows.

Thanks,
Gary.

On 8 Oct 2004 at 20:32, Gary Doades wrote:

 
 I'm looking at one of my standard queries and have encountered some strange 
 performance 
 problems.
 
 The query below is to search for vacant staff member date/time slots given a series 
 of target 
 date/times. The data contained in the booking_plan/staff_booking tables contain the 
 existing 
 bookings, so I'm looking for clashing bookings to eliminate them from a candidate 
 list.
 
 The query is:
 
 select distinct b.staff_id from staff_booking b, booking_plan bp, 
 t_search_reqt_dates rd
 where b.booking_id = bp.booking_id
 and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
 AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
 and rd.search_id = 13
 and rd.reqt_date between '2004-09-30' AND '2005-12-31'
 
 There are 197877 rows in staff_booking, 573416 rows in booking_plan and 26 rows in 
 t_search_reqt_dates.
 
 The t_search_reqt_dates table is a temp table created and populated with the target 
 date/times. The temp table is *not* analyzed, all the others are.
 
 The good query plan comes with the criteria on search_id and reqt_date given in 
 the last two 
 lines in the query. Note all the rows in the temp table are search_id = 13 and all 
 the rows are 
 between the two dates, so the whole 26 rows is always pulled out.
 
 In this case it is doing exactly what I expect. It is pulling all rows from the 
 t_search_reqt_dates 
 table, then pulling the relevant records from the booking_plan and then hashing with 
 staff_booking. Excellent performance.
 
 The problem is I don't need the clauses for search_id and reqt_dates as the whole 
 table is 
 always read anyway. The good plan is because the planner thinks just one row will be 
 read from 
 t_search_reqt_dates.
 
 If I remove the redundant clauses, the planner now estimates 1000 rows returned from 
 the table, 
 not unreasonable since it has no statistics. But *why* in that case, with *more* 
 estimated rows 
 does it choose to materialize that table (26 rows) 573416 times!!!
 
 whenever it estimates more than one row it chooses the bad plan.
 
 I really want to remove the redundant clauses, but I can't. If I analyse the table, 
 then it knows 
 there are 26 rows and chooses the bad plan whatever I do.
 
 Any ideas???
 
 Cheers,
 Gary.
 
  Plans for above query 
 
 Good QUERY PLAN
 Unique  (cost=15440.83..15447.91 rows=462 width=4) (actual time=1342.000..1342.000 rows=110 loops=1)
   ->  Sort  (cost=15440.83..15444.37 rows=7081 width=4) (actual time=1342.000..1342.000 rows=2173 loops=1)
         Sort Key: b.staff_id
         ->  Hash Join  (cost=10784.66..15350.26 rows=7081 width=4) (actual time=601.000..1331.000 rows=2173 loops=1)
               Hash Cond: ("outer".booking_id = "inner".booking_id)
               ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8) (actual time=0.000..400.000 rows=197877 loops=1)
               ->  Hash  (cost=10781.12..10781.12 rows=7080 width=4) (actual time=591.000..591.000 rows=0 loops=1)
                     ->  Nested Loop  (cost=0.00..10781.12 rows=7080 width=4) (actual time=10.000..581.000 rows=2173 loops=1)
                           Join Filter: (("outer".datetime_from <= "inner".datetime_to) AND ("outer".datetime_to >= "inner".datetime_from))
                           ->  Seq Scan on t_search_reqt_dates rd  (cost=0.00..16.50 rows=1 width=20) (actual time=0.000..0.000 rows=26 loops=1)
                                 Filter: ((search_id = 13) AND (reqt_date >= '2004-09-30'::date) AND (reqt_date <= '2005-12-31'::date))
                           ->  Index Scan using booking_plan_idx2 on booking_plan bp  (cost=0.00..10254.91 rows=63713 width=24) (actual time=0.000..11.538 rows=5871 loops=26)
                                 Index Cond: ((bp.booking_date >= ("outer".reqt_date - 1)) AND (bp.booking_date <= ("outer".reqt_date + 1)))
 Total runtime: 1342.000 ms
 
 
 Bad QUERY PLAN
 Unique  (cost=7878387.29..7885466.50 rows=462 width=4) (actual time=41980.000..41980.000 rows=110 loops=1)
   ->  Sort  (cost=7878387.29..7881926.90 rows=7079211 width=4) (actual time=41980.000..41980.000 rows=2173 loops=1)
         Sort Key: b.staff_id
         ->  Nested Loop  (cost=5314.32..7480762.73 rows=7079211 width=4) (actual time=6579.000..41980.000 rows=2173 loops=1)
               Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND ("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >= ("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
               ->  Hash Join  (cost=5299.32..26339.73 rows=573416 width=24) (actual time=2413.000..7832.000 rows=573416 loops=1)
                     Hash Cond: ("outer".booking_id = "inner".booking_id)
                     ->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 rows=573416 width=24) (actual time=0.000..1201.000 rows=573416 loops=1)
                     ->  Hash  (cost=4233.39..4233.39 rows=197877 width=8) (actual time=811.000..811.000 rows=0 loops=1)
                           ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8) (actual time=0.000..430.000 rows=197877 loops=1)
               ->  Materialize  (cost=15.00..20.00 rows=1000 width=20) (actual time

Re: [PERFORM] Odd planner choice?

2004-10-08 Thread Gary Doades
On 8 Oct 2004 at 16:04, Tom Lane wrote:

 Gary Doades [EMAIL PROTECTED] writes:
  If I remove the redundant clauses, the planner now estimates 1000 rows returned 
  from 
  the table, not unreasonable since it has no statistics. But *why* in that case, 
  with *more* 
  estimated rows does it choose to materialize that table (26 rows) 573416 times!!!
 
 It isn't.  It's materializing that once and scanning it 573416 times,
 once for each row in the outer relation.  And this is not a bad plan
 given the estimates.  If it had stuck to what you call the good plan,
 and there *had* been 1000 rows in the temp table, that plan would have
 run 1000 times longer than it did.
 
 As a general rule, if your complaint is that you get a bad plan for an
 unanalyzed table, the response is going to be so analyze the table.
 

The problem in this case is that if I *do* analyse the table I *always* get the bad 
plan; bad in this case meaning the query takes a lot longer. I'm still not sure why it can't 
choose the better plan by just reading the 26 rows once and index scanning the 
booking_plan table 26 times (as in the good plan).

OK, with a 1000 row estimate I can see that index scanning 1000 times into the 
booking_plan table would take some time, but even if the planner estimates 5 rows it still 
produces the same slow query.

If I analyze the table it then knows there are 26 rows and therefore always goes slow.

This is why I am not analyzing this table, to fool the planner into thinking there is 
only 
one row and produce a much faster access plan. Not ideal I know.
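
In other words the workflow is roughly the sketch below; the column types and 
the sample row are assumptions, only the table and column names come from the 
query above:

CREATE TEMP TABLE t_search_reqt_dates (
    search_id     integer,
    reqt_date     date,
    datetime_from timestamp,
    datetime_to   timestamp
);
-- populate with the target date/time slots (real population code elided)
INSERT INTO t_search_reqt_dates
    VALUES (13, '2004-09-30', '2004-09-30 09:00', '2004-09-30 17:00');
-- deliberately no ANALYZE here: with no statistics the planner assumes
-- a single row and picks the fast nested-loop/index-scan plan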

Just using one redundant clause I now get:

select distinct b.staff_id from staff_booking b, booking_plan bp, t_search_reqt_dates rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13

QUERY PLAN
Unique  (cost=50885.97..50921.37 rows=462 width=4) (actual time=35231.000..35241.000 rows=110 loops=1)
  ->  Sort  (cost=50885.97..50903.67 rows=35397 width=4) (actual time=35231.000..35241.000 rows=2173 loops=1)
        Sort Key: b.staff_id
        ->  Hash Join  (cost=44951.32..50351.07 rows=35397 width=4) (actual time=34530.000..35231.000 rows=2173 loops=1)
              Hash Cond: ("outer".booking_id = "inner".booking_id)
              ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8) (actual time=0.000..351.000 rows=197877 loops=1)
              ->  Hash  (cost=44933.62..44933.62 rows=35397 width=4) (actual time=34530.000..34530.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=15.50..44933.62 rows=35397 width=4) (actual time=8342.000..34520.000 rows=2173 loops=1)
                          Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND ("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >= ("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
                          ->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 rows=573416 width=24) (actual time=0.000..1053.000 rows=573416 loops=1)
                          ->  Materialize  (cost=15.50..15.53 rows=5 width=20) (actual time=0.001..0.019 rows=26 loops=573416)
                                ->  Seq Scan on t_search_reqt_dates rd  (cost=0.00..15.50 rows=5 width=20) (actual time=0.000..0.000 rows=26 loops=1)
                                      Filter: (search_id = 13)
Total runtime: 35241.000 ms

If this is the only answer for now, then fair enough I will just have to do more 
testing.

Regards,
Gary.




Re: [PERFORM] Optimizing a request

2004-08-31 Thread Gary Doades



On 31 Aug 2004 at 20:59, Jean-Max Reymond wrote:


 hi,
 
 I want to optimize the following request and avoid the seq scan on the
 table article (1000 rows).
 
 explain SELECT art_id, art_titre, art_texte, rub_titre
 FROM article inner join rubrique on article.rub_id = rubrique.rub_id
 where rub_parent = 8;
 
 Hash Join  (cost=8.27..265637.59 rows=25 width=130)
   Hash Cond: ("outer".rub_id = "inner".rub_id)
   ->  Seq Scan on article  (cost=0.00..215629.00 rows=1000 width=108)
   ->  Hash  (cost=8.26..8.26 rows=3 width=22)
         ->  Index Scan using rubrique_parent on rubrique  (cost=0.00..8.26 rows=3 width=22)
               Index Cond: (rub_parent = 8)
 
 
 thanks for your answers,
 
 -- 

Have you run ANALYZE on this database after creating the indexes or 
loading the data?


What percentage of rows in the article table are likely to match the 
keys selected from the rubrique table?


If it is likely to fetch a high proportion of the rows from article then it 
may be best that a seq scan is performed.


What are your non-default postgresql.conf settings? It may be better to 
increase the default_statistics_target (to say 100 to 200) before running 
ANALYZE and then re-run the query.
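
A sketch of that re-test (the target value of 150 is just an example, and 
ANALYZE must be re-run for it to take effect):

SET default_statistics_target = 150;
ANALYZE article;
ANALYZE rubrique;
EXPLAIN ANALYZE
SELECT art_id, art_titre, art_texte, rub_titre
FROM article INNER JOIN rubrique ON article.rub_id = rubrique.rub_id
WHERE rub_parent = 8;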


Cheers,
Gary.





Re: [PERFORM] Optimizing a request

2004-08-31 Thread Gary Doades
On 31 Aug 2004 at 21:42, Jean-Max Reymond wrote:

 - Original Message -
 From: Gary Doades [EMAIL PROTECTED]
 Date: Tue, 31 Aug 2004 20:21:49 +0100
 Subject: Re: [PERFORM] Optimizing a request
 To: [EMAIL PROTECTED]
 
  
 
  Have you run ANALYZE on this database after creating the indexes or loading the 
  data? 
  
 the indexes are created and the data loaded and then, I run vacuum analyze.
 
 What percentage of rows in the article table are likely to match
 the keys selected from  the rubrique table?
  
 only 1 record.
 
 If it is likely to fetch a high proportion of the rows from article
 then it may be best that a seq scan is performed.
  
 What are your non-default postgresql.conf settings? It may be better
 to increase the default_statistics_target (to say 100 to 200) before
 running ANALYZE and then re-run the query.
  
 yes,  default_statistics_target is set to the default_value.
 I have just increased  shared_buffers and effective_cache_size to give
 advantage of 1 Mb RAM
  

I can only presume you mean 1 GB RAM. What exactly are your 
settings for shared buffers and effective_cache_size?

Can you increase default_statistics_target and re-test? It is possible 
that with such a large table the distribution of values is skewed and 
postgres does not realise that an index scan would be better.
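
If it is mainly the join column, the statistics target can also be raised for 
just that column rather than globally (a sketch; the value 200 is only an 
example):

ALTER TABLE article ALTER COLUMN rub_id SET STATISTICS 200;
ANALYZE article;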

It seems very odd otherwise that only one row out of 10,000,000 could 
match and postgres does not realise this.

Can you post an explain analyse (not just explain) for this query?

Cheers,
Gary.




Re: [PERFORM] Optimizing a request

2004-08-31 Thread Gary Doades
On 31 Aug 2004 at 22:24, Jean-Max Reymond wrote:

 On Tue, 31 Aug 2004 21:16:46 +0100, Gary Doades [EMAIL PROTECTED] wrote:
 
  I can only presume you mean 1 GB RAM. What exactly are your
  settings for shared buffers and effective_cache_size?
 
 for 1 GB RAM,
 shared_buffers = 65536
 effective_cache_size = 16384 

This seems like the wrong way round also.

You might try:

shared_buffers = 1
effective_cache_size = 6

Cheers,
Gary.




Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Gary Doades
On 2 Jun 2004 at 16:45, Merlin Moncure wrote:

 
 'better' does not mean 'faster'.  Win32 has a pretty decent journaling
 filesytem (ntfs) and a good I/O subsystem which includes IPC.  Process
 management is poor compared to newer linux kernels but this is
 unimportant except in extreme cases.  Right now the win32 native does
 not sync() (but does fsync()).  So, the performance is somewhere between
 fsync = off and fsync = on (probably much closer to fsync = on).  It is
 reasonable to assume that the win32 port will outperform the unix
 versions at many tasks (at the expense of safety) until the new sync()
 code is put in.
 
 If tested on the same source base, 40-60% differences can only be coming
 from the I/O subsystem.  There are other factors which aren't clear from
 this exchange like what version of gcc, etc.
 

Hmm, interesting.

I've been running the Win32 port for a couple of weeks now. Using the 
same database as a Linux 2.6 system. Same processor and memory 
but different disks.

Linux system has 10K rpm SCSI disks
Windows has 7200 rpm serial ATA disks.

When a lot of IO is involved the performance differences are very mixed 
as I would expect. Sometimes Windows wins, sometimes Linux.

BUT, very consistently, when NO IO is involved the Win32 port is 
always around 20% slower than Linux. I have disregarded cases where the 
EXPLAIN ANALYZE results are different. In all the cases where the 
EXPLAIN ANALYZE results are the same and no IO is involved the 
Win32 port is slower.

Currently I am putting this down to the build/gcc differences. I can't see 
why there should be this difference otherwise. (memory 
management??)

Regards,
Gary.




Re: [PERFORM] PostgreSQL performance in simple queries

2004-05-19 Thread Gary Doades



Try using 


select * from files_t where parent = ::int8


You have declared parent as int8, but the query will assume int4 for the literal 
value and may not use the index.
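
As a concrete illustration of the cast (the literal 12345 is made up), on 7.4 
the comparison types need to match for the index to be usable:

-- literal is taken as int4, so the index on the int8 column may be skipped
SELECT * FROM files_t WHERE parent = 12345;

-- explicit cast keeps the comparison int8 = int8 and allows the index scan
SELECT * FROM files_t WHERE parent = 12345::int8;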


Also make sure you have ANALYZEd this table.


Regards,
Gary.


On 20 May 2004 at 0:07, Eugeny Balakhonov wrote:


 Hello for all!
 
 I have PostgreSQL 7.4 under the latest version of Cygwin and have some
 performance problems :( It is very strange... I don't remember having
 this problem with the previous versions of Cygwin and PostgreSQL 7.3
 
 I have only two simple tables:
 
 CREATE TABLE public.files_t
 (
 id int8 NOT NULL,
 parent int8,
 size int8 NOT NULL,
 dir bool NOT NULL DEFAULT false,
 ctime timestamp NOT NULL,
 ftime timestamp NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone,
 name text NOT NULL,
 access varchar(10) NOT NULL,
 host int4 NOT NULL,
 uname text NOT NULL,
 CONSTRAINT pk_files_k PRIMARY KEY (id),
 CONSTRAINT fk_files_k FOREIGN KEY (parent) REFERENCES public.files_t (id) 
ON UPDATE CASCADE ON DELETE CASCADE,
 CONSTRAINT fk_hosts_k FOREIGN KEY (host) REFERENCES public.hosts_t (id) ON 
UPDATE CASCADE ON DELETE CASCADE
 ) WITH OIDS;
 
 and
 
 CREATE TABLE public.hosts_t
 (
 id int4 NOT NULL,
 ftime timestamp NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone,
 utime timestamp NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone,
 name text NOT NULL,
 address inet NOT NULL,
 CONSTRAINT pk_hosts_k PRIMARY KEY (id)
 ) WITH OIDS;
 
 Table files_t has 249259 records and table hosts_t has only 59 records.
 
 I tried to run a simple query:
 
 select * from files_t where parent = 
 
 This query takes 0.256 seconds! That is a very long time for this small
 table!
 I have index for field parent:
 
 CREATE INDEX files_parent_idx
 ON public.files_t
 USING btree
 (parent);
 
 But if I try to see the query plan then I see the following:
 
 Seq Scan on files_t (cost=0.00..6103.89 rows=54 width=102)
 Filter: (parent = )
 
 PostgreSQL does not use the index files_parent_idx!
 
 I have enabled all options of QUERY TUNING in postgresql.conf, I
 have increased memory sizes for PostgreSQL:
 
 shared_buffers = 2000          # min 16, at least max_connections*2, 8KB each
 sort_mem = 32768               # min 64, size in KB
 vacuum_mem = 65536             # min 1024, size in KB
 fsync = false                  # turns forced synchronization on or off
 checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
 enable_hashagg = true
 enable_hashjoin = true
 enable_indexscan = true
 enable_mergejoin = true
 enable_nestloop = true
 enable_seqscan = true
 enable_sort = true
 enable_tidscan = true
 geqo = true
 geqo_threshold = 22
 geqo_effort = 1
 geqo_generations = 0
 geqo_pool_size = 0             # default based on tables in statement,
                                # range 128-1024
 geqo_selection_bias = 2.0 # range 1.5-2.0
 stats_start_collector = true
 stats_command_string = true
 stats_block_level = true
 stats_row_level = true
 stats_reset_on_server_start = false
 
 
 Please help me!
 My database has a very small size (only 249259 records) but it works
 very slowly :(
 
 Best regards
 Eugeny
 
 
 
 







Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Gary Doades
On 29 Apr 2004 at 19:17, Tom Lane wrote:

 Josh Berkus [EMAIL PROTECTED] writes:
  Certainly the fact that MSSQL is essentially a single-user database makes 
  things easier for them.
 
 Our recent testing (cf the Xeon thread) says that the interlocking we
 do to make the world safe for multiple backends has a fairly high cost
 (at least on some hardware) compared to the rest of the work in
 scenarios where you are doing zero-I/O scans of data already in memory.
 Especially so for index scans.  I'm not sure this completely explains
 the differential that Gary is complaining about, but it could be part of
 it.  Is it really true that MSSQL doesn't support concurrent operations?
 
   regards, tom lane

As far as I am aware SQLServer supports concurrent operations. It 
certainly creates more threads for each connection. None of my 
observations of the system under load (50-ish concurrent users, 150-ish 
connections) suggest that it is serializing queries.

These tests are currently on single-processor Athlon XP 2000+ systems.

Regards,
Gary.



Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Gary Doades
On 30 Apr 2004 at 7:26, Dennis Bjorklund wrote:

 On Fri, 30 Apr 2004, Gary Doades wrote:

  I should have also pointed out that MSSQL reported that same index scan
  as taking 65% of the overall query time. It was just faster. The
  overall query took 103ms in MSSQL.

 Are your results based on a single client accessing the database and no
 concurrent updates?

 Would adding more clients, and maybe having some client that
 updates/inserts into the tables, still make mssql faster than pg? Maybe
 it's as simple as pg being optimized for more concurrent users than mssql?

 I'm just asking, I don't know much about the inner workings of
 mssql.

 --
 /Dennis Björklund


At the moment it is difficult to set up many clients for testing concurrent
stuff. In the past I have had several SQLServer clients under test,
mainly select queries. MSSQL can certainly execute queries while other
queries are still running in the background.

Our production app is fairly well biased towards selects. Currently it is
about 70% selects, 20% inserts, 6% deletes and 4% updates. Very few
updates are more than one row based on the primary key. Over 90% of
the time spend running SQL is in select queries.

My limited concurrent testing on Postgres gives very good performance
on updates, inserts and deletes, but it is suffering on the selects in certain
areas, which is why I have been concentrating my efforts on that area.

Having got similar (or the same) access plans in both Postgres and
MSSQL I was getting down to the next level of checking what was going
on when executing the already planned query.

I do have another database system I could try. Sybase SQLAnywhere.
This is not the original Sybase Enterprise which has the same roots as
MSSQL. In the past my testing suggested that SQLAnywhere
performance was as good as or better than MSSQL. I may try to set it up
with the same data in these tests for a more detailed comparison.

Regards,
Gary.




Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Gary Doades
On 30 Apr 2004 at 8:32, Jeff wrote:
 
  A better comparison query may be a simple "select a from mytable 
 where a between foo and bar" to get an index scan.  In that case it's a 
 straight up, vanilla index scan.  Nothing else getting in the way.
 

Yes, you're right, and I have done this just to prove to myself that it is the index scan 
that is the bottleneck. I have some complex SQL that executes very quickly with Postgres, 
similar to MSSQL, but the index scans in most of those only touch a few rows for a few 
loops. It seems to be a problem when the index scan is scanning very many rows and 
for each of these it has to go to the table just to find out if the index entry it just 
looked at is still valid.

Gary.




Re: [PERFORM] planner/optimizer question

2004-04-30 Thread Gary Doades
On 30 Apr 2004 at 9:37, Kevin Barnard wrote:

 
 I was always under the impression that MSSQL used leaf and row level locking and 
 therefore 
 was not a concurrent, in the same sense that postgres is, database. It would still 
 allow for 
 concurrent connections and such but updates will get blocked/ delayed. I might be 
 wrong.
 

Ultimately you may be right. I don't know enough about SQLServer 
internals to say either way. Anyway, our system spends around 70% of its 
time in selects. I could try to set up a test for this when I get a bit 
more time.

Unfortunately I suspect that this topic won't get taken much further, since 
testing it would mean modifying quite a bit of code. Working out whether 
putting additional info in the index header, and not visiting the data row 
when all the required data is in the index, would be beneficial would take 
quite a bit of work by someone who knows more than I do. I reckon that 
no-one has the time to do this at the moment.

Regards,
Gary.




Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
 
 I guess the real question is, why maintain index flags and not simply
 drop the index entry altogether?
 
 A more interesting case would be to have the backend process record
 index tuples that it would invalidate (if committed), then on commit
 send that list to a garbage collection process.
 
 It's still vacuum -- just the reaction time for it would be much
 quicker.
 
This was my original question.

I guess the problem is with MVCC. The row may have gone from your 
current view of the table but not from someone else's. I don't (yet) 
understand the way it works well enough to say for sure, but I still think it 
is worth pursuing further by someone who does know the deep stuff. They 
seem to have concluded that it is not worth it, however.

Cheers,
Gary.






Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades

 It's also entirely possible your indices are using inaccurate
 statistical information.  Have you ANALYZEd recently?
 

In this example the statistics don't matter. The plans used were the same for 
MSSQL and Postgres. I was trying to eliminate the difference in plans 
between the two, which obviously does make a difference, sometimes in 
MSSQL's favour and sometimes the other way round. Both systems, having 
decided to do the same index scan, took noticeably different times. The 
Postgres database was fully vacuumed and analysed anyway.

I agree about MSSQL recovery time. It sucks. This is why they are making a 
big point about the improved recovery time in Yukon. Although recovery 
time is important, I see it as the exceptional case, whereas at the moment I 
am interested in everyday performance.

Cheers,
Gary.




Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
On 29 Apr 2004 at 13:54, Josh Berkus wrote:

 Gary,
 
 
 It's also quite possible that MSSQL simply has a more efficient index scanning 
 implementation than we do. They've certainly had incentive; their storage 
 system sucks big time for random lookups and they need those fast indexes.  
 (just try to build a 1GB adjacency list tree on SQL Server.  I dare ya).
 
 Certainly the fact that MSSQL is essentially a single-user database makes 
 things easier for them. They don't have to maintain multiple copies of the 
 index tuples in memory. I think that may be our main performance loss.
 

Possibly, but MSSQL certainly uses data from indexes and cuts out the 
subsequent (possibly random-seek) data fetch. This is also why the 
Index Tuning Wizard often recommends multi-column compound 
indexes in some cases. I've tried these recommendations on occasion 
and they certainly speed up the selects significantly. If anything, the scan 
of the new compound index must be slower than that of the original 
single-column index, and yet it still gets the data faster.

This indicates to me that it is not the scan (or IO) performance that is 
making the difference, but not having to fetch the data row.
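
As a made-up illustration of the kind of thing the wizard suggests (the index 
names and the value 42 are invented; the table is from my earlier posts):

-- original single-column index: scan it, then fetch every matching data row
CREATE INDEX idx_booking_contract ON staff_booking (contract_id);

-- wizard-style compound index: wider, so arguably slower to scan, yet MSSQL
-- can answer the query below from the index entries alone, with no row fetches
CREATE INDEX idx_booking_contract_dur ON staff_booking (contract_id, duration);

SELECT contract_id, SUM(duration)
FROM staff_booking
WHERE contract_id = 42
GROUP BY contract_id;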

Cheers,
Gary.




Re: [PERFORM] planner/optimizer question

2004-04-29 Thread Gary Doades
 
 Having picked out an index scan as being the highest time user I 
 concentrated on that in  this case and compared the same index scan on 
 MSSQL. At least MSSQL reported it as  an index scan on the same index 
 for the same number of rows. 
 

I should have also pointed out that MSSQL reported that same index scan as taking 65% 
of the overall query time.
It was just faster. The overall query took 103ms in MSSQL.

Gary.




Re: [PERFORM] planner/optimizer question

2004-04-28 Thread Gary Doades
I know you will shoot me down, but...

Why is there an entry in the index for a row if the row is not valid? 
Wouldn't it be better for the index entry's validity to track the row's validity? 
If a particular data value for a query (join, where etc.) can be satisfied 
by the index entry itself, this would be a big performance gain.

Cheers,
Gary.

On 28 Apr 2004 at 0:27, Tom Lane wrote:

 [EMAIL PROTECTED] writes:
  ... Wouldn't the most efficient plan be to scan the index regardless
  of crm_id because the only columns needed are in the index?
 
 No.  People coming from other databases often have the misconception
 that queries can be answered by looking only at an index.  That is never
 true in Postgres because row validity info is only stored in the table;
 so we must always visit the table entry to make sure the row is still
 valid/visible for the current query.
 
 Accordingly, columns added to the index that aren't constrained by the
 WHERE clause are not very useful ...
 
   regards, tom lane
 





Re: [PERFORM] planner/optimizer question

2004-04-28 Thread Gary Doades
I can understand the performance loss on non-selects from keeping the 
index validity state in step with the row validity, but would that outweigh the 
performance gains on selects? It depends on your mix of selects to 
non-selects, I guess, but other database systems seem to imply that keeping 
the index in step is worth it overall.

Cheers,
Gary.

On 28 Apr 2004 at 15:04, Christopher Kings-Lynne wrote:

  Why is there an entry in the index for a row if the row is not valid? 
  Wouldn't it be better for the index entry validity to track the row validity. 
  If a particular data value for a query (join, where etc.) can be satisfied 
  by the index entry itself this would be a big performance gain.
 
 For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big 
 performance loss.
 
 Chris
 





Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-05 Thread Gary Doades
On 5 Apr 2004 at 8:36, Josh Berkus wrote:

 
 Point taken, though, SQL Server has done a better job in optimizing for 
 dumb queries.   This is something that PostgreSQL needs to work on, as is 
 self-referential updates for large tables, which also tend to be really slow.   
 Mind you, in SQL Server 7 I used to be able to crash the server with a big 
 self-referential update, so this is a common database problem.
 

I agree about the dumb queries (I'm not saying mine are *that* dumb :) )

When you can write SQL that looks right, feels right, gives the right 
answers during testing, and SQLServer runs it really fast, you stop 
there and tend not to tinker with the SQL further.

You *can* (I certainly do) achieve comparable performance with 
PostgreSQL, but you just have to work harder for it. Now that I have 
learned the characteristics of both servers I can write SQL that is pretty 
good on both. I suspect that there are people who evaluate PostgreSQL 
by executing their favorite SQLServer queries against it, see that it is 
slower and never bother to go further.

Cheers,
Gary.




Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-04 Thread Gary Doades
Possibly.

A lot of my queries show comparable performance, some a little slower 
and a few a little faster. There are a few, however, that really grind on 
PostgreSQL. I am learning patterns from these to try to target the 
most likely performance problems to come, and to hand-tune these types 
of SQL.

I'm not complaining about PostgreSQL or saying that SQLServer is 
better; in most cases it is not. SQLServer seems to be more predictable 
and forgiving in performance, which tends to make for lazy SQL 
programming. It also has implications when the SQL is dynamically 
created based on user input: there are more chances of PostgreSQL 
hitting a performance problem than SQLServer.

Overall I'm still very impressed with PostgreSQL. The $7000-per-processor 
licence for SQLServer makes the case for PostgreSQL even 
stronger!

Cheers,
Gary.

On 3 Apr 2004 at 17:43, Aaron Werman wrote:

Almost any cross dbms migration shows a drop in performance. The engine
effectively trains developers and administrators in what works and what
doesn't. The initial migration thus compares a tuned to an untuned version.

/Aaron

- Original Message - 
From: Josh Berkus [EMAIL PROTECTED]
To: Gary Doades [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Saturday, April 03, 2004 1:59 PM
Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.


 Gary,

  There are no indexes on the columns involved in the update, they are
  not required for my usual select statements. This is an attempt to
  slightly denormalise the design to get the performance up comparable
  to SQL Server 2000. We hope to move some of our databases over to
  PostgreSQL later in the year and this is part of the ongoing testing.
   SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet)
  so I am hand optimising some of the more frequently used
  SQL and/or tweaking the database design slightly.

 Hmmm ... that hasn't been my general experience on complex queries.
However,
 it may be due to a difference in ANALYZE statistics.   I'd love to see you
 increase your default_stats_target, re-analyze, and see if PostgreSQL gets
 smarter.

 -- 
 -Josh Berkus
  Aglio Database Solutions
  San Francisco










Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-04 Thread Gary Doades
Unfortunately I don't understand the question!

My background is primarily Win32. The last time I used a *nix OS 
was about 20 years ago, apart from occasional dips into Linux 
over the past few years. If you can tell me how to find out what you want, 
I will gladly give you the information.

Regards,
Gary.

On 3 Apr 2004 at 16:52, Cott Lang wrote:

 On Sat, 2004-04-03 at 03:50, Gary Doades wrote:
  On 2 Apr 2004 at 22:36, [EMAIL PROTECTED] wrote:
  
  OK, some more detail: 
  
  Before wiping 2.4 off my test box for the second time: 
 
 Perhaps I missed it, but which io scheduler are you using under 2.6?
 
 
 
 
 
 





Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-04 Thread Gary Doades
Unfortunately I have to try to maintain compatibility with both SQLServer 
and PostgreSQL. Our main web application is currently on SQLServer, but we 
want to migrate customers who don't care what the DB server is over to 
PostgreSQL. Some of our larger customers demand SQLServer, you 
know how it is!

I don't want to maintain two sets of code or SQL, so I am trying to find 
common ground. The code is not a problem, but the SQL sometimes is.
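
The sort of difference I mean is trivial stuff like row limiting (a made-up 
example, not from our app):

-- SQLServer 2000
SELECT TOP 10 staff_id, contract_id FROM staff_booking ORDER BY dt_stamp;

-- PostgreSQL
SELECT staff_id, contract_id FROM staff_booking ORDER BY dt_stamp LIMIT 10;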

Cheers,
Gary.


On 3 Apr 2004 at 17:43, Aaron Werman wrote:

 Almost any cross dbms migration shows a drop in performance. The engine
 effectively trains developers and administrators in what works and what
 doesn't. The initial migration thus compares a tuned to an untuned version.
 
 /Aaron
 
 - Original Message - 
 From: Josh Berkus [EMAIL PROTECTED]
 To: Gary Doades [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Sent: Saturday, April 03, 2004 1:59 PM
 Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
 
 
  Gary,
 
   There are no indexes on the columns involved in the update, they are
   not required for my usual select statements. This is an attempt to
   slightly denormalise the design to get the performance up comparable
   to SQL Server 2000. We hope to move some of our databases over to
   PostgreSQL later in the year and this is part of the ongoing testing.
    SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet)
   so I am hand optimising some of the more frequently used
   SQL and/or tweaking the database design slightly.
 
  Hmmm ... that hasn't been my general experience on complex queries.
 However,
  it may be due to a difference in ANALYZE statistics.   I'd love to see you
  increase your default_stats_target, re-analyze, and see if PostgreSQL gets
  smarter.
 
  -- 
  -Josh Berkus
   Aglio Database Solutions
   San Francisco
 
 
 
 
 
 





Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-04 Thread Gary Doades
On 3 Apr 2004 at 21:23, Mike Nolan wrote:

  Almost any cross dbms migration shows a drop in performance. The engine
  effectively trains developers and administrators in what works and what
  doesn't. The initial migration thus compares a tuned to an untuned version.
 
 I think it is also possible that Microsoft has more programmers working
 on tuning issues for SQL Server than PostgreSQL has working on the 
 whole project.
 --
 Mike Nolan
 

Agreed. Also, considering the high price of SQLServer, it is in their 
interests to spend a lot of resources on tuning/performance to give it a 
commercial edge over its rivals, and an edge in the silly benchmark scores.

Cheers,
Gary.
 

 





Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-03 Thread Gary Doades
Actually it hasn't been my experience either. Most of my queries against 
the database, large and small, are either a little quicker or show no real 
difference. I have only really noticed big differences under stress, when 
memory (RAM) is being squeezed. The main winners on 2.6 seem to be 
write performance and memory management.

Unfortunately I only have one test machine and I can't really keep 
switching between 2.4 and 2.6 to do the comparisons. I had written 
down 27 timings from a set of SQL of varying complexity using the 2.4 
kernel. Each SQL statement was executed 10 times and the average of 
the last 5 was used. I can only really compare those timings against the 
new installation on 2.6. I know that this is not ideal real-world testing, 
but it is good enough for me at the moment. Unless anyone has 
contradictory indications I will proceed with 2.6.

I did increase the default stats target from 10 to 50 and re-analysed. 
The explain numbers are slightly different, but the time to run was 
almost the same. Not surprising since the plan was the same.

QUERY PLAN 
Merge Join  (cost=0.00..192636.20 rows=2845920 width=92) 
  Merge Cond: (outer.reqt_id = inner.reqt_id) 
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..52662.40 rows=2206291 width=6) 
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..102529.28 rows=2845920 width=90) 
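
For reference, what I changed was default_statistics_target = 50 in 
postgresql.conf, followed by a re-analyze. The per-column equivalent would 
be something like this (the column choice here is just illustrative):

ALTER TABLE staff_booking ALTER COLUMN reqt_id SET STATISTICS 50;
ALTER TABLE order_reqt ALTER COLUMN reqt_id SET STATISTICS 50;
ANALYZE staff_booking;
ANALYZE order_reqt;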


On 3 Apr 2004 at 10:59, Josh Berkus wrote:

Gary,

 There are no indexes on the columns involved in the update, they are  
 not required for my usual select statements. This is an attempt to  
 slightly denormalise the design to get the performance up comparable  
 to SQL Server 2000. We hope to move some of our databases over to  
 PostgreSQL later in the year and this is part of the ongoing testing.  
 SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet) 
 so I am hand optimising some of the more frequently used  
 SQL and/or tweaking the database design slightly. 

Hmmm ... that hasn't been my general experience on complex queries.   However, 
it may be due to a difference in ANALYZE statistics.   I'd love to see you 
increase your default_stats_target, re-analyze, and see if PostgreSQL gets 
smarter.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco








Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-03 Thread Gary Doades
Thanks,

I know about set showplan_text, but it is only the equivalent of explain, 
not explain analyze. The graphical plan gives full statistics, runtime, 
percentage cost, loop execution counts etc. which is much more useful. 
I don't know of a way of getting the graphical plan content in text form.
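
For anyone following along, the rough correspondence as I understand it 
(the query itself is just a placeholder):

-- SQLServer: text plan, estimates only, the query is not executed
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM staff_booking WHERE contract_id = 42;
GO
SET SHOWPLAN_TEXT OFF;
GO

-- PostgreSQL: estimated plan only, or run it and get real times and row counts
EXPLAIN SELECT * FROM staff_booking WHERE contract_id = 42;
EXPLAIN ANALYZE SELECT * FROM staff_booking WHERE contract_id = 42;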

Cheers,
Gary.

On 3 Apr 2004 at 6:50, @g v t c wrote:

Use Set Show_Plan or something of the sort in Query Analyzer.  Then 
run your SQL.  This will change the graphical plan to a text plan 
similar to Postgresql or at least something close to readable.

Gary Doades wrote:

On 2 Apr 2004 at 22:36, [EMAIL PROTECTED] wrote:

OK, some more detail: 

Before wiping 2.4 off my test box for the second time: 

SQL Statement for update: 
update staff_booking set time_from = r.time_from from order_reqt r where r.reqt_id = staff_booking.reqt_id; 

Explain: (on 2.4) 
QUERY PLAN 
Merge Join  (cost=0.00..185731.30 rows=2845920 width=92) 
  Merge Cond: (outer.reqt_id = inner.reqt_id) 
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..53068.20 rows=2206291 width=6) 
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..99579.21 rows=2845920 width=90) 

Total execution time: 18 hours 12 minutes 

vacuum full analyze: total time 3 hours 22 minutes 

Wait 2 hours for re-install 2.6, set params etc.  
restore database.  

Same SQL Statement 
Explain: (on 2.6) 
QUERY PLAN 
Merge Join  (cost=0.00..209740.24 rows=2845920 width=92) 
  Merge Cond: (outer.reqt_id = inner.reqt_id) 
  ->  Index Scan using order_reqt_pkey on order_reqt r  (cost=0.00..50734.20 rows=2206291 width=6) 
  ->  Index Scan using staff_book_idx2 on staff_booking  (cost=0.00..117921.92 rows=2845920 width=90) 

Total execution time: 2 hours 53 minutes 

vacuum full analyze: total time 1 hours 6 minutes 

Table definitions for the two tables involved: 
CREATE TABLE ORDER_REQT 
( 
   REQT_ID             SERIAL, 
   ORDER_ID            integer NOT NULL, 
   DAYOFWEEK           smallint NOT NULL CHECK (DAYOFWEEK BETWEEN 0 AND 6), 
   TIME_FROM           smallint NOT NULL CHECK (TIME_FROM BETWEEN 0 AND 1439), 
   DURATION            smallint NOT NULL CHECK (DURATION BETWEEN 0 AND 1439), 
   PRODUCT_ID          integer NOT NULL, 
   NUMBER_REQT         smallint NOT NULL DEFAULT (1), 
   WROPTIONS           integer NOT NULL DEFAULT 0, 
   UID_REF             integer NOT NULL, 
   DT_STAMP            timestamp NOT NULL DEFAULT current_timestamp, 
   Sentinel_Priority   integer NOT NULL DEFAULT 0, 
   PERIOD              smallint NOT NULL DEFAULT 1 CHECK (PERIOD BETWEEN -2 AND 4), 
   FREQUENCY           smallint NOT NULL DEFAULT 1, 
   PRIMARY KEY (REQT_ID) 
); 

CREATE TABLE STAFF_BOOKING 
( 
   BOOKING_ID          SERIAL, 
   REQT_ID             integer NOT NULL, 
   ENTITY_TYPE         smallint NOT NULL DEFAULT 3 CHECK (ENTITY_TYPE IN (3,4)), 
   STAFF_ID            integer NOT NULL, 
   CONTRACT_ID         integer NOT NULL, 
   TIME_FROM           smallint NOT NULL CHECK (TIME_FROM BETWEEN 0 AND 1439), 
   DURATION            smallint NOT NULL CHECK (DURATION BETWEEN 0 AND 1439), 
   PERIOD              smallint NOT NULL DEFAULT 1 CHECK (PERIOD BETWEEN -2 AND 4), 
   FREQUENCY           smallint NOT NULL DEFAULT 1, 
   TRAVEL_TO           smallint NOT NULL DEFAULT 0, 
   UID_REF             integer NOT NULL, 
   DT_STAMP            timestamp NOT NULL DEFAULT current_timestamp, 
   SELL_PRICE          numeric(10,4) NOT NULL DEFAULT 0, 
   COST_PRICE          numeric(10,4) NOT NULL DEFAULT 0, 
   MIN_SELL_PRICE      numeric(10,4) NOT NULL DEFAULT 0, 
   MIN_COST_PRICE      numeric(10,4) NOT NULL DEFAULT 0, 
   Sentinel_Priority   integer NOT NULL DEFAULT 0, 
   CHECK_INTERVAL      smallint NOT NULL DEFAULT 0, 
   STATUS              smallint NOT NULL DEFAULT 0, 
   WROPTIONS           integer NOT NULL DEFAULT 0, 
   PRIMARY KEY (BOOKING_ID) 
); 

Foreign keys: 

ALTER TABLE ORDER_REQT ADD FOREIGN KEY (ORDER_ID) 
   REFERENCES MAIN_ORDER (ORDER_ID) ON DELETE CASCADE; 

ALTER TABLE ORDER_REQT ADD FOREIGN KEY (PRODUCT_ID) 
   REFERENCES PRODUCT (PRODUCT_ID); 

ALTER TABLE STAFF_BOOKING ADD FOREIGN KEY (CONTRACT_ID) 
   REFERENCES STAFF_CONTRACT (CONTRACT_ID); 

ALTER TABLE STAFF_BOOKING ADD FOREIGN KEY (STAFF_ID) 
   REFERENCES STAFF (STAFF_ID); 

Indexes: 

CREATE INDEX FK_IDX_ORDER_REQT ON ORDER_REQT (ORDER_ID

Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-03 Thread Gary Doades
Following on from Josh's response and my previous reply on SQLServer 
planning, the main problem query is this one:

SELECT VS.*,VL.TEL1,SC.CONTRACT_ID,SC.CONTRACT_REF, SC.MAX_HOURS, 
SC.MIN_HOURS, 
 (SELECT COUNT(*) FROM TIMESHEET_DETAIL JOIN MAIN_ORDER ON 
(MAIN_ORDER.ORDER_ID = TIMESHEET_DETAIL.ORDER_ID AND 
MAIN_ORDER.CLIENT_ID = 6) WHERE TIMESHEET_DETAIL.CONTRACT_ID = 
SC.CONTRACT_ID) AS VISITS,
(SELECT (SUM(R.DURATION+1))/60.0 FROM ORDER_REQT R
 JOIN STAFF_BOOKING B ON (B.REQT_ID = R.REQT_ID)
 JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) WHERE 
B.CONTRACT_ID = SC.CONTRACT_ID 
 AND BP.BOOKING_DATE BETWEEN '2004-06-12' AND '2004-06-18') AS RHOURS 
FROM VSTAFF VS
JOIN STAFF_CONTRACT SC ON (SC.STAFF_ID = VS.STAFF_ID)
JOIN VLOCATION VL ON (VL.LOCATION_ID = VS.LOCATION_ID)
JOIN SEARCH_REQT_RESULT SR ON (SR.STAFF_ID = VS.STAFF_ID)
WHERE SR.SEARCH_ID = 1 AND SC.CONTRACT_ID IN
(SELECT C.CONTRACT_ID FROM STAFF_PRODUCT P,STAFF_CONTRACT C  
WHERE P.CONTRACT_ID=C.CONTRACT_ID AND C.STAFF_ID = VS.STAFF_ID AND 
P.PRODUCT_ID IN (SELECT PRODUCT_ID FROM SEARCH_ORDER_REQT WHERE 
SEARCH_ID = 1)  AND C.AVAIL_DATE_FROM = '2004-06-12' AND 
C.AVAIL_DATE_TO = '2004-06-18'  GROUP BY C.CONTRACT_ID
 HAVING (COUNT(C.CONTRACT_ID) = (SELECT COUNT(DISTINCT PRODUCT_ID) 
FROM SEARCH_ORDER_REQT WHERE SEARCH_ID = 1)))

The explain analyze is:
QUERY PLAN
Nested Loop  (cost=101.54..1572059.57 rows=135 width=152) (actual 
time=13749.100..1304586.501 rows=429 loops=1)
  InitPlan
-  Index Scan using fk_idx_wruserarea on wruserarea  (cost=3.26..6.52 rows=1 
width=4) (actual time=0.944..0.944 rows=1 loops=1)
  Index Cond: (area_id = 1)
  Filter: (uid = $4)
  InitPlan
-  Seq Scan on wruser  (cost=0.00..3.26 rows=1 width=4) (actual 
time=0.686..0.691 rows=1 loops=1)
  Filter: ((username)::name = current_user())
  -  Hash Join  (cost=95.02..3701.21 rows=215 width=138) (actual 
time=100.476..1337.392 rows=429 loops=1)
Hash Cond: (outer.staff_id = inner.staff_id)
Join Filter: (subplan)
-  Seq Scan on staff_contract sc  (cost=0.00..33.24 rows=1024 width=37) 
(actual 
time=0.114..245.366 rows=1024 loops=1)
-  Hash  (cost=93.95..93.95 rows=430 width=109) (actual time=38.563..38.563 
rows=0 loops=1)
  -  Hash Join  (cost=47.47..93.95 rows=430 width=109) (actual 
time=15.502..36.627 rows=429 loops=1)
Hash Cond: (outer.staff_id = inner.staff_id)
-  Seq Scan on staff  (cost=34.61..66.48 rows=1030 width=105) 
(actual 
time=9.655..15.264 rows=1030 loops=1)
  Filter: ((hashed subplan) OR $5)
  SubPlan
-  Seq Scan on staff_area  (cost=10.73..33.38 rows=493 
width=4) 
(actual time=8.452..8.452 rows=0 loops=1)
  Filter: ((hashed subplan) OR (area_id = 1))
  SubPlan
-  Seq Scan on wruserarea  (cost=3.26..10.72 
rows=5 width=4) 
(actual time=0.977..1.952 rows=1 loops=1)
  Filter: (uid = $1)
  InitPlan
-  Seq Scan on wruser  (cost=0.00..3.26 
rows=1 width=4) 
(actual time=0.921..0.926 rows=1 loops=1)
  Filter: ((username)::name = 
current_user())
-  Hash  (cost=11.79..11.79 rows=430 width=4) (actual 
time=5.705..5.705 
rows=0 loops=1)
  -  Index Scan using fk_idx_search_reqt_result on 
search_reqt_result 
sr  (cost=0.00..11.79 rows=430 width=4) (actual time=0.470..4.482 rows=429 loops=1)
Index Cond: (search_id = 1)
SubPlan
  -  HashAggregate  (cost=8.32..8.32 rows=1 width=4) (actual 
time=2.157..2.157 
rows=1 loops=429)
Filter: (count(contract_id) = $9)
InitPlan
  -  Aggregate  (cost=1.04..1.04 rows=1 width=4) (actual 
time=0.172..0.173 
rows=1 loops=1)
-  Seq Scan on search_order_reqt  (cost=0.00..1.04 rows=1 
width=4) 
(actual time=0.022..0.038 rows=1 loops=1)
  Filter: (search_id = 1)
-  Hash IN Join  (cost=1.04..7.27 rows=1 width=4) (actual 
time=2.064..2.117 
rows=1 loops=429)
  Hash Cond: (outer.product_id = inner.product_id)
  -  Nested Loop  (cost=0.00..6.19 rows=7 width=8) (actual 
time=1.112..2.081 rows=8 loops=429)
-  Index Scan using fk_idx_staff_contract_2 on 
staff_contract c  
(cost=0.00..3.03 rows=1 width=4) (actual time=0.206..0.245 rows=1 loops=429)
  Index Cond: (staff_id = $8)
  Filter: ((avail_date_from = '2004-06-12'::date) AND 
(avail_date_to 
= '2004-06-18'::date))
-  Index Scan using 

[PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-01 Thread Gary Doades
As part of my ongoing evaluation of PostgreSQL I have been doing a little stress 
testing. I thought I would share an interesting result here.

Machine spec:
500 MHz PIII
256MB RAM
old-ish IDE HD (5400RPM)
Linux 2.4.22 kernel (Mandrake 9.2)

I have PostgreSQL 7.4.1 installed and have managed to load up a 1.4 GB database 
from MS SQLServer. Vacuum analyzed it.

As a test in PostgreSQL I issued a statement to update a single column of a table 
containing 2.8 million rows with the values of a column in a table with a similar 
rowcount. Using the above spec I had to stop the server after 17 hours. The poor 
thing was thrashing the hard disk and doing more swapping than useful work.

Having obtained a copy of Mandrake 10.0 with the 2.6 kernel I thought I would give 
it a go. Same hardware. Same setup. Same database loaded up. Same postgresql.conf 
file to make sure all the settings were the same. Vacuum analyzed it.

The same update statement COMPLETED in 2 hours 50 minutes. I'm impressed.

I could see from vmstat that the system was achieving much greater IO throughput 
than with the 2.4 kernel. Although the system was still swapping, there seems to be 
a completely different memory management pattern that suits PostgreSQL very well.

Just to see that this wasn't a coincidence I am repeating the test. It is now into 
its 14th hour using the old 2.4 kernel. I'm going to give up.

Has anyone else done any comparative testing with the 2.6 kernel?

Cheers,
Gary.




Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-01 Thread Gary Doades
The post was not intended to be content-rich, just my initial feedback 
after switching to 2.6. Since I had largely given up on this particular 
line of attack under 2.4, I didn't think to do a detailed analysis at the 
time. I was also hoping that others would add to the discussion.

As this could become important I will be doing more analysis, but given 
the nature of the issue and the need to keep as many factors constant as 
possible, this may take some time.

Cheers,
Gary.

On 2 Apr 2004 at 1:32, Tom Lane wrote:

 Gary Doades [EMAIL PROTECTED] writes:
  As a test in PosgreSQL I issued a statement to update a single column
  of a table containing 2.8 million rows with the values of a column in
  a table with similar rowcount.  Using the above spec I had to stop the
  server after 17 hours. The poor thing was thrashing the hard disk and
  doing more swapping than useful work.
 
 This statement is pretty much content-free, since you did not show us
 the table schemas, the query, or the EXPLAIN output for the query.
 (I'll forgive you the lack of EXPLAIN ANALYZE, but you could easily
 have provided all the other hard facts.)  There's really no way to tell
 where the bottleneck is.  Maybe it's a kernel-level issue, but I would
 not bet on that without more evidence.  I'd definitely not bet on it
 without direct confirmation that the same query plan was used in both
 setups.
 
   regards, tom lane
 
 
 


