Re: [PERFORM] possible improvement between G4 and G5
Qing Zhao <[EMAIL PROTECTED]> writes:

> We have got a G5 64-bit processor to replace an old G4 32-bit
> processor. Given everything else equal, should we see a big
> improvement on PG's performance?

Nope. Database performance typically depends on disk performance first, and RAM size second. A 64-bit processor might help by allowing you to install more RAM, but you didn't say that you had.

> The other question I have is that, when I tried different sizes for
> shared_buffers (I used 10,000, 1,000, 528, 256) and max
> connections = 32, it gives me an error when I tried to start PG using
> pg_ctl start as postgres. It kept saying this is bigger than the
> system shared memory.

Out of the box, Mac OS X has a very low SHMMAX limit. See the PG admin docs or the mailing list archives for how to increase it. You should do this --- most people find that you want to set shared_buffers to 1000 or 10000 or so for best performance.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
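For reference, raising the kernel shared-memory limits on Mac OS X of that era looked roughly like the sketch below. The exact file and parameter set vary by OS X release, and the values here are illustrative placeholders only, not recommendations:

```
# /etc/sysctl.conf (Mac OS X) -- hypothetical values; size to your RAM
kern.sysv.shmmax=134217728    # max shared memory segment size, in bytes
kern.sysv.shmall=32768        # max total shared memory, in 4 kB pages

# postgresql.conf -- per Tom's advice above
shared_buffers = 10000        # roughly 80 MB at the default 8 kB page size
```

A reboot (or sysctl -w as root) is needed before the kernel honors the new limits.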
Re: [PERFORM] atrocious update performance
On Mon, 5 Apr 2004, Rosser Schwarz wrote:

> while you weren't looking, Kevin Barnard wrote:
>
> > Have you added indexes for the custid column for tables
> > account.acct accunt.orgacct and note?
>
> They were indexed in the original case, yes. There was no
> need to index them in today's test case, as that was done
> purely in an attempt to rule in or out foreign key validation
> as the cause of the performance hit. No foreign keys that
> might be validated, no need to index the foreign key columns.
>
> > I haven't followed the entire thread but if you have
> > cascading FK on those tables without an index on the
> > column that could cause your delay.
>
> The issue is that the foreign keys are being validated at
> all, when the column being referenced by those foreign keys
> (account.cust.custid) is never touched.
>
> Regardless of whether or not the referencing columns are
> indexed, validating them at all--in this specific case--is
> broken. The column they refer to is never touched; they
> should remain utterly ignorant of whatever happens to other
> columns in the same row.

It shouldn't be checking the other table if the values of the key column haven't changed. The ri_KeysEqual check should cause it to return just before actually doing the check on the other table (it still does a few things before then, but nothing that should be particularly expensive). In some simple tests on my 7.4.2 machine, this appears to work for me on pk cascade updates. It would be interesting to know if it's actually doing any checks for you; you might be able to poke around the triggers (backend/utils/adt/ri_triggers.c).

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] possible improvement between G4 and G5
Hi,

We have got a G5 64-bit processor to replace an old G4 32-bit processor. Given everything else equal, should we see a big improvement on PG's performance?

The other question I have is that, when I tried different sizes for shared_buffers (I used 10,000, 1,000, 528, 256) and max connections = 32, it gave me an error when I tried to start PG using pg_ctl start as postgres. It kept saying this is bigger than the system shared memory. So finally I started PG using SystemStarter start PostgreSQL and it seems to start OK. Any ideas?

Thanks a lot!

Qing Zhao

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] atrocious update performance
On Mon, 5 Apr 2004, Kevin Barnard wrote:

> On 5 Apr 2004 at 12:05, Rosser Schwarz wrote:
>
> > Just this morning, however, I created a copy of the target table (all
> > 4.7M rows), with absolutely no foreign keys referring to it, and ran
> > the update against the copy. That update took 2300 seconds. The
> > join columns were indexed in both cases.
>
> Have you added indexes for the custid column for tables account.acct,
> accunt.orgacct and note?
>
> I haven't followed the entire thread but if you have cascading FK on those
> tables without an index on the column that could cause your delay.

Also make sure the fk/pk types match, or the index likely won't get used anyway.

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
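To illustrate the fk/pk type-match point: a bigint primary key referenced by an int foreign key column can defeat the index lookup during the FK check, particularly on planners of that era. A hypothetical sketch (table and column names are stand-ins, not the poster's actual schema):

```sql
-- Declare the fk column with the SAME type as the pk it references,
-- so the referential-integrity check can use an index scan.
CREATE TABLE cust (
    custid bigint PRIMARY KEY
);

CREATE TABLE acct (
    acctid bigint PRIMARY KEY,
    custid bigint REFERENCES cust (custid)   -- bigint, matching cust.custid
);

-- Index the referencing column so cascaded updates/deletes don't seq-scan.
CREATE INDEX acct_custid_idx ON acct (custid);
```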
Re: [PERFORM] atrocious update performance
while you weren't looking, Kevin Barnard wrote:

> Have you added indexes for the custid column for tables
> account.acct accunt.orgacct and note?

They were indexed in the original case, yes. There was no need to index them in today's test case, as that was done purely in an attempt to rule in or out foreign key validation as the cause of the performance hit. No foreign keys that might be validated, no need to index the foreign key columns.

> I haven't followed the entire thread but if you have
> cascading FK on those tables without an index on the
> column that could cause your delay.

The issue is that the foreign keys are being validated at all, when the column being referenced by those foreign keys (account.cust.custid) is never touched.

Regardless of whether or not the referencing columns are indexed, validating them at all--in this specific case--is broken. The column they refer to is never touched; they should remain utterly ignorant of whatever happens to other columns in the same row.

/rls

--
Rosser Schwarz
Total Card, Inc.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] performance comparison postgresql/ms-sql server
Heiko Kehlenbrink wrote:
> i use suse 8.1, postgresql 7.2 compiled from the rpms for using postgis,
> but that is another story...

Try v7.4; there are many performance improvements. It may not make up all the difference, but it should help.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] atrocious update performance
On 5 Apr 2004 at 12:05, Rosser Schwarz wrote:

> Just this morning, however, I created a copy of the target table (all
> 4.7M rows), with absolutely no foreign keys referring to it, and ran
> the update against the copy. That update took 2300 seconds. The
> join columns were indexed in both cases.

Have you added indexes for the custid column for tables account.acct, accunt.orgacct and note?

I haven't followed the entire thread but if you have cascading FK on those tables without an index on the column that could cause your delay.

Kevin Barnard
SpeedFC

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
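For anyone following along, indexing the referencing columns that Kevin mentions would look like the sketch below. The index names are invented here, and the table names are taken from his message as-is:

```sql
-- Index each fk column that refers back to the updated table, so that
-- cascaded FK checks become index scans instead of 4.7M-row seq scans.
CREATE INDEX acct_custid_idx    ON account.acct    (custid);
CREATE INDEX orgacct_custid_idx ON accunt.orgacct  (custid);
CREATE INDEX note_custid_idx    ON note            (custid);
```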
Re: [PERFORM] single index on more than two columns a bad thing?
You're absolutely correct that the general rule is to lead a composite index with the highest-cardinality columns for fastest selectivity. Indices and all physical design are based on usage. In this case of unique indices supporting primary keys in a hierarchy, it depends. For selection of small sets of arbitrary rows, your arrangement is best. For hierarchy-based queries, such as "for grandparent of foo, and parent of bar, give average age of sons", the hierarchy-based index is often more efficient.

Surrogate keys have a role, and can improve performance, but they also carry an enormous penalty of intentionally obfuscating logical keys and data semantics, and almost always lead to data errors not being caught because they obscure irrational relationships. I hate them, but use them frequently in high-transaction-rate operational systems where there is much functional validation outside the dbms (the apps therefore behave like object databases, and surrogate keys are network-database pointers) and in data warehousing (where downstream data cannot be corrected anyway).

/Aaron

----- Original Message -----
From: "Leeuw van der, Tim" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, April 04, 2004 5:06 PM
Subject: Re: [PERFORM] single index on more than two columns a bad thing?
Hi Aaron,

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Aaron Werman
> Sent: vrijdag 2 april 2004 13:57
>
> another thing that I have all over the place is a hierarchy:
> index on grandfather_table(grandfather)
> index on father_table(grandfather, father)
> index on son_table(grandfather, father, son)

It depends on your data distribution, but I find that in almost all cases it's beneficial to have your indexes the other way round in such cases:

index on grandfather_table(grandfather)
index on father_table(father, grandfather)
index on son_table(son, father, grandfather)

That usually gives a less common, more selective value at the start of the index, making the initial selection in the index smaller. And AFAIK I don't have to rewrite my queries for that; the planner doesn't care about the order of expressions in the query that are on the same level.

That said, I tend to use 'surrogate keys': keys generated from sequences or auto-number columns for my tables. It makes the tables less readable, but the indexes remain smaller.

Greetings,

--Tim

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
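The two orderings discussed above can be sketched concretely. The table and index names below are hypothetical; the point is that the same query is satisfiable by either column order, so no query rewriting is needed:

```sql
-- Hierarchy-ordered composite index (Aaron's arrangement)
CREATE INDEX son_hier_idx ON son_table (grandfather, father, son);

-- Selective-column-first composite index (Tim's arrangement)
CREATE INDEX son_sel_idx  ON son_table (son, father, grandfather);

-- Either index can serve this query; the planner does not care about
-- the order of ANDed conditions, only which index looks cheaper.
SELECT *
  FROM son_table
 WHERE grandfather = 1
   AND father = 2
   AND son = 3;
```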
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
On 5 Apr 2004 at 8:36, Josh Berkus wrote:

> Point taken, though, SQL Server has done a better job in optimizing for
> "dumb" queries. This is something that PostgreSQL needs to work on, as is
> self-referential updates for large tables, which also tend to be really slow.
> Mind you, in SQL Server 7 I used to be able to crash the server with a big
> self-referential update, so this is a common database problem.

I agree about the "dumb" queries (I'm not saying mine are *that* dumb :) ). When you can write SQL that looks right, feels right, gives the right answers during testing, and SQL Server runs it really fast, you stop there and tend not to tinker with the SQL further. You *can* (I certainly do) achieve comparable performance with PostgreSQL, but you just have to work harder for it. Now that I have learned the characteristics of both servers I can write SQL that is pretty good on both.

I suspect that there are people who evaluate PostgreSQL by executing their favorite SQL Server queries against it, see that it is slower, and never bother to go further.

Cheers,
Gary.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Gary Doades wrote:
> Has anyone else done any comparative testing with the 2.6 kernel?

I know for a fact that certain hardware is recognized differently between 2.2, 2.4 and 2.6 kernels. For example, I have one box on which I installed Debian stable with a 2.2 kernel, which automatically turned on DMA for the hard drive; a 2.4 kernel didn't, but a 2.6 one again saw the drive as DMA-capable. Such things can dramatically affect performance, so make sure to compare what capabilities the kernel thinks your hardware has between the kernels first... But I'll grant that the 2.6 kernel is a great deal faster on some of our test servers.

Regards,
Magnus

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Estimated rows way off
*statistics target = 100*

INFO: index "timeseries_tsid" now contains *16,677,521* row versions in 145605 pages
DETAIL: 76109 index pages have been deleted, 2 are currently reusable. CPU 12.00s/2.83u sec elapsed 171.26 sec.
INFO: "timeseries": found 0 removable, 16677521 nonremovable row versions in 1876702 pages
DETAIL: 0 dead row versions cannot be removed yet. There were *18,894,051* unused item pointers. 0 pages are entirely empty. CPU 138.74s/28.96u sec elapsed 1079.43 sec.
INFO: vacuuming "pg_toast.pg_toast_1286079786"
INFO: index "pg_toast_1286079786_index" now contains 4846282 row versions in 29319 pages
DETAIL: 10590 index pages have been deleted, 10590 are currently reusable. CPU 2.23s/0.55u sec elapsed 28.34 sec.
INFO: "pg_toast_1286079786": found 0 removable, 4846282 nonremovable row versions in 1379686 pages
DETAIL: 0 dead row versions cannot be removed yet. There were 2824978 unused item pointers. 0 pages are entirely empty. CPU 112.92s/19.53u sec elapsed 731.99 sec.
INFO: analyzing "public.timeseries"
INFO: "timeseries": 1876702 pages, *30,000* rows sampled, *41,762,188* estimated total rows

Setting the default statistics target higher makes the estimate worse:

*statistics target = 500*

INFO: index "timeseries_tsid" now contains *16,953,429* row versions in 145605 pages
INFO: "timeseries": 1891940 pages, *150,000* rows sampled, *64,803,483* estimated total rows

*statistics target = 1000*

INFO: index "timeseries_tsid" now contains *17,216,139* row versions in 145605 pages
INFO: "timeseries": 1937484 pages, *300,000* rows sampled, *68,544,295* estimated total rows

I'm trying to understand why the estimated row count is so far off. I'm assuming this is because we do very large deletes and are leaving around a large number of almost-empty pages. Is this the reason? Let me know if you need more info.
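The VACUUM output above is consistent with that theory: ~16.7M live rows spread over 1.87M pages, with 18.9M unused item pointers. Since ANALYZE extrapolates total rows from pages, mostly-empty pages inflate the estimate badly. A hypothetical way to compact the table and re-sample (the table and index names come from the log output; VACUUM FULL takes an exclusive lock, so this is not free on a production table):

```sql
-- Rewrite the table compactly, reclaiming the pages left by large deletes,
-- then rebuild the bloated index and refresh the statistics.
VACUUM FULL timeseries;
REINDEX INDEX timeseries_tsid;
ANALYZE timeseries;
```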
Thanks Michael

INFO: index "timeseries_tsid" now contains *16677521* row versions in 145605 pages
DETAIL: 76109 index pages have been deleted, 2 are currently reusable. CPU 12.00s/2.83u sec elapsed 171.26 sec.
INFO: "timeseries": found 0 removable, 16677521 nonremovable row versions in 1876702 pages
DETAIL: 0 dead row versions cannot be removed yet. There were 18894051 unused item pointers. 0 pages are entirely empty. CPU 138.74s/28.96u sec elapsed 1079.43 sec.
INFO: vacuuming "pg_toast.pg_toast_1286079786"
INFO: index "pg_toast_1286079786_index" now contains 4846282 row versions in 29319 pages
DETAIL: 10590 index pages have been deleted, 10590 are currently reusable. CPU 2.23s/0.55u sec elapsed 28.34 sec.
INFO: "pg_toast_1286079786": found 0 removable, 4846282 nonremovable row versions in 1379686 pages
DETAIL: 0 dead row versions cannot be removed yet. There were 2824978 unused item pointers. 0 pages are entirely empty. CPU 112.92s/19.53u sec elapsed 731.99 sec.
INFO: analyzing "public.timeseries"
INFO: "timeseries": 1876702 pages, *30,000* rows sampled, *41,762,188* estimated total rows

Setting the default statistics target higher made the estimate worse (changed from 100 to 500):

*statistics target = 500*

INFO: index "timeseries_tsid" now contains *16,953,429* row versions in 145605 pages
INFO: "timeseries": 1891940 pages, *150,000* rows sampled, *64,803,483* estimated total rows

*statistics target = 1000*

INFO: index "timeseries_tsid" now contains *17,216,139* row versions in 145605 pages
INFO: "timeseries": 1937484 pages, *300,000* rows sampled, *68,544,295* estimated total rows

This probably has something to do with the large deletes we do. I'm looking around to get some more info on statistics collection.

-mike

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Delete performance on delete from table with inherited tables
Thanks Stephan and Tom for your responses. We have been busy, so I haven't had time to do any further research on this till yesterday.

I found that the large number of triggers on the parent or master table were foreign key triggers for each table back to the child tables (update and delete on master, insert on child). The triggers have existed through several versions of postgres and, as far as we can tell, were automatically created using the references keyword at inception.

Yesterday I dropped all the current triggers on parent and children and ran a script that did an alter table add foreign key constraint on each of the 67 child tables with update cascade delete cascade. After this, the delete from the parent where no records existed in the child tables was far more acceptable. Instead of taking hours, the delete ran for about 5 minutes on my workstation. Removing all constraints entirely reduces this time to a couple of seconds. I am currently evaluating whether the foreign key constraints are worth the performance penalty in this particular case.

To finish up, it appears that the foreign key implementation has changed since these tables were first created in our database. Dropping the existing triggers and re-adding the constraints on each table significantly improved performance for us. I do not know enough of the internals to know why this happened, but our experience seems to show that the newer implementation of foreign keys is more efficient than previous versions. YMMV.

One other item that was brought up was whether the child tables have the fk column indexed, and the answer was yes: each had a standard btree index on the foreign key. Explain showed nothing, as all the time was being spent in the triggers; time spent in triggers is not shown in the pg 7.3.4 version of explain (nor would I necessarily expect it to be).

Thanks for your time, expertise and responses.
-Chris

On Tuesday 09 March 2004 7:18 pm, Stephan Szabo wrote:
> On Wed, 3 Mar 2004, Chris Kratz wrote:
> > Which certainly points to the triggers being the culprit. In reading the
> > documentation, it seems like the "delete from only..." statement should
> > ignore the constraint triggers. But it seems quite obvious from the
>
> Delete from only merely means that children of the table being deleted
> will not have their rows checked against any where conditions and removed
> for that reason. It does not affect constraint triggers at all.
>
> Given I'm guessing it's going to be running about 7000 * 67 queries to
> check the validity of the delete for 7000 rows each having 67 foreign
> keys, I'm not sure there's much to do other than hack around the issue
> right now.
>
> If you're a superuser, you could temporarily hack reltriggers on the
> table's pg_class row to 0, run the delete and then set it back to the
> correct number. I'm guessing from your message that there's never any
> chance of a concurrent transaction putting in a matching row in a way that
> something is marked as deletable when it isn't?

--
Chris Kratz
Systems Analyst/Programmer
VistaShare LLC
www.vistashare.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
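The reltriggers hack Stephan describes might look like the sketch below. The table name is a placeholder, the WHERE clause is elided on purpose, and the caveats stand: superuser only, and unsafe if any concurrent transaction could insert matching child rows while the triggers are off:

```sql
BEGIN;
-- Disable all triggers on the parent (7.x-era catalog hack).
UPDATE pg_class SET reltriggers = 0 WHERE relname = 'master_table';

DELETE FROM ONLY master_table WHERE ...;   -- the big delete

-- Restore the true trigger count from pg_trigger.
UPDATE pg_class
   SET reltriggers = (SELECT count(*) FROM pg_trigger
                       WHERE pg_trigger.tgrelid = pg_class.oid)
 WHERE relname = 'master_table';
COMMIT;
```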
Re: [PERFORM] SETOF performance
Jeff wrote:
> I think it was on this list - someone posted a message about SETOF being
> slower. Tom replied saying it was because it needed to create an on-disk
> tuplestore. I was just looking for some clarification - a SETOF function
> will always write the resulting tuples to disk (not buffering in, say, a
> sort_mem sized buffer)?

I think at least part of what you're seeing is normal function call overhead. As for tuplestores writing to disk, here's what the source says.

In src/backend/utils/sort/tuplestore.c:

8<---------------------------------------
 * maxKBytes: how much data to store in memory (any data beyond this
 * amount is paged to disk).  When in doubt, use work_mem.
 */
Tuplestorestate *
tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
8<---------------------------------------

In src/backend/executor/execQual.c:ExecMakeTableFunctionResult():

8<---------------------------------------
tupstore = tuplestore_begin_heap(true, false, work_mem);
8<---------------------------------------

So up to work_mem (sort_mem in 7.4 and earlier) should be stored in memory.

Joe

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] atrocious update performance
After deinstalling and scrubbing PostgreSQL from my server and doing a clean build using a vanilla 7.4.2 tree, I'm rather more confident that foreign key validation is at cause in my performance problems. I recreated my schemas and ran the original update, with foreign keys referring to the identity column of the target table. The update took roughly two days, as I'd predicted based on my analysis of the previous installation. (I can't say how long with certainty, beyond that it finished some time between when I left work one night and came in the next morning, the second day after starting the query.) I'm not sure what was wrong with the previous install, such that the update took several days; two-ish days is long enough. Just this morning, however, I created a copy of the target table (all 4.7M rows), with absolutely no foreign keys referring to it, and ran the update against the copy. That update took 2300 seconds. The join columns were indexed in both cases. I'm in the process of migrating the machine to run kernel 2.6.4, following the thread started by Gary, though I suspect that the kernel revision is moot with respect to whether or not foreign keys are being incorrectly validated. I can keep the 2.4 kernel and modules around to run using the current versions for testing purposes, though any such work would necessarily be off-hours. Please advise of anything I can do to help narrow down the specific cause of the issue; I know just enough C to be mildly dangerous. /rls -- Rosser Schwarz Total Card, Inc. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
On 4 Apr, Cott Lang wrote:
> On Sun, 2004-04-04 at 01:56, Gary Doades wrote:
>> Unfortunately I don't understand the question!
>>
>> My background is primarily Win32. The last time I used a *nix OS
>> was about 20 years ago, apart from occasional dips into the Linux OS
>> over the past few years. If you can tell me how to find out what you
>> want, I will gladly give you the information.
>
> There are two available io schedulers in 2.6 (new feature), deadline and
> anticipatory. It should be listed in the boot messages:
>
> dmesg | grep scheduler
>
> I've seen people arguing for each of the two schedulers, saying one is
> better than the other for databases. I'm curious which one you're
> using. :)

Our database tests (TPC fair-use implementations) show that the deadline scheduler has an edge on the anticipatory scheduler. Depending on the current state of the AS scheduler, it can be within a few percent to 10% or so. I have some data from one of our tests here:

http://developer.osdl.org/markw/fs/dbt2_project_results.html

Mark

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] performance comparison postgresql/ms-sql server
"Heiko Kehlenbrink" <[EMAIL PROTECTED]> writes:
> i use suse 8.1
> postgresql 7.2 compiled from the rpms for using postgis, but that is
> another story...

7.4 might be a little quicker; but in any case you should be doing this sort of comparison using the current release, no?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] SETOF performance
I think it was on this list - someone posted a message about SETOF being slower. Tom replied saying it was because it needed to create an on-disk tuplestore. I was just looking for some clarification - will a SETOF function always write the resulting tuples to disk (not buffering in, say, a sort_mem sized buffer)?

I think if that is the case I may need to go back and change some stuff around. I have a procedure that I broke out a bit to make life easier. Basically it goes:

for v_row in select blah from function_that_gets_data_from_some_cache()
    rowcount := rowcount + 1;
    return next v_row;
end for;

if rowcount = 0 then
    [same thing, but we call some_function_that_creates_data_for_cache]
end if;

Doing it this way means I avoid having to deal with it in the client and I also avoid having a giant stored procedure. (I like short & sweet things.)

What I've found for timings is this: select * from function_that_gets_data_from_some_cache() runs around 1.8 ms, but select * from the_top_level_function() runs around 4.2 ms. (Yes, I know 4.2 ms is fast, but that is not the point.) Could this overhead be related to the SETOF tuplestores? Might it be better to use a refcursor or something, or to bite the bullet and live with a giant procedure?

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
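Jeff's pseudocode, fleshed out as a hypothetical PL/pgSQL function (the row type and inner function names are stand-ins reconstructed from his description, not his actual code; 7.4-era syntax with the quoted function body):

```sql
CREATE OR REPLACE FUNCTION the_top_level_function()
RETURNS SETOF cache_table AS '
DECLARE
    v_row    cache_table%ROWTYPE;   -- hypothetical row type
    rowcount integer := 0;
BEGIN
    -- First pass: serve rows from the cache.
    FOR v_row IN SELECT * FROM function_that_gets_data_from_some_cache() LOOP
        rowcount := rowcount + 1;
        RETURN NEXT v_row;          -- each row lands in the SETOF tuplestore
    END LOOP;

    -- Cache miss: build the cache and serve from the builder instead.
    IF rowcount = 0 THEN
        FOR v_row IN SELECT * FROM some_function_that_creates_data_for_cache() LOOP
            RETURN NEXT v_row;
        END LOOP;
    END IF;

    RETURN;
END;
' LANGUAGE plpgsql;
```

Note the nesting: the outer function materializes its own tuplestore on top of the inner function's, which is one plausible source of the 1.8 ms vs. 4.2 ms gap.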
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Mike,

> I think it is also possible that Microsoft has more programmers working
> on tuning issues for SQL Server than PostgreSQL has working on the
> whole project.

Ah, but quantity != quality. Or they wouldn't be trolling our mailing lists trying to hire PostgreSQL programmers for the SQL Server project (really!). And we had nearly 200 contributors between 7.3 and 7.4 ... a respectable development staff for even a large corporation.

Point taken, though: SQL Server has done a better job in optimizing for "dumb" queries. This is something that PostgreSQL needs to work on, as are self-referential updates for large tables, which also tend to be really slow. Mind you, in SQL Server 7 I used to be able to crash the server with a big self-referential update, so this is a common database problem.

Unfortunately, these days only Tom and Neil seem to be seriously working on the query planner (beg pardon in advance if I've missed someone), so I think the real answer is that we need another person interested in this kind of optimization before it's going to get much better.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] performance comparison postgresql/ms-sql server
Heiko Kehlenbrink wrote:
> hi list,
> i want to convince people to use postgresql instead of ms-sql server, so i
> set up a kind of comparison: insert data / select data from postgresql /
> ms-sql server. the table i use was pretty basic:
>
> id bigserial
> dist float8
> x float8
> y float8
> z float8
>
> i filled the table with a function which filled x,y,z with incrementally
> increasing values (1,2,3,4,5,6...) and computed from that the dist value
> for every tuple (sqrt((x*x)+(y*y)+(z*z))). this works fine for both dbms.
> postgresql needs 13:37 min for 10.000.000 tuples, ms-sql needs 1:01:27 h.
> so far so good. i attached an index on the dist column and started to query
> the dbs with scripts which select a serial row of 100.000, 200.000, 500.000
> tuples based on the dist column. i randomly compute the start and the end
> distance and made a "select avg(dist) from table where dist > startdist
> and dist < enddist"

Some basics to check quickly:

1. vacuum analyze the table before you start selecting.
2. For slow-running queries, check the explain analyze output and find out what takes the most time.
3. Check for typecasting. You need to typecast the query correctly, e.g.

   select avg(dist) from table where dist > startdist::float8 and dist < enddist::float8

This might still end up with a sequential scan depending upon the plan, but if an index scan is picked it might be plenty fast. Post explain analyze output for the queries if things don't improve.

HTH
Shridhar

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
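Shridhar's third point matters most on 7.2, where the planner would not use an index when the comparison value's type didn't match the column exactly. A sketch with placeholder bounds (the real startdist/enddist come from Heiko's randomized script):

```sql
-- Refresh statistics first, then inspect the plan with explicit float8 casts.
VACUUM ANALYZE the_table;

EXPLAIN ANALYZE
SELECT avg(dist)
  FROM the_table
 WHERE dist > 1000000.0::float8    -- placeholder lower bound
   AND dist < 1500000.0::float8;   -- placeholder upper bound
```

If the plan still shows a Seq Scan for a range this narrow, the casts (or the index on dist) are the first things to re-check.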
[PERFORM] performance comparission postgresql/ms-sql server
hi list,

i want to convince people to use postgresql instead of ms-sql server, so i set up a kind of comparison: insert data / select data from postgresql / ms-sql server.

the table i use was pretty basic:

id bigserial
dist float8
x float8
y float8
z float8

i filled the table with a function which filled x,y,z with incrementally increasing values (1,2,3,4,5,6...) and computed from that the dist value for every tuple (sqrt((x*x)+(y*y)+(z*z))). this works fine for both dbms. postgresql needs 13:37 min for 10.000.000 tuples, ms-sql needs 1:01:27 h for 10.000.000 tuples. so far so good.

i attached an index on the dist column and started to query the dbs with scripts which select a serial row of 100.000, 200.000, 500.000 tuples based on the dist column. i randomly compute the start and the end distance and made a "select avg(dist) from table where dist > startdist and dist < enddist". did the same with a table with 50.000.000 tuples in ms-sql and postgres.

the outcome so far:

100.000 from 50.000.000:
postgres: 0.88 sec
ms-sql: 0.38 sec

200.000 from 50.000.000:
postgres: 1.57 sec
ms-sql: 0.54 sec

500.000 from 50.000.000:
postgres: 3.66 sec
ms-sql: 1.18 sec

i tried a lot of changes to the postgresql.conf following "Tuning PostgreSQL for performance" by Shridhar Daithankar and Josh Berkus, which did not make a big difference to the answering times from postgresql. i'm pretty fine with the insert time...

do you have any hints, like compiler flags and so on, to get the answering time from postgresql equal to ms-sql? (btw both dbms were running on exactly the same hardware)

i use suse 8.1, postgresql 7.2 compiled from the rpms for using postgis, but that is another story...
1.5 gig ram
1.8 GHz intel cpu

every help welcome

best regards
heiko

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] performance comparison postgresql/ms-sql server
Heiko,

> 100.000 from 50.000.000:
> postgres: 0.88 sec
> ms-sql: 0.38 sec
>
> 200.000 from 50.000.000:
> postgres: 1.57 sec
> ms-sql: 0.54 sec
>
> 500.000 from 50.000.000:
> postgres: 3.66 sec
> ms-sql: 1.18 sec

Questions:

1. Is this the time to return *all rows* or just the first row? Given the different way that PostgreSQL fetches rows to the client from MSSQL, it makes a difference.

2. What are your sort-mem and shared-mem settings?

3. Have you tried clustering the table?

4. Have you done a comparison of selecting random or scattered rows instead of serial rows? MSSQL has a tendency to physically store rows in "order", which gives it a certain advantage in this kind of query.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
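For question 3, clustering physically reorders the table by the dist index, which mimics the "ordered" storage that gives MSSQL its edge on these range scans. A sketch in 7.x syntax (the index name is a placeholder; CLUSTER rewrites the whole table and is a one-time ordering, not maintained on later inserts):

```sql
-- 7.x syntax: CLUSTER indexname ON tablename
CLUSTER dist_idx ON the_table;

-- Re-analyze afterwards, since CLUSTER rewrites the heap.
ANALYZE the_table;
```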