[PERFORM] index auto changes after copying data?

2005-11-28 Thread [EMAIL PROTECTED]
I know that in MySQL, indexes are automatically updated after copying data in.
Of course, an index is updated after inserting a row in PostgreSQL, but what
about copying data in with COPY?
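For concreteness, a minimal sketch of the scenario being asked about (the
table and file names here are made up for illustration):

  CREATE TABLE t (id integer PRIMARY KEY, val text);  -- the primary key creates an index
  COPY t FROM '/tmp/data.csv' WITH CSV;               -- bulk load: is the index maintained here too?

(In PostgreSQL the answer is yes: COPY maintains existing indexes on the
target table, just as INSERT does.)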


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread David Lang

On Mon, 28 Nov 2005, Brendan Duddridge wrote:

Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it 
possible to upgrade from Postgres 8.1 to Bizgres?


MPP is the Greenplum proprietary extension to postgres that spreads the 
data over multiple machines (like RAID, but with entire machines rather than 
just drives, complete with data replication within the cluster to survive a 
machine failing).


For some types of queries they can definitely scale linearly with the 
number of machines (other queries are far more difficult, and the overhead 
of coordinating the machines shows more; this is one of the key things 
that the new version they recently announced the beta for is supposed to 
be drastically improving).


Early in the year when I first looked at them their prices were 
exorbitant, but Luke says I'm wildly mistaken on their current prices, so 
call them for details.


It uses the same interfaces as postgres, so a cluster should be a drop-in 
replacement for a single server.


It's fascinating technology to read about.

I seem to remember reading that one of the other postgres companies is 
also producing a clustered version of postgres, but I don't remember who 
and know nothing about them.


David Lang

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread David Lang

On Mon, 28 Nov 2005, Brendan Duddridge wrote:


Hi David,

Thanks for your reply. So how is that different than something like Slony2 or 
pgcluster with multi-master replication? Is it similar technology? We're 
currently looking for a good clustering solution that will work on our Apple 
Xserves and Xserve RAIDs.


MPP doesn't just split up the data, it splits up the processing as well. 
So if you have a 5-machine cluster, each machine holds 1/5 of your data 
(plus a backup for one of the other machines), and when you run a query MPP 
slices and dices it, sending a subset of the query to each machine; it 
then gets the responses from all the machines and combines them.


If you have to do a full table scan, for example, each machine only 
has to go through 20% of the data.


A Slony or pgcluster setup has each machine holding a full copy of all the 
data; only one machine can work on a given query at a time, and if you 
have to do a full table scan, one machine needs to read 100% of the data.
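A conceptual sketch of what that split buys you (this is not actual MPP
syntax; 'orders' is a hypothetical table, and the modulo below just stands
in for MPP's data distribution). Each "machine" computes a partial count
over only its slice, and the master merely sums the partials:

  SELECT sum(partial) AS total
  FROM (
      SELECT count(*) AS partial   -- each machine's partial count over its slice
      FROM orders                  -- hypothetical table
      GROUP BY id % 5              -- 5 slices standing in for 5 machines
  ) AS per_segment;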


In many ways this is the holy grail of databases. Almost all other areas 
of computing can now be scaled by throwing more machines at the problem in 
a cluster, with each machine just working on its piece of the problem, 
but databases have had serious trouble doing the same and so have been 
ruled by the 'big monster machine'. Oracle has been selling Oracle RAC for 
a few years, and reports from people who have used it range drastically 
(from "it works great" to "it's a total disaster"), in part depending on the 
types of queries that have been made.


Greenplum thinks they have licked the problems for the more general 
case (and that commodity networks are now fast enough to match disk speeds 
in processing the data). If they are right, then when they hit full release 
with the new version they should be cracking a lot of the 
price/performance records on the big database benchmarks (TPC and 
similar), and if their pricing is reasonable, they may be breaking them by 
an order of magnitude or more. It's not unusual for the top machines to 
spend more than $1,000,000 on just their disk arrays for those 
systems; MPP could conceivably put together a cluster of $5K machines 
that runs rings around them (and probably will for at least some of the 
subtests; the big question is whether they can sweep the board and take the 
top spots outright).


they have more details (and marketing stuff) on their site at 
http://www.greenplum.com/prod_deepgreen_cluster.html


Don't get me wrong, I am very impressed with their stuff, but (having 
ranted a little here on the list about them) I think MPP and its 
performance are a bit off topic for the postgres performance list (at least 
until the postgres project itself starts implementing similar features :-)


David Lang


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Nov 27, 2005, at 8:09 PM, David Lang wrote:


On Mon, 28 Nov 2005, Brendan Duddridge wrote:

Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it 
possible to upgrade from Postgres 8.1 to Bizgres?


MPP is the Greenplum proprietary extension to postgres that spreads the 
data over multiple machines (like RAID, but with entire machines rather than 
just drives, complete with data replication within the cluster to survive a 
machine failing).


For some types of queries they can definitely scale linearly with the 
number of machines (other queries are far more difficult, and the overhead 
of coordinating the machines shows more; this is one of the key things that 
the new version they recently announced the beta for is supposed to be 
drastically improving).


Early in the year when I first looked at them their prices were exorbitant, 
but Luke says I'm wildly mistaken on their current prices, so call them for 
details.


It uses the same interfaces as postgres, so a cluster should be a drop-in 
replacement for a single server.


It's fascinating technology to read about.

I seem to remember reading that one of the other postgres companies is also 
producing a clustered version of postgres, but I don't remember who and 
know nothing about them.


David Lang






---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread David Boreham

Brendan Duddridge wrote:

Thanks for your reply. So how is that different than something like  
Slony2 or pgcluster with multi-master replication? Is it similar  
technology? We're currently looking for a good clustering solution  
that will work on our Apple Xserves and Xserve RAIDs.


I think you need to be more specific about what you're trying to do.
'clustering' encompasses so many things that it means almost nothing by 
itself.


Slony provides facilities for replicating data; its primary purpose is
to improve reliability. MPP distributes both data and queries; its
primary purpose is to improve performance for a subset of all query types.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread Brendan Duddridge

Hi David,

Thanks for your reply. So how is that different than something like  
Slony2 or pgcluster with multi-master replication? Is it similar  
technology? We're currently looking for a good clustering solution  
that will work on our Apple Xserves and Xserve RAIDs.


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Nov 27, 2005, at 8:09 PM, David Lang wrote:


On Mon, 28 Nov 2005, Brendan Duddridge wrote:

Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is  
it possible to upgrade from Postgres 8.1 to Bizgres?


MPP is the Greenplum proprietary extension to postgres that spreads  
the data over multiple machines (like RAID, but with entire machines  
rather than just drives, complete with data replication within the cluster  
to survive a machine failing).


For some types of queries they can definitely scale linearly with  
the number of machines (other queries are far more difficult, and  
the overhead of coordinating the machines shows more; this is one  
of the key things that the new version they recently announced the  
beta for is supposed to be drastically improving).


Early in the year when I first looked at them their prices were  
exorbitant, but Luke says I'm wildly mistaken on their current  
prices, so call them for details.


It uses the same interfaces as postgres, so a cluster should be a  
drop-in replacement for a single server.


It's fascinating technology to read about.

I seem to remember reading that one of the other postgres companies  
is also producing a clustered version of postgres, but I don't  
remember who and know nothing about them.


David Lang







Re: [PERFORM] Please help with this explain analyse...

2005-11-28 Thread David Gagnon

I restored my db but hadn't run ANALYZE... That was the problem.

Thanks
/David

"Merge Left Join  (cost=2273.54..2290.19 rows=228 width=816) (actual 
time=2098.257..2444.472 rows=8335 loops=1)"

"  Merge Cond: (("outer".rrnum)::text = "inner"."?column8?")"
"  ->  Merge Join  (cost=2131.25..2141.31 rows=228 width=744) (actual 
time=2037.953..2251.289 rows=8335 loops=1)"

"Merge Cond: ("outer"."?column31?" = "inner"."?column77?")"
"->  Sort  (cost=1975.03..1975.60 rows=228 width=235) (actual 
time=1798.556..1811.828 rows=8335 loops=1)"

"  Sort Key: (cr.crrrnum)::text"
"  ->  Hash Join  (cost=1451.41..1966.10 rows=228 width=235) 
(actual time=267.751..515.396 rows=8335 loops=1)"

"Hash Cond: ("outer".crnum = "inner".cscrnum)"
"->  Seq Scan on cr  (cost=0.00..489.77 rows=4529 
width=101) (actual time=0.077..97.615 rows=8335 loops=1)"
"  Filter: (((crypnum)::text = 'M'::text) AND 
((crdate + '00:00:00'::interval) <= '2005-01-28 00:00:00'::timestamp 
without time zone))"
"->  Hash  (cost=1449.70..1449.70 rows=684 
width=134) (actual time=267.568..267.568 rows=13587 loops=1)"
"  ->  Nested Loop  (cost=20.59..1449.70 
rows=684 width=134) (actual time=33.099..178.524 rows=13587 loops=1)"
"->  Seq Scan on gl  (cost=0.00..5.21 
rows=2 width=91) (actual time=0.021..0.357 rows=1 loops=1)"

"  Filter: (glsoctrl = 1)"
"->  Bitmap Heap Scan on cs  
(cost=20.59..684.42 rows=3026 width=43) (actual time=33.047..115.151 
rows=13587 loops=1)"
"  Recheck Cond: ((cs.csglnum)::text 
= ("outer".glnum)::text)"
"  Filter: ('M'::text = 
(csypnum)::text)"
"  ->  Bitmap Index Scan on 
cs_gl_fk  (cost=0.00..20.59 rows=3026 width=0) (actual 
time=32.475..32.475 rows=13587 loops=1)"
"Index Cond: 
((cs.csglnum)::text = ("outer".glnum)::text)"
"->  Sort  (cost=156.22..159.65 rows=1372 width=509) (actual 
time=239.315..254.024 rows=8974 loops=1)"

"  Sort Key: (rr.rrnum)::text"
"  ->  Seq Scan on rr  (cost=0.00..84.72 rows=1372 
width=509) (actual time=0.055..33.564 rows=1372 loops=1)"
"  ->  Sort  (cost=142.29..144.55 rows=903 width=72) (actual 
time=60.246..74.813 rows=8932 loops=1)"

"Sort Key: (yr.yrref)::text"
"->  Bitmap Heap Scan on yr  (cost=16.42..97.96 rows=903 
width=72) (actual time=8.513..13.587 rows=1154 loops=1)"
"  Recheck Cond: (((yryotype)::text = 'Client'::text) AND 
((yryonum)::text = 'Comptabilite.Recevable.Regroupement'::text))"
"  ->  Bitmap Index Scan on yr_idx1  (cost=0.00..16.42 
rows=903 width=0) (actual time=8.471..8.471 rows=1154 loops=1)"
"Index Cond: (((yryotype)::text = 'Client'::text) 
AND ((yryonum)::text = 'Comptabilite.Recevable.Regroupement'::text))"

"Total runtime: 2466.197 ms"


Bricklen Anderson <[EMAIL PROTECTED]> writes:

> Your loops are what is causing the time spent.
> eg. "actual time=0.127..17.379 rows=1154 loops=8335)" ==
> 8335*(17.379-0.127)/1000=>143 secs (if my math is correct).

As for where the problem is, I think it's the horrid misestimate of the
number of matching rows in cs_pk:

>> "->  Index Scan using cs_pk on cs  (cost=0.00..12.83 rows=2 width=144) (actual time=0.087..444.999 rows=13587 loops=1)"
>> "      Index Cond: (('M'::text = (cs.csypnum)::text) AND ((cs.csglnum)::text = ("outer".glnum)::text))"

Has that table been ANALYZEd recently?  How about the gl table?

regards, tom lane



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Please help with this explain analyse...

2005-11-28 Thread Tom Lane
Bricklen Anderson <[EMAIL PROTECTED]> writes:
> Your loops are what is causing the time spent.
> eg. "actual time=0.127..17.379 rows=1154 loops=8335)" ==
> 8335*(17.379-0.127)/1000=>143 secs (if my math is correct).

As for where the problem is, I think it's the horrid misestimate of the
number of matching rows in cs_pk:

>> "->  Index Scan using cs_pk on cs  (cost=0.00..12.83 
>> rows=2 width=144) (actual time=0.087..444.999 rows=13587 loops=1)"
>> "  Index Cond: (('M'::text = (cs.csypnum)::text) 
>> AND ((cs.csglnum)::text = ("outer".glnum)::text))"

Has that table been ANALYZEd recently?  How about the gl table?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Please help with this explain analyse...

2005-11-28 Thread Bricklen Anderson
David Gagnon wrote:

> "  ->  Index Scan using cr_pk on cr  (cost=0.00..6.02 rows=1
> width=828) (actual time=0.073..0.077 rows=1 loops=13587)"
> "Index Cond: (((cr.crypnum)::text = 'M'::text) AND
> (cr.crnum = "outer".cscrnum))"
> "Filter: ((crdate + '00:00:00'::interval) <=
> '2005-01-28 00:00:00'::timestamp without time zone)"
> "->  Index Scan using rr_pk on rr  (cost=0.00..5.99 rows=1
> width=5637) (actual time=0.062..0.069 rows=1 loops=8335)"
> "  Index Cond: (("outer".crrrnum)::text = (rr.rrnum)::text)"
> "  ->  Index Scan using yr_idx1 on yr  (cost=0.00..5.99 rows=1
> width=926) (actual time=0.127..17.379 rows=1154 loops=8335)"

Your loops are what is causing the time spent.
eg. "actual time=0.127..17.379 rows=1154 loops=8335)" ==
8335*(17.379-0.127)/1000=>143 secs (if my math is correct).
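Written out (treating the node's reported per-loop time as work repeated on
every one of the 8335 outer loops), the estimate is:

\[ 8335 \times \frac{17.379 - 0.127}{1000}\,\mathrm{s} \approx 8335 \times 0.01725\,\mathrm{s} \approx 143.8\,\mathrm{s} \]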




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread Brendan Duddridge
Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it  
possible to upgrade from Postgres 8.1 to Bizgres?


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Nov 28, 2005, at 3:05 PM, Luke Lonergan wrote:


Mark,

On 11/28/05 1:45 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:


8.0 : 32 s
8.1 : 25 s


A 22% reduction.

select count(1) on 12,900MB = 1617125 pages fully cached:

MPP based on 8.0 : 6.06s
MPP based on 8.1 : 4.45s

A 26% reduction.

I'll take it!

I am looking to back-port Tom's pre-8.2 changes and test again, maybe
tonight.

- Luke



---(end of broadcast)---

TIP 5: don't forget to increase your free space map settings







[PERFORM] Please help with this explain analyse...

2005-11-28 Thread David Gagnon

Hi all,

 I don't understand why this request takes so long. Maybe I'm not reading 
the analysis correctly, but it seems that the first line (Nested Loop Left 
Join ...) takes all the time. But I don't understand where the performance 
problem is ??? All the time is spent in the first line ...


Thanks for your help!

/David


explain analyse
SELECT *
FROM CR
INNER JOIN CS ON CR.CRNUM = CS.CSCRNUM AND CR.CRYPNUM = CS.CSYPNUM
INNER JOIN GL ON CS.CSGLNUM = GL.GLNUM AND GL.GLSOCTRL = 1
INNER JOIN RR ON CR.CRRRNUM = RR.RRNUM
LEFT OUTER JOIN YR ON YR.YRYOTYPE = 'Client'
    AND YR.YRYONUM = 'Comptabilite.Recevable.Regroupement'
    AND YR.YRREF = RR.RRNUM
WHERE CRYPNUM = 'M'
  AND CRDATE + INTERVAL '0 days' <= '2005-01-28'


"Nested Loop Left Join  (cost=0.00..42.12 rows=1 width=8143) (actual 
time=15.254..200198.524 rows=8335 loops=1)"

"  Join Filter: (("inner".yrref)::text = ("outer".rrnum)::text)"
"  ->  Nested Loop  (cost=0.00..36.12 rows=1 width=7217) (actual 
time=0.441..2719.821 rows=8335 loops=1)"
"->  Nested Loop  (cost=0.00..30.12 rows=1 width=1580) (actual 
time=0.242..1837.413 rows=8335 loops=1)"
"  ->  Nested Loop  (cost=0.00..18.07 rows=2 width=752) 
(actual time=0.145..548.607 rows=13587 loops=1)"
"->  Seq Scan on gl  (cost=0.00..5.21 rows=1 
width=608) (actual time=0.036..0.617 rows=1 loops=1)"

"  Filter: (glsoctrl = 1)"
"->  Index Scan using cs_pk on cs  (cost=0.00..12.83 
rows=2 width=144) (actual time=0.087..444.999 rows=13587 loops=1)"
"  Index Cond: (('M'::text = (cs.csypnum)::text) 
AND ((cs.csglnum)::text = ("outer".glnum)::text))"
"  ->  Index Scan using cr_pk on cr  (cost=0.00..6.02 rows=1 
width=828) (actual time=0.073..0.077 rows=1 loops=13587)"
"Index Cond: (((cr.crypnum)::text = 'M'::text) AND 
(cr.crnum = "outer".cscrnum))"
"Filter: ((crdate + '00:00:00'::interval) <= 
'2005-01-28 00:00:00'::timestamp without time zone)"
"->  Index Scan using rr_pk on rr  (cost=0.00..5.99 rows=1 
width=5637) (actual time=0.062..0.069 rows=1 loops=8335)"

"  Index Cond: (("outer".crrrnum)::text = (rr.rrnum)::text)"
"  ->  Index Scan using yr_idx1 on yr  (cost=0.00..5.99 rows=1 
width=926) (actual time=0.127..17.379 rows=1154 loops=8335)"
"Index Cond: (((yryotype)::text = 'Client'::text) AND 
((yryonum)::text = 'Comptabilite.Recevable.Regroupement'::text))"

"Total runtime: 200235.732 ms"


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread Luke Lonergan
Mark,

On 11/28/05 1:45 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:

>>> 8.0 : 32 s
>>> 8.1 : 25 s

A 22% reduction.

select count(1) on 12,900MB = 1617125 pages fully cached:

MPP based on 8.0 : 6.06s
MPP based on 8.1 : 4.45s

A 26% reduction.
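A quick check of the arithmetic behind those reductions:

\[ \frac{32 - 25}{32} \approx 22\%, \qquad \frac{6.06 - 4.45}{6.06} \approx 26.6\% \]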

I'll take it!

I am looking to back-port Tom's pre-8.2 changes and test again, maybe
tonight.

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread Mark Kirkwood

Merlin Moncure wrote:

>> It certainly makes quite a difference as I measure it:
>>
>> doing select count(1) from a 181000 page table (completely uncached) on my
>> PIII:
>>
>> 8.0 : 32 s
>> 8.1 : 25 s
>>
>> Note that the 'fastcount()' function takes 21 s in both cases - so all
>> the improvement seems to be from the count overhead reduction.
>
> Are you running windows?  There is a big performance improvement in
> count(*) on pg 8.0->8.1 on win32 that is not relevant to this debate...

No - FreeBSD 6.0 on a dual PIII 1 GHz. The slow cpu means that the 8.1 
improvements are very noticeable!


A point of interest - applying Niels' palloc-avoiding changes to 
NodeAgg.c and int8.c in 8.0 changes those results to:


8.0 + palloc-avoiding patch : 27 s

(I am guessing the remaining 2 s could be shaved off if I backported 
8.1's virtual tuples - however that looked like a lot of work)


Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread Merlin Moncure
> 
> It certainly makes quite a difference as I measure it:
> 
> doing select count(1) from a 181000 page table (completely uncached) on my
PIII:
> 
> 8.0 : 32 s
> 8.1 : 25 s
> 
> Note that the 'fastcount()' function takes 21 s in both cases - so all
> the improvement seems to be from the count overhead reduction.

Are you running windows?  There is a big performance improvement in
count(*) on pg 8.0->8.1 on win32 that is not relevant to this debate...

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread Luke Lonergan
The MPP test I ran was with the release version 2.0 of MPP which is based on
Postgres 8.0, the upcoming 2.1 release is based on 8.1, and 8.1 is far
faster at seq scan + agg.  12,937MB were counted in 4.5 seconds, or 2890MB/s
from I/O cache.  That's 722MB/s per host, and 360MB/s per Postgres instance,
up from 267 previously with 8.0.3.
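The arithmetic behind those figures, assuming the 4-host, 8-instance layout
described in the quoted message below (4.48 s is roughly the best of the
three timings shown):

\[ \frac{12{,}937\ \mathrm{MB}}{4.48\ \mathrm{s}} \approx 2890\ \mathrm{MB/s}, \qquad \frac{2890}{4\ \mathrm{hosts}} \approx 722\ \mathrm{MB/s}, \qquad \frac{2890}{8\ \mathrm{instances}} \approx 361\ \mathrm{MB/s} \]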

I'm going to apply Tom's pre-8.2 seq scan locking optimization and see how
much better we can get!

- Luke

 ==
 Bizgres MPP CVS tip (2.1 pre), 8 data segments, 1 per CPU
 ==

llonergan=# \timing
Timing is on.
llonergan=# explain select count(1) from lineitem;
QUERY PLAN
--
 Aggregate  (cost=0.01..0.01 rows=1 width=0)
   ->  Gather Motion  (cost=0.01..0.01 rows=1 width=0)
 ->  Aggregate  (cost=0.01..0.01 rows=1 width=0)
   ->  Seq Scan on lineitem  (cost=0.00..0.00 rows=1 width=0)
(4 rows)

Time: 1.464 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 4478.563 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 4550.917 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 4482.261 ms


On 11/24/05 9:16 AM, "Luke Lonergan" <[EMAIL PROTECTED]> wrote:

> The same 12.9GB distributed across 4 machines using Bizgres MPP fits into
> I/O cache.  The interesting result is that the query "select count(1)" is
> limited in speed to 280 MB/s per CPU when run on the lineitem table.  So
> when I run it spread over 4 machines, one CPU per machine I get this:
> 
> ==
> Bizgres MPP, 4 data segments, 1 per 2 CPUs
> ==
> llonergan=# explain select count(1) from lineitem;
>   QUERY PLAN
> 
> --
>  Aggregate  (cost=582452.00..582452.00 rows=1 width=0)
>->  Gather Motion  (cost=582452.00..582452.00 rows=1 width=0)
>  ->  Aggregate  (cost=582452.00..582452.00 rows=1 width=0)
>->  Seq Scan on lineitem  (cost=0.00..544945.00 rows=15002800
> width=0)
> (4 rows)
> 
> llonergan=# \timing
> Timing is on.
> llonergan=# select count(1) from lineitem;
>   count  
> --
>  59986052
> (1 row)
> 
> Time: 12191.435 ms
> llonergan=# select count(1) from lineitem;
>   count  
> --
>  59986052
> (1 row)
> 
> Time: 11986.109 ms
> llonergan=# select count(1) from lineitem;
>   count  
> --
>  59986052
> (1 row)
> 
> Time: 11448.941 ms
> ==
> 
> That's 12,937 MB in 11.45 seconds, or 1,130 MB/s.  When you divide out the
> number of Postgres instances (4), that's 283MB/s per Postgres instance.
> 
> To verify that this has nothing to do with MPP, I ran it in a special
> internal mode on one instance and got the same result.
> 
> So - we should be able to double this rate by running one segment per CPU,
> or two per host:
> 
> ==
> Bizgres MPP, 8 data segments, 1 per CPU
> ==
> llonergan=# select count(1) from lineitem;
>   count  
> --
>  59986052
> (1 row)
> 
> Time: 6484.594 ms
> llonergan=# select count(1) from lineitem;
>   count  
> --
>  59986052
> (1 row)
> 
> Time: 6156.729 ms
> llonergan=# select count(1) from lineitem;
>   count  
> --
>  59986052
> (1 row)
> 
> Time: 6063.416 ms
> ==
> That's 12,937 MB in 6.06 seconds, or 2,134 MB/s.  When you divide out the
> number of Postgres instances (8), that's 267MB/s per Postgres instance.
> 
> So, if you want to "select count(1)", using more CPUs is a good idea!  For
> most complex queries, having lots of CPUs + MPP is a good combo.
> 
> Here is an example of a sorting plan - this should probably be done with a
> hash aggregation, but using 8 CPUs makes it go 8x faster:
> 
> ==
> Bizgres MPP, 8 data segments, 1 per CPU
> ==
> llonergan=# \timing
> Timing is on.
> llonergan=# explain select l_orderkey from lineitem order by l_shipdate,
> l_extendedprice limit 10;
>QUERY PLAN
> 
> -
>  Limit  (cost=0.01..0.02 rows=1 width=24)
>->  Gather Motion  (cost=0.01..0.02 rows=1 width=24)
>  Merge Key: l_shipdate, l_extendedprice
>  ->  Limit  (cost=0.01..0.02 rows=1 width=24)
>->  Sort  (cost=0.01..0.02 rows=1 width=24)
>  Sort Key: l_shipdate, l_extendedp

Re: [PERFORM] Newbie question: ultra fast count(*)

2005-11-28 Thread Merlin Moncure
> I have been reading all this technical talk about costs and such that
> I don't (_yet_) understand.
> 
> Now I'm scared... what's the fastest way to do an equivalent of
> count(*) on a table to know how many items it has?

Make sure to ANALYZE the database frequently and check pg_class's reltuples
field.  This gives a zero-cost approximation of the number of rows in the
table as of the last analyze.
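A minimal example of that lookup ('mytable' is a placeholder name):

  -- approximate row count recorded by the last ANALYZE (or VACUUM);
  -- instant to read, but only as fresh as the statistics
  SELECT reltuples FROM pg_class WHERE relname = 'mytable';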

Many other approaches...check archives.  Also your requirements are
probably not as high as you think they are ;)

Merlin


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings