Re: [PERFORM] Kernel Resources and max_connections

2005-05-03 Thread Mark Kirkwood
Chris Hebrard wrote:
kern.ipc.shmmax and kern.ipc.shmmin will not stay to what I set them to.
What am I doing wrong or not doing at all?
These need to go in /etc/sysctl.conf. You might need to set shmall as well.
(This not-very-clear distinction between what is sysctl'able and what is 
a kernel tunable is a bit of a downer.)

cheers
Mark

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Kernel Resources and max_connections

2005-05-03 Thread Mark Kirkwood
Chris Hebrard wrote:
I set the values in etc/sysctl.conf:
# $FreeBSD: src/etc/sysctl.conf,v 1.1.2.3 2002/04/15 00:44:13 dougb Exp $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#
# Added by IMP 2005-05-04
net.inet.tcp.rfc1323=1
kern.ipc.somaxconn=1024
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968
kern.ipc.semmns="272"
kern.ipc.semmni="256"
kern.ipc.shmmax="66099200"
kern.ipc.shmmin="256"
After a restart both shmmax and shmmin are now 0 and postgres failed to 
start.


Hmmm - puzzling. One point to check: did you take them out of 
/boot/loader.conf?

Assuming so, maybe don't quote 'em (see below).
Finally you need to set shmall, otherwise it will over(under)ride the 
shmmax setting. So try:

net.inet.tcp.rfc1323=1
kern.ipc.somaxconn=1024
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968
kern.ipc.semmns=272
kern.ipc.semmni=256
kern.ipc.shmmax=66099200
kern.ipc.shmmin=256
kern.ipc.shmall=32768



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] tuning

2005-05-30 Thread Mark Kirkwood

list wrote:

hi-

i would like to see if someone could recommend something
to make my query run faster.


Values in postgresql.conf:
shared_buffers = 1000
sort_mem is commented out
effective_cache_size is commented out
random_page_cost is commented out



I would increase shared_buffers (say 5000 - 1), and also 
effective_cache_size (say around 2 - 5 - but work out how much 
memory this box has free or cached and adjust accordingly).


From your explain output, it looks like sorting is not too much of a 
problem - so you can leave it unchanged (for this query anyway).



Here is the query in question:
select * from productvendorview where (productlistid=3 or 
productlistid=5 or productlistid=4) and (lower(item) like '9229%' or 
lower(descrip) like 'toner%') order by vendorname,item limit 100;




You might want to break this into 2 queries and union them, so you can 
(potentially) use the indexes on productlistid,lower(item) and 
productlistid, lower(descrip) separately.
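
Something along these lines - just a sketch, using the view and column
names from your query above (note UNION rather than UNION ALL, so a row
matching both conditions only comes back once):

SELECT * FROM (
  SELECT * FROM productvendorview
  WHERE productlistid IN (3, 4, 5)
    AND lower(item) LIKE '9229%'
  UNION
  SELECT * FROM productvendorview
  WHERE productlistid IN (3, 4, 5)
    AND lower(descrip) LIKE 'toner%'
) AS combined
ORDER BY vendorname, item
LIMIT 100;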




This query scans 412,457 records.

Here is the EXPLAIN ANALYZE for the query:

 Limit  (cost=45718.83..45719.08 rows=100 width=108) (actual 
time=39093.636..39093.708 rows=100 loops=1)
   ->  Sort  (cost=45718.83..45727.48 rows=3458 width=108) (actual 
time=39093.629..39093.655 rows=100 loops=1)

 Sort Key: v.vendorname, p.item
 ->  Hash Join  (cost=22.50..45515.57 rows=3458 width=108) 
(actual time=95.490..39062.927 rows=2440 loops=1)

   Hash Cond: ("outer".vendorid = "inner".id)
   ->  Seq Scan on test p  (cost=0.00..45432.57 rows=3457 
width=62) (actual time=89.066..39041.654 rows=2444 loops=1)
 Filter: (((productlistid = 3) OR (productlistid = 
5) OR (productlistid = 4)) AND
  ((lower((item)::text) ~~ '9229%'::text) OR 
(lower((descrip)::text) ~~ 'toner%'::text)))
   ->  Hash  (cost=20.00..20.00 rows=1000 width=54) (actual 
time=6.289..6.289 rows=0 loops=1)
 ->  Seq Scan on vendor v  (cost=0.00..20.00 
rows=1000 width=54) (actual time=0.060..3.653 rows=2797 loops=1)

 Total runtime: 39094.713 ms
(10 rows)



I guess the relation 'test' is a copy of product (?)

Cheers

Mark



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] postgresql-8.0.1 performance tuning

2005-05-31 Thread Mark Kirkwood

Cosimo Streppone wrote:

# Config 

/etc/sysctl.conf:
kernel.shmall = 786432000
kernel.shmmax = 786432000



I think you have a problem here.
kernel.shmmax should *not* be set to an amount of RAM, but
to maximum number of shared memory pages, which on a typical linux system
is 4kb. Google around:

  



This is somewhat confusing:

kernel.shmmax is in bytes (max single segment size)
kernel.shmall is in (4k) pages (max system wide allocated segment pages)

cheers

Mark



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] postgresql-8.0.1 performance tuning

2005-06-01 Thread Mark Kirkwood

Cosimo Streppone wrote:

Mark Kirkwood wrote:


Cosimo Streppone wrote:


# Config 


/etc/sysctl.conf:
kernel.shmall = 786432000
kernel.shmmax = 786432000



I think you have a problem here.
kernel.shmmax should *not* be set to an amount of RAM, but



Sorry, I thought "shmall" but wrote "shmmax".
Thanks Mark!



Hehe - happens to me all the time!

On the shmall front - although there is *probably* no real performance 
impact setting it to the same as shmmax (i.e. allowing 4096 allocations 
of size shmmax!), it is overkill. In addition it does allow for a DoS by 
a program that allocates thousands of segments (or somehow starts 
thousands of Pg servers on different ports...)!


For a dedicated Pg server I would size shmall using a calculation along 
the lines of:


shmall = (no. of postgresql servers) * (shmmax/4096)


If there are other daemons on the box that need to use shared memory, 
then add their likely requirements to shmall too!
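
(A worked example, with purely illustrative numbers: a single Postgres 
server with shmmax = 134217728 bytes gives shmall = 1 * (134217728 / 4096) 
= 32768 pages.)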


cheers

Mark

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: SHMMAX / SHMALL Was (Re: [PERFORM] postgresql-8.0.1 performance

2005-06-02 Thread Mark Kirkwood

Paul McGarry wrote:


Based on the powerpostgresql.com Performance Checklist [1]  and
Annotated Postgresql.conf [2] I understand that:
-I should have less than 1/3 of my total memory as shared_buffers
-For my server 15000 is a fairly reasonable starting point for
shared_buffers which is ~120MB
-I have 100 max_connections.

So I was going to set SHMMAX to 134217728 (ie 128 Meg)

What should SHMALL be?

The current system values are
[EMAIL PROTECTED]:~/data$ cat /proc/sys/kernel/shmmax
33554432
[EMAIL PROTECTED]:~/data$ cat /proc/sys/kernel/shmall
2097152

ie SHMALL seems to be 1/16 of SHMMAX



No - shmall is in 4k pages - so this amounts to 8G! This is fine - 
unless you wish to decrease it in order to prevent too many shared 
memory applications running.


BTW - the docs have been amended for 8.1 to suggest shmmax=134217728 and 
shmall=2097152 (was going to point you at them - but I cannot find them 
on the Postgresql site anymore...).


There seems to be some longstanding confusion in the Linux community 
about the units for shmall (some incorrect documentation from Oracle on 
the issue does not help I am sure) - to the point where I downloaded 
kernel source to check (reproducing here):



linux-2.6.11.1/include/linux/shm.h:13->

#define SHMMAX 0x2000000 /* max shared seg size (bytes) */
#define SHMMIN 1 /* min shared seg size (bytes) */
#define SHMMNI 4096  /* max num of segs system wide */
#define SHMALL (SHMMAX/PAGE_SIZE*(SHMMNI/16)) /* max shm system wide 
(pages) */

#define SHMSEG SHMMNI


Hope that helps

Best wishes

Mark

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: SHMMAX / SHMALL Was (Re: [PERFORM] postgresql-8.0.1 performance

2005-06-03 Thread Mark Kirkwood

Martin Fandel wrote:
Aah ok :) 


I've set my values now as follow (2GB RAM):

SHMMAX=`cat /proc/meminfo | grep MemTotal | cut -d: -f 2 | awk '{print
$1*1024/3}'`
echo kernel.shmmax=${SHMMAX} >> /etc/sysctl.conf
SHMALL=`expr ${SHMALL} / 4096 \* \( 4096 / 16 \)`
echo kernel.shmall=${SHMALL} >> /etc/sysctl.conf

sysctl.conf:
kernel.shmmax=708329472
kernel.shmall=44270592

postgresql.conf:
max_connections=500
shared_buffers=40000 # ~312MB, min. 1000, max ~ 83000



Hmmm - shmall set to 168G... err why? Apologies for nit picking a little 
- but shmall seems unreasonably high. I can't see much reason for 
setting it bigger than (physical RAM in bytes)/4096 myself. So in your 
case this is 2*(1024*1024*1024)/4096 = 524288


Cheers

Mark


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Filesystem

2005-06-03 Thread Mark Kirkwood

Martin Fandel wrote:

Hi @ all,

i have only a little question. Which filesystem is preferred for 
postgresql? I'm plan to use xfs (before i used reiserfs). The reason
is the xfs_freeze Tool to make filesystem-snapshots. 


Is the performance better than reiserfs, is it reliable?



I used postgresql with xfs on mandrake 9.0/9.1 a while ago - 
reliability was great, performance seemed better than ext3. I didn't 
compare with reiserfs - the only time I have ever lost data from a Linux 
box has been when I used reiserfs, hence I am not a fan :-(


best wishes

Mark

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Updates on large tables are extremely slow

2005-06-12 Thread Mark Kirkwood

Yves Vindevogel wrote:


I'm trying to update a table that has about 600.000 records.
The update query is very simple : update mytable set pagesdesc = - pages ;

The query takes about half an hour to an hour to execute. I have tried a 
lot of things.




Half an hour seems a bit long - I would expect less than 5 minutes on 
reasonable hardware.


You may have dead tuple bloat - can you post the output of 'ANALYZE 
VERBOSE mytable' ?


Cheers

Mark

---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Updates on large tables are extremely slow

2005-06-13 Thread Mark Kirkwood

Apologies - I should have said output of 'VACUUM VERBOSE mytable'.

(been using 8.1, which displays dead tuple info in ANALYZE...).

Mark

Yves Vindevogel wrote:

rvponp=# analyze verbose tblPrintjobs ;
INFO: analyzing "public.tblprintjobs"
INFO: "tblprintjobs": 19076 pages, 3000 rows sampled, 588209 estimated 
total rows

ANALYZE


On 13 Jun 2005, at 04:43, Mark Kirkwood wrote:

Yves Vindevogel wrote:

I'm trying to update a table that has about 600.000 records.
The update query is very simple : update mytable set pagesdesc =
- pages ;
The query takes about half an hour to an hour to execute. I have
tried a lot of things.


Half an hour seems a bit long - I would expect less than 5 minutes on
reasonable hardware.

You may have dead tuple bloat - can you post the output of 'ANALYZE
VERBOSE mytable' ?


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] QRY seems not using indexes

2005-08-08 Thread Mark Kirkwood

Qingqing Zhou wrote:

<[EMAIL PROTECTED]> writes



so, if I do a qry like "EXPLAIN ANALYZE select * from pridecdr where
idsede=8977758488" it tooks a lot of time before i get back any result:

Index Scan using prd_id_sede on pridecdr  (cost=0.00..699079.90
rows=181850 width=138) (actual time=51.241..483068.255 rows=150511
loops=1)
  Index Cond: (idsede = 8977758488::bigint)
Total runtime: 483355.325 ms




The query plan looks ok. Try to do EXPLAIN ANALYZE twice and see if there is
any difference. This could reduce the IO time to read your index/data since
you got enough RAM.

Also, if you haven't done VACUUM FULL for a long time, do so and compare the
difference.



Could also be libpq buffering all ~150000 rows before showing any.

It might be worthwhile using a CURSOR and doing 1 FETCH. If that is 
quick, then buffering is probably the issue. BTW - do you really want 
all the rows?
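
A minimal sketch of that cursor test (same query as above):

BEGIN;
DECLARE cdr_cur CURSOR FOR
    SELECT * FROM pridecdr WHERE idsede = 8977758488;
FETCH 1 FROM cdr_cur;   -- should come back almost immediately if
                        -- client-side buffering is the culprit
CLOSE cdr_cur;
COMMIT;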


Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Limit + group + join

2005-08-25 Thread Mark Kirkwood

Tobias,
Interesting example:

The 'desc' seems to be the guy triggering the sort, e.g:

explain select c.id from c join b on c_id=c.id  group by c.id order by 
c.id limit 5;
   QUERY PLAN 


-
 Limit  (cost=0.00..0.28 rows=5 width=4)
   ->  Group  (cost=0.00..4476.00 rows=8 width=4)
 ->  Merge Join  (cost=0.00..4276.00 rows=8 width=4)
   Merge Cond: ("outer".id = "inner".c_id)
   ->  Index Scan using c_pkey on c  (cost=0.00..1518.00 
rows=8 width=4)
   ->  Index Scan using b_on_c on b  (cost=0.00..1558.00 
rows=8 width=4)

(6 rows)

Whereas with it back in again:

explain select c.id from c join b on c_id=c.id  group by c.id order by 
c.id desc limit 5;
  QUERY PLAN 


--
 Limit  (cost=10741.08..10741.11 rows=5 width=4)
   ->  Group  (cost=10741.08..11141.08 rows=8 width=4)
 ->  Sort  (cost=10741.08..10941.08 rows=8 width=4)
   Sort Key: c.id
   ->  Hash Join  (cost=1393.00..4226.00 rows=8 width=4)
 Hash Cond: ("outer".c_id = "inner".id)
 ->  Seq Scan on b  (cost=0.00..1233.00 rows=8 
width=4)

 ->  Hash  (cost=1193.00..1193.00 rows=8 width=4)
   ->  Seq Scan on c  (cost=0.00..1193.00 
rows=8 width=4)

(9 rows)


However being a bit brutal:

set enable_mergejoin=false;
set enable_hashjoin=false;

explain select c.id from c join b on c_id=c.id  group by c.id order by 
c.id desc limit 5;
QUERY PLAN 


--
 Limit  (cost=0.00..15.24 rows=5 width=4)
   ->  Group  (cost=0.00..243798.00 rows=8 width=4)
 ->  Nested Loop  (cost=0.00..243598.00 rows=8 width=4)
   ->  Index Scan Backward using c_pkey on c 
(cost=0.00..1518.00 rows=8 width=4)
   ->  Index Scan using b_on_c on b  (cost=0.00..3.01 
rows=1 width=4)

 Index Cond: (b.c_id = "outer".id)
(6 rows)

What is interesting is why this plan is being rejected...

Cheers

Mark

Tobias Brox wrote:

Consider this setup - which is a gross simplification of parts of our
production system ;-)

  create table c (id integer primary key);
  create table b (id integer primary key, c_id integer);
  create index b_on_c on b(c_id)

  insert into c (select ... lots of IDs ...);
  insert into b (select id, id from c); /* keep it simple :-) */
  
Now, I'm just interessted in some few rows.  


All those gives good plans:

explain select c.id from c order by c.id limit 1;
explain select c.id from c group by c.id order by c.id limit 1;
explain select c.id from c join b on c_id=c.id order by c.id limit 1;

... BUT ... combining join, group and limit makes havoc:

explain select c.id from c join b on c_id=c.id  group by c.id order by c.id
desc limit 5;



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Limit + group + join

2005-08-26 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood <[EMAIL PROTECTED]> writes:


What is interesting is why this plan is being rejected...



Which PG version are you using exactly?  That mistake looks like an
artifact of the 8.0 "fuzzy plan cost" patch, which we fixed recently:
http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php



Right on - 8.0.3 (I might look at how CVS tip handles this, could be 
interesting).



But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy
cost issue explains his results.

As far as the "desc" point goes, the problem is that mergejoins aren't
capable of dealing with backward sort order, so a merge plan isn't
considered for that case (or at least, it would have to have a sort
after it, which pretty much defeats the point for a fast-start plan).
I have some ideas about fixing this but it won't happen before 8.2.


That doesn't explain why the nested loop is being kicked tho', or have I 
missed something obvious? - it's been known to happen :-)...


Cheers

Mark


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Limit + group + join

2005-08-26 Thread Mark Kirkwood
Interestingly enough, 7.4.8 and 8.1devel-2005-08-23 all behave the same 
as 8.0.3 for me (tables freshly ANALYZEd):


joinlimit=# SELECT version();
 version 


-
 PostgreSQL 7.4.8 on i386-unknown-freebsd5.4, compiled by GCC gcc (GCC) 
3.4.2 [FreeBSD] 20040728

(1 row)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id DESC LIMIT 5;
  QUERY PLAN 


---
 Limit  (cost=10591.36..10591.39 rows=5 width=4)
   ->  Group  (cost=10591.36..10992.02 rows=80131 width=4)
 ->  Sort  (cost=10591.36..10791.69 rows=80131 width=4)
   Sort Key: c.id
   ->  Merge Join  (cost=0.00..4064.66 rows=80131 width=4)
 Merge Cond: ("outer".id = "inner".c_id)
 ->  Index Scan using c_pkey on c 
(cost=0.00..1411.31 rows=80131 width=4)
 ->  Index Scan using b_on_c on b 
(cost=0.00..1451.72 rows=80172 width=4)

(8 rows)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id  LIMIT 5;
   QUERY PLAN 


-
 Limit  (cost=0.00..0.27 rows=5 width=4)
   ->  Group  (cost=0.00..4264.99 rows=80131 width=4)
 ->  Merge Join  (cost=0.00..4064.66 rows=80131 width=4)
   Merge Cond: ("outer".id = "inner".c_id)
   ->  Index Scan using c_pkey on c  (cost=0.00..1411.31 
rows=80131 width=4)
   ->  Index Scan using b_on_c on b  (cost=0.00..1451.72 
rows=80172 width=4)

(6 rows)


joinlimit=# SELECT version();
  version 



 PostgreSQL 8.1devel on i386-unknown-freebsd5.4, compiled by GCC gcc 
(GCC) 3.4.2 [FreeBSD] 20040728

(1 row)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id DESC LIMIT 5;
  QUERY PLAN 


---
 Limit  (cost=10654.53..10654.55 rows=5 width=4)
   ->  Group  (cost=10654.53..11054.53 rows=8 width=4)
 ->  Sort  (cost=10654.53..10854.53 rows=8 width=4)
   Sort Key: c.id
   ->  Merge Join  (cost=0.00..4139.44 rows=8 width=4)
 Merge Cond: ("outer".id = "inner".c_id)
 ->  Index Scan using c_pkey on c 
(cost=0.00..1450.00 rows=8 width=4)
 ->  Index Scan using b_on_c on b 
(cost=0.00..1490.00 rows=8 width=4)

(8 rows)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id LIMIT 5;
   QUERY PLAN 


-
 Limit  (cost=0.00..0.27 rows=5 width=4)
   ->  Group  (cost=0.00..4339.44 rows=8 width=4)
 ->  Merge Join  (cost=0.00..4139.44 rows=8 width=4)
   Merge Cond: ("outer".id = "inner".c_id)
   ->  Index Scan using c_pkey on c  (cost=0.00..1450.00 
rows=8 width=4)
   ->  Index Scan using b_on_c on b  (cost=0.00..1490.00 
rows=8 width=4)

(6 rows)

The non default server params of relevance are:

shared_buffers = 12000
effective_cache_size = 10
work_mem/sort_mem = 20480

I did wonder if the highish sort_mem might be a factor, but no, with it 
 set to 1024 I get the same behaviour (just higher sort cost estimates).


Cheers

Mark

Tom Lane wrote:



Which PG version are you using exactly?  That mistake looks like an
artifact of the 8.0 "fuzzy plan cost" patch, which we fixed recently:
http://archives.postgresql.org/pgsql-committers/2005-07/msg00474.php

But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy
cost issue explains his results.

As far as the "desc" point goes, the problem is that mergejoins aren't
capable of dealing with backward sort order, so a merge plan isn't
considered for that case (or at least, it would have to have a sort
after it, which pretty much defeats the point for a fast-start plan).
I have some ideas about fixing this but it won't happen before 8.2.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Limit + group + join

2005-08-27 Thread Mark Kirkwood

Tom Lane wrote:


I looked into this and found that indeed the desirable join plan was
getting generated, but it wasn't picked because query_planner didn't
have an accurate idea of how much of the join needed to be scanned to
satisfy the GROUP BY step.  I've committed some changes that hopefully
will let 8.1 be smarter about GROUP BY ... LIMIT queries.



Very nice :-)

joinlimit=# EXPLAIN SELECT c.id FROM c JOIN b ON c_id=c.id  GROUP BY 
c.id ORDER BY c.id DESC LIMIT 5;
QUERY PLAN 


--
 Limit  (cost=0.00..15.23 rows=5 width=4)
   ->  Group  (cost=0.00..243730.00 rows=8 width=4)
 ->  Nested Loop  (cost=0.00..243530.00 rows=8 width=4)
   ->  Index Scan Backward using c_pkey on c 
(cost=0.00..1450.00 rows=8 width=4)
   ->  Index Scan using b_on_c on b  (cost=0.00..3.01 
rows=1 width=4)

 Index Cond: (b.c_id = "outer".id)
(6 rows)

This is 8.1devel from today.

regards

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Massive performance issues

2005-09-01 Thread Mark Kirkwood
It would be good to see EXPLAIN ANALYZE output for the three queries 
below (the real vs. estimated row counts being of interest).


The number of pages in your address table might be interesting to know too.

regards

Mark

Matthew Sackman wrote (with a fair bit of snippage):

explain select locality_2 from address where locality_2 = 'Manchester';
gives
   QUERY PLAN   


 Seq Scan on address  (cost=0.00..80677.16 rows=27923 width=12)
   Filter: ((locality_2)::text = 'Manchester'::text)


explain select locality_1 from address where locality_1 = 'Manchester';
gives
   QUERY PLAN


 Index Scan using address_locality_1_index on address
(cost=0.00..69882.18 rows=17708 width=13)
   Index Cond: ((locality_1)::text = 'Manchester'::text)


>
select street, locality_1, locality_2, city from address 
where (city = 'Nottingham' or locality_2 = 'Nottingham'

   or locality_1 = 'Nottingham')
  and upper(substring(street from 1 for 1)) = 'A' 
group by street, locality_1, locality_2, city

order by street
limit 20 offset 0



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Massive performance issues

2005-09-01 Thread Mark Kirkwood

Matthew Sackman wrote:


I need to get to the stage where I can run queries such as:

>
select street, locality_1, locality_2, city from address 
where (city = 'Nottingham' or locality_2 = 'Nottingham'

   or locality_1 = 'Nottingham')
  and upper(substring(street from 1 for 1)) = 'A' 
group by street, locality_1, locality_2, city

order by street
limit 20 offset 0

and have the results very quickly.



This sort of query will be handled nicely in 8.1 - it has bitmap and/or 
processing to make use of multiple indexes. Note that 8.1 is in beta now.


Cheers

Mark


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Massive performance issues

2005-09-02 Thread Mark Kirkwood

Mark Kirkwood wrote:

Matthew Sackman wrote:


I need to get to the stage where I can run queries such as:


 >

select street, locality_1, locality_2, city from address where (city = 
'Nottingham' or locality_2 = 'Nottingham'

   or locality_1 = 'Nottingham')
  and upper(substring(street from 1 for 1)) = 'A' group by street, 
locality_1, locality_2, city

order by street
limit 20 offset 0

and have the results very quickly.



This sort of query will be handled nicely in 8.1 - it has bitmap and/or 
processing to make use of multiple indexes. Note that 8.1 is in beta now.




As others have commented, you will probably need better hardware to 
achieve a factor of 1000 improvement. However, I think using 8.1 could by 
itself give you a factor of 10-100 improvement.


e.g. Using your schema and generating synthetic data:


EXPLAIN
SELECT street, locality_1, locality_2, city
FROM address
WHERE (city = '500TH CITY'
   OR locality_2 = '50TH LOCALITY'
   OR locality_1 = '500TH LOCALITY')
  AND upper(substring(street from 1 for 1)) = 'A'
GROUP BY street, locality_1, locality_2, city
ORDER BY street
LIMIT 20 OFFSET 0

 QUERY PLAN 


--
 Limit  (cost=59559.04..59559.09 rows=20 width=125)
   ->  Sort  (cost=59559.04..59559.09 rows=21 width=125)
 Sort Key: street
 ->  HashAggregate  (cost=59558.37..59558.58 rows=21 width=125)
   ->  Bitmap Heap Scan on address  (cost=323.19..59556.35 
rows=202 width=125)
 Recheck Cond: (((city)::text = '500TH CITY'::text) 
OR ((locality_2)::text = '50TH LOCALITY'::text) OR ((locality_1)::text = 
'500TH LOCALITY'::text))
 Filter: (upper("substring"((street)::text, 1, 1)) 
= 'A'::text)

 ->  BitmapOr  (cost=323.19..323.19 rows=40625 width=0)
   ->  Bitmap Index Scan on address_city_index 
 (cost=0.00..15.85 rows=1958 width=0)
 Index Cond: ((city)::text = '500TH 
CITY'::text)
   ->  Bitmap Index Scan on 
address_locality_2_index  (cost=0.00..143.00 rows=18000 width=0)
 Index Cond: ((locality_2)::text = 
'50TH LOCALITY'::text)
   ->  Bitmap Index Scan on 
address_locality_1_index  (cost=0.00..164.33 rows=20667 width=0)
 Index Cond: ((locality_1)::text = 
'500TH LOCALITY'::text)

(14 rows)


This takes 0.5s -> 2s to execute (depending on the frequencies generated 
for the two localities).


So we are a factor of 10 better already, on modest HW (2xPIII 1Ghz 2G 
running FreeBSD 5.4).


To go better than this you could try a specific summary table:


CREATE TABLE address_summary AS
SELECT street,
   locality_1,
   locality_2,
   city,
   upper(substring(street from 1 for 1)) AS cut_street
FROM address
GROUP BY street, locality_1, locality_2, city
;

CREATE INDEX address_summary_city_index ON address_summary(city);
CREATE INDEX address_summary_locality_1_index ON 
address_summary(locality_1);
CREATE INDEX address_summary_locality_2_index ON 
address_summary(locality_2);

CREATE INDEX address_summary_street_index ON address_summary(street);
CREATE INDEX street_summary_prefix ON address_summary(cut_street);


And the query can be rewritten as:

EXPLAIN
SELECT street, locality_1, locality_2, city
FROM address_summary
WHERE (city = '500TH CITY'
   OR locality_2 = '50TH LOCALITY'
   OR locality_1 = '500TH LOCALITY')
  AND cut_street = 'A'
ORDER BY street
LIMIT 20 OFFSET 0

   QUERY PLAN 


---
 Limit  (cost=0.00..2006.05 rows=20 width=125)
   ->  Index Scan using address_summary_street_index on address_summary 
 (cost=0.00..109028.81 rows=1087 width=125)
  Filter: ((((city)::text = '500TH CITY'::text) OR 
((locality_2)::text = '50TH LOCALITY'::text) OR ((locality_1)::text = 
'500TH LOCALITY'::text)) AND (cut_street = 'A'::text))

(3 rows)


This takes 0.02s - so getting close to the factor of 1000 (a modern 
machine with 3-5 times the memory access speed will get you there easily).


The effectiveness of the summary table will depend on the how much the 
GROUP BY reduces the cardinality (not much in this synthetic case), so 
you will probably get better improvement with the real data!


Cheers

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-23 Thread Mark Kirkwood

K C Lau wrote:
Thank you all for your suggestions. I've tried, with some variations too, 
but still no success. The times given are the best of a few repeated 
tries on an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows.




A small denormalization, where you mark the row with the latest atdate 
for each playerid, may get you the performance you want.


e.g: (8.1beta1)

ALTER TABLE player ADD islastatdate boolean;

UPDATE player SET islastatdate = true where (playerid,atdate) IN
(SELECT playerid, atdate FROM vcurplayer);

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE islastatdate;

CREATE INDEX player_id_lastatdate ON player(playerid, islastatdate)
WHERE islastatdate;

ANALYZE player;

Generating some test data produced:

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='0';

   QUERY PLAN 



 Index Scan using player_id_lastatdate on player a  (cost=0.00..4.33 
rows=1 width=13) (actual time=0.142..0.149 rows=1 loops=1)

   Index Cond: ((playerid = '0'::text) AND (lastatdate = true))
   Filter: lastatdate
 Total runtime: 0.272 ms
(4 rows)

Whereas with the original view definition:

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE a.atdate =
( SELECT max(b.atdate) FROM player b
  WHERE a.playerid = b.playerid);

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='0';

QUERY PLAN 


-
 Index Scan using player_id_date on player a  (cost=0.00..7399.23 
rows=11 width=13) (actual time=121.738..121.745 rows=1 loops=1)

   Index Cond: (playerid = '0'::text)
   Filter: (atdate = (subplan))
   SubPlan
 ->  Result  (cost=1.72..1.73 rows=1 width=0) (actual 
time=0.044..0.047 rows=1 loops=2000)

   InitPlan
 ->  Limit  (cost=0.00..1.72 rows=1 width=4) (actual 
time=0.028..0.031 rows=1 loops=2000)
   ->  Index Scan Backward using player_id_date on 
player b  (cost=0.00..3787.94 rows=2198 width=4) (actual 
time=0.019..0.019 rows=1 loops=2000)

 Index Cond: ($0 = playerid)
 Filter: (atdate IS NOT NULL)
 Total runtime: 121.916 ms
(11 rows)

Note that my generated data has too many rows for each playerid, but the 
 difference in performance should illustrate the idea.


Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-23 Thread Mark Kirkwood

K C Lau wrote:

I'm wondering if this performance issue is common enough for other users 
to merit a fix in pg, especially as it seems that with MVCC, each of the 
data records need to be accessed in addition to scanning the index.




Yes - there are certainly cases where index only access (or something 
similar, like b+tree tables) would be highly desirable.


From what I have understood from previous discussions, there are 
difficulties involved with producing a design that does not cause new 
problems...


regards

Mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Bytea poor performance

2005-10-15 Thread Mark Kirkwood

NSO wrote:

Hello,

 Yes, I can understand that, but then why the same app on the server
machine is done in 4 seconds? (big difference from 20-30 seconds). I
tryed to monitor network traffic and it is used only for 1-2% of total
100mbit.
 


Is this a web app? If so, then check you are using the same browser 
settings on the server and client (or the same browser for that matter).


Note that some browsers really suck for large (wide or long) table display!

cheers

Mark


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Bytea poor performance

2005-10-17 Thread Mark Kirkwood

NSO wrote:


Well, no. Delphi isn't better, same time just for downloading data... But
as I told before, if for ex. pgAdminIII is running on server machine it is
a lot faster, I do not know why, I was monitoring network connection
between client and server and it is using only up to 2% of full speed.. is
server can't send faster? or client is not accepting data faster?

 


That difference is suspiciously high - you need to get one of your 
network boys to check that the NIC in your client box is operating at 
full speed (and/or does not clash with whatever network device it is 
plugged into). The other thing to check is that your client box is 
reasonably spec'ed: e.g. not running out of ram or disk in particular - 
or suffering from a massively fragmented disk (the latter if it's win32).


With respect to the Delphi, you can probably narrow where it has issues 
by running test versions of your app that have bits of functionality 
removed:


- retrieves the bytea but does not display it
- retrieves the bytea but displays it unformatted, or truncated
- does not retrieve the bytea at all

The difference between these should tell you where your issue is!

By way of comparison, I have a Php page (no Delphi sorry) that 
essentially shows 50 rows from your files table over a 100Mbit network. 
Some experiments with that show:


- takes 2 seconds to display in Firefox
- takes 0.2 seconds to complete a request (i.e. "display") using httperf

This indicates that (in my case) most of the 2 seconds is being used by 
Firefox (not being very good at) formatting the wide output for display.


The figure of about 2-5 seconds seems about right, so your 20-30 seconds 
certainly seems high!



cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Used Memory

2005-10-23 Thread Mark Kirkwood

Christian Paul B. Cosinas wrote:
 


Here is the result of the "free" command I am talking about.

What does this result mean?



I seem to recall the Linux man page for 'free' being most 
unenlightening, so have a look at:


http://gentoo-wiki.com/FAQ_Linux_Memory_Management

(For Gentoo, but should be applicable to RHEL).

The basic idea is that modern operating systems try to make as much use 
of the memory as possible. Postgresql depends on this behavior - e.g. a 
page that has previously been fetched from disk will be cached, so it 
can be read from memory next time, as this is faster(!)


 

I just noticed that as long as the free memory in the first row (which 
is 55036 as of now) became low, the slower is the response of the 
database server.
 


Well, you could be swapping - what does the swap line of 'free' show then?

Also, how about posting your postgresql.conf (or just the non-default 
parameters) to this list?


Some other stuff that could be relevant:

- Is the machine just a database server, or does it run (say) Apache + Php?
- When the slowdown is noticed, does this coincide with certain 
activities - e.g, backup , daily maintenance, data load(!) etc.



regards

Mark



I choose Polesoft Lockspam to fight spam, and you?
http://www.polesoft.com/refer.html


Nope, not me either.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Used Memory

2005-10-24 Thread Mark Kirkwood

Christian Paul B. Cosinas wrote:

Hi To all those who replied. Thank You.

I monitor my database server a while ago and found out that memory is used
extensively when I am fetching records from the database. I use the command
"fetch all" in my VB Code and put it in a recordset.Also in this command the
CPU utilization is used extensively.

Is there something wrong with my code or is it just the way postgresql is
behaving which I cannot do something about it?

I just monitor one workstation connecting to the database server and it is
already eating up about 20 % of the CPU of database server.

Which I think will not be applicable to our system since we have a target of
25 PC connecting to the database server most of the time.



Could you post the query and the output of EXPLAIN ANALYZE?

In addition, have you run ANALYZE on all the tables in that database ? 
(sorry, have to ask :-) ).


cheers

Mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Used Memory

2005-10-24 Thread Mark Kirkwood

Christian Paul B. Cosinas wrote:

Hi mark

I have so many functions, more than 100 functions in the database :) And I
am dealing about 3 million of records in one database.
And about 100 databases :)



LOL - sorry, I misunderstood your previous message to mean you had 
identified *one* query where 'fetch all' was causing the problem!


Having said that, to make much more progress, you probably want to 
identify those queries that are consuming your resources, pick one or two 
of the particularly bad ones and post 'em.


There are a number of ways to perform said identification; enabling 
stats collection might be worth a try.
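
For instance (a sketch only - parameter names as of the 8.0/8.1 era, set 
in postgresql.conf; the 1000 ms threshold is just an illustration):

-- postgresql.conf:
--   stats_start_collector = true
--   stats_command_string  = true
--   log_min_duration_statement = 1000   -- log statements over 1 second
-- then watch what is currently running:
SELECT procpid, query_start, current_query FROM pg_stat_activity;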


regards

Mark


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Mark Kirkwood

Luke Lonergan wrote:

(mass snippage) 
time psql -c "select count(*) from ivp.bigtable1" dgtestdb

[EMAIL PROTECTED] IVP]$ cat sysout3
  count   
--

 8000
(1 row)


real1m9.875s
user0m0.000s
sys 0m0.004s
[EMAIL PROTECTED] IVP]$ !du
du -sk dgtestdb/base
17021260    dgtestdb/base


Summary:

 OK – you can get more I/O bandwidth out of the current 
I/O path for sequential scan if you tune the filesystem for large 
readahead.  This is a cheap alternative to overhauling the executor to 
use asynch I/O.


Still, there is a CPU limit here – this is not I/O bound, it is CPU 
limited as evidenced by the sensitivity to readahead settings.   If the 
filesystem could do 1GB/s, you wouldn’t go any faster than 244MB/s.





Luke,

Interesting - but possibly only representative for a workload consisting 
entirely of one executor doing "SELECT ... FROM my_single_table".


If you alter this to involve more complex joins (e.g. 4-way star) and 
(maybe add a small number of concurrent executors too) - is it still the 
case?


Cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-18 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,

On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:

If you alter this to involve more complex joins (e.g 4. way star) and
(maybe add a small number of concurrent executors too) - is it still the
case?


I may not have listened to you - are you asking about whether the 
readahead works for these cases?


I’ll be running some massive TPC-H benchmarks on these machines soon – 
we’ll see then.



That too - meaning the business of one executor randomly reading a given 
relation file whilst another is sequentially scanning (some other) part 
of it.


Cheers

Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-19 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,

On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:
 


If you alter this to involve more complex joins (e.g 4. way star) and
(maybe add a small number of concurrent executors too) - is it still the
case?



4-way star, same result, that's part of my point.  With Bizgres MPP, the
4-way star uses 4 concurrent scanners, though not all are active all the
time.  And that's per segment instance - we normally use one segment
instance per CPU, so our concurrency is NCPUs plus some.



Luke - I don't think I was clear enough about what I was asking, sorry.

I added the more "complex joins" comment because:

- I am happy that seqscan is cpu bound after ~110M/s (It's cpu bound on 
my old P3 system even earlier than that)
- I am curious if the *other* access methods (indexscan, nested loop, 
hash, merge, bitmap) also suffer the same fate.


I'm guessing from your comment that you have tested this too, but I 
think it's worth clarifying!


With respect to Bizgres MPP, scan parallelism is a great addition... 
very nice! (BTW - is that in 0.8, or are we talking a new product variant?)


regards

Mark



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-20 Thread Mark Kirkwood

Mark Kirkwood wrote:



- I am happy that seqscan is cpu bound after ~110M/s (It's cpu bound on 
my old P3 system even earlier than that)


Ahem - after reading Alan's postings I am not so sure, ISTM that there 
is some more investigation required here too :-).




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-20 Thread Mark Kirkwood

Alan Stange wrote:

Another data point.
We had some down time on our system today to complete some maintenance 
work.  It took the opportunity to rebuild the 700GB file system using 
XFS instead of Reiser.


One iostat output for 30 seconds is

avg-cpu:  %user   %nice    %sys  %iowait   %idle
           1.58    0.00   19.69    31.94   46.78

Device:    tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd     343.73    175035.73       277.55    5251072       8326

while doing a select count(1) on the same large table as before.   
Subsequent iostat output all showed that this data rate was being 
maintained.  The system is otherwise mostly idle during this measurement.


The sequential read rate is 175MB/s.  The system is the same as earlier, 
one cpu is idle and the second is ~40% busy doing the scan and ~60% 
idle.   This is  postgresql 8.1rc1, 32KB block size.  No tuning except 
for using a 1024KB read ahead.


The peak speed of the attached storage is 200MB/s (a 2Gb/s fiber channel 
controller).  I see no reason why this configuration wouldn't generate 
higher IO rates if a faster IO connection were available.


Can you explain again why you think there's an IO ceiling of 120MB/s 
because I really don't understand?




I think what is going on here is that Luke's observation of the 120 Mb/s 
rate is taken from data using 8K block size - it looks like we can get 
higher rates with 32K.


A quick test on my P3 system seems to support this (the numbers are a 
bit feeble, but the difference is interesting):


The test is SELECT 1 FROM table, stopping Pg and unmounting the file 
system after each test.


8K blocksize:
25 s elapsed
48 % idle from vmstat (dual cpu system)
70 % busy from gstat (Freebsd GEOM io monitor)
181819 pages in relation
56 Mb/s effective IO throughput


32K blocksize:
23 s elapsed
44 % idle from vmstat
80 % busy from gstat
45249 pages in relation
60 Mb/s effective IO throughput


I re-ran these several times - very repeatable (+/- 0.25 seconds).

This is Freebsd 6.0 with the readahead set to 16 blocks, UFS2 filesystem 
created with 32K blocksize (both cases). It might be interesting to see 
the effect of using 16K (the default) with the 8K Pg block size, I would 
expect this to widen the gap.


Cheers

Mark


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-20 Thread Mark Kirkwood

Mark Kirkwood wrote:


The test is SELECT 1 FROM table


That should read "The test is SELECT count(1) FROM table"

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-21 Thread Mark Kirkwood

Luke Lonergan wrote:


So that leaves the question - why not more than 64% of the I/O scan rate?
And why is it a flat 64% as the I/O subsystem increases in speed from
333-400MB/s?



It might be interesting to see what effect reducing the cpu consumption 
 entailed by the count aggregation has - by (say) writing a little bit 
of code to heap scan the desired relation (sample attached).


Cheers

Mark




/*
 * fastcount.c
 *
 * Do a count that uses considerably less CPU time than an aggregate.
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"


extern Datum fastcount(PG_FUNCTION_ARGS);


PG_FUNCTION_INFO_V1(fastcount);
Datum
fastcount(PG_FUNCTION_ARGS)
{
text   *relname = PG_GETARG_TEXT_P(0);
RangeVar   *relrv;
Relation    rel;
HeapScanDesc scan;
HeapTuple   tuple;
int64   result = 0;

/* Use the name to get a suitable range variable and open the relation. 
*/
relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
rel = heap_openrv(relrv, AccessShareLock);

/* Start a heap scan on the relation. */
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
result++;
}

/* End the scan and close up the relation. */
heap_endscan(scan);
heap_close(rel, AccessShareLock);


PG_RETURN_INT64(result);
}
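
(For completeness - a sketch of how the attached function would be 
registered and called once compiled into a shared library; the library 
path is illustrative only:)

CREATE FUNCTION fastcount(text) RETURNS bigint
    AS '/path/to/fastcount', 'fastcount'
    LANGUAGE C STRICT;

SELECT fastcount('lineitem');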

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-23 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,

This is an excellent idea – unfortunately I’m in Maui right now 
(Mahalo!) and I’m not getting to testing with this.  My first try was 
with 8.0.3 and it’s an 8.1 function I presume.


Not to be lazy – but any hint as to how to do the same thing for 8.0?



Yeah, it's 8.1 - I didn't think to check against 8.0. The attached 
variant works with 8.0.4 (textToQualifiedNameList needs 2 args)


cheers

Mark

P.s. Maui eh, sounds real nice.
/*
 * fastcount.c
 *
 * Do a count that uses considerably less CPU time than an aggregate.
 *
 * (Variant for 8.0.x - textToQualifiedNameList needs 2 args)
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"


extern Datum fastcount(PG_FUNCTION_ARGS);


PG_FUNCTION_INFO_V1(fastcount);
Datum
fastcount(PG_FUNCTION_ARGS)
{
text   *relname = PG_GETARG_TEXT_P(0);
RangeVar   *relrv;
Relation    rel;
HeapScanDesc scan;
HeapTuple   tuple;
int64   result = 0;

/* Use the name to get a suitable range variable and open the relation. 
*/
relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname, ""));
rel = heap_openrv(relrv, AccessShareLock);

/* Start a heap scan on the relation. */
scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
result++;
}

/* End the scan and close up the relation. */
heap_endscan(scan);
heap_close(rel, AccessShareLock);


PG_RETURN_INT64(result);
}

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:



12.9GB of DBT-3 data from the lineitem table

llonergan=# select relpages from pg_class where relname='lineitem';
 relpages 
--

  1579270
(1 row)

1579270*8192/1000000
12937 Million Bytes or 12.9GB

llonergan=# \timing
Timing is on.
llonergan=# select count(1) from lineitem;
  count   
--

 59986052
(1 row)

Time: 197870.105 ms


So 198 seconds is the uncached read time with count (Just for clarity, 
did you clear the Pg and filesystem caches or unmount / remount the 
filesystem?)



llonergan=# select count(1) from lineitem;
  count   
--

 59986052
(1 row)

Time: 49912.164 ms
llonergan=# select count(1) from lineitem;
  count   
--

 59986052
(1 row)

Time: 49218.739 ms



and ~50 seconds is the (partially) cached read time with count


llonergan=# select fastcount('lineitem');
 fastcount 
---

  59986052
(1 row)

Time: 33752.778 ms
llonergan=# select fastcount('lineitem');
 fastcount 
---

  59986052
(1 row)

Time: 34543.646 ms
llonergan=# select fastcount('lineitem');
 fastcount 
---

  59986052
(1 row)

Time: 34528.053 ms



so ~34 seconds is the (partially) cached read time for fastcount -
I calculate this to give ~362Mb/s effective IO rate (I'm doing / by 
1024*1024 not 1000*1000) FWIW.


While this is interesting, you probably want to stop Pg, unmount the 
filesystem, and restart Pg to get the uncached time for fastcount too 
(and how does this compare to uncached read with dd using the same block 
size?).


But at this stage it certainly looks like the heapscan code is pretty 
efficient - great!


Oh - and do you want to try out 32K block size? I'm interested to see 
what level of improvement you get (as my system is hopelessly cpu bound...)!




Analysis:

            Bandwidth   Percent of max
dd Read     407MB/s     100%
Count(1)    263MB/s     64.6%
HeapScan    383MB/s     94.1%



Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,


It would be nice to put some tracers into the executor and see where the
time is going.  I'm also curious about the impact of the new 8.1 virtual
tuples in reducing the executor overhead.  In this case my bet's on the agg
node itself, what do you think?



Yeah - it's pretty clear that the count aggregate is fairly expensive 
wrt cpu. However, I am not sure if all agg nodes suffer this way (guess 
we could try a trivial aggregate that does nothing for all tuples bar 
the last and just reports the final value it sees).
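
Something like the following would be one way to try that (a sketch only 
- the SQL transition function is just to show the shape; to really 
minimise per-tuple cost it would want to be a C function, and the 
'last_agg'/'keep_last' names are made up):

CREATE FUNCTION keep_last(int4, int4) RETURNS int4
    AS 'SELECT $2' LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE last_agg (
    BASETYPE = int4,
    SFUNC = keep_last,
    STYPE = int4
);

SELECT last_agg(1) FROM lineitem;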


Cheers

Mark


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:


That says it's something else in the path.  As you probably know there is a
page lock taken, a copy of the tuple from the page, lock removed, count
incremented for every iteration of the agg node on a count(*).  Is the same
true of a count(1)?



Sorry Luke - message 3 - I seem to be suffering from a very small 
working memory buffer myself right now, I think it's after a day of 
working with DB2 ... :-)


Anyway, as I read src/backend/parser/gram.y:6542 - count(*) is 
transformed into count(1), so these two are identical.


Cheers (last time tonight, promise!)

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,



Time: 197870.105 ms


So 198 seconds is the uncached read time with count (Just for clarity,
did you clear the Pg and filesystem caches or unmount / remount the
filesystem?)



Nope - the longer time is due to the "second write" known issue with
Postgres - it writes the data to the table, but all of the pages are marked
dirty?  So, always on the first scan after loading they are written again.
This is clear as you watch vmstat - the pattern on the first seq scan is
half read / half write.



Ah - indeed - first access after a COPY no? I should have thought of 
that, sorry!



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Tom Lane wrote:

Greg Stark <[EMAIL PROTECTED]> writes:


Last I heard the reason count(*) was so expensive was because its state
variable was a bigint. That means it doesn't fit in a Datum and has to be
alloced and stored as a pointer. And because of the Aggregate API that means
it has to be allocated and freed for every tuple processed.



There's a hack in 8.1 to avoid the palloc overhead (courtesy of Neil
Conway IIRC).



It certainly makes quite a difference as I measure it:

doing select count(1) from a 181000 page table (completely uncached) on my PIII:

8.0 : 32 s
8.1 : 25 s

Note that the 'fastcount()' function takes 21 s in both cases - so all 
the improvement seems to be from the count overhead reduction.


Cheers

Mark








---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread Mark Kirkwood

Merlin Moncure wrote:

It certainly makes quite a difference as I measure it:

doing select count(1) from a 181000 page table (completely uncached) on my


PIII:


8.0 : 32 s
8.1 : 25 s

Note that the 'fastcount()' function takes 21 s in both cases - so all
the improvement seems to be from the count overhead reduction.



Are you running windows?  There is a big performance improvement in
count(*) on pg 8.0->8.1 on win32 that is not relevant to this debate...



No - FreeBSD 6.0 on a dual PIII 1 Ghz. The slow cpu means that the 8.1 
improvements are very noticeable!


A point of interest - applying Neil's palloc-avoiding changes to 
NodeAgg.c and int8.c in 8.0 changes those results to:


8.0 + palloc avoiding patch : 27 s

(I am guessing the remaining 2 s could be shaved off if I backported 
8.1's virtual tuples - however that looked like a lot of work)


Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] SAN/NAS options

2005-12-13 Thread Mark Kirkwood

Charles Sprickman wrote:

Hello all,

It seems that I'm starting to outgrow our current Postgres setup.  We've 
been running a handful of machines as standalone db servers.  This is 
all in a colocation environment, so everything is stuffed into 1U 
Supermicro boxes.  Our standard build looks like this:


Supermicro 1U w/SCA backplane and 4 bays
2x2.8 GHz Xeons
Adaptec 2015S "zero channel" RAID card
2 or 4 x 73GB Seagate 10K Ultra 320 drives (mirrored+striped)
2GB RAM
FreeBSD 4.11
PGSQL data from 5-10GB per box

Recently I started studying what we were running up against in our 
nightly runs that do a ton of updates/inserts to prep things for the 
tasks the db does during the business day (light mix of 
selects/inserts/updates). While we have plenty of disk bandwidth 
(according to bonnie), we are really dying on IOPS. I'm guessing this is 
a mix of a rather anemic RAID controller (ever notice how adaptec 
doesn't publish any real performance specs on raid cards?) and having 
only two or four spindles (effectively 1 or 2 on writes).


So that's where we are...

I'm new to the whole SAN thing, but did recently pick up a few used 
NetApp shelves and a Fibre Channel RAID HBA (Mylex ExtremeRAID 3000, 
also used) to toy with.  I started wondering if I could put something 
together to both get our storage on one set of boxes and allow me to get 
data striped across more drives.  Our budget is not huge and we are not 
adverse to getting used gear where appropriate.


What do you folks recommend?  I'm just starting to look at what's out 
there for SANs and NAS, and from what I've seen, our options are:




Leaving the whole SAN issue for a moment:

It would be interesting to see if moving to FreeBSD 6.0 would help you - 
the vfs layer is no longer throttled by the (SMP) GIANT lock in this 
version, and that may make quite a difference (given you have SMP boxes).


Another interesting thing to try is rebuilding the database ufs 
filesystem(s) with 32K blocks and 4K frags (as opposed to 8K/1K or 
16K/2K - can't recall the default on 4.x). I found this to give a factor 
of 2 speedup on random disk access (specifically queries doing indexed 
joins).


Is it mainly your 2 disk machines that are IOPS bound? if so, a cheap 
option may be to buy 2 more cheetahs for them! If it's the 4's, well how 
about a 2U U320 diskpack from whomever supplies you the Supermicro boxes?


I have just noticed Luke's posting - I would second the advice to avoid 
SAN - in my experience it's an expensive way to buy storage.


best wishes

Mark


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Simple Join

2005-12-14 Thread Mark Kirkwood

Kevin Brown wrote:

I'll just start by warning that I'm new-ish to postgresql.

I'm running 8.1 installed from source on a Debian Sarge server.  I have a 
simple query that I believe I've placed the indexes correctly for, and I 
still end up with a seq scan.  It makes sense, kinda, but it should be able 
to use the index to gather the right values.  I do have a production set of 
data inserted into the tables, so this is running realistically:


dli=# explain analyze SELECT ordered_products.product_id
dli-# FROM to_ship, ordered_products
dli-# WHERE to_ship.ordered_product_id = ordered_products.id AND
dli-# ordered_products.paid = TRUE AND
dli-# ordered_products.suspended_sub = FALSE;


You scan 600000 rows from to_ship to get about 25000 - so some way to 
cut this down would help.


Try out an explicit INNER JOIN which includes the filter info for paid 
and suspended_sub in the join condition (you may need indexes on each of 
id, paid and suspended_sub, so that the 8.1 optimizer can use a bitmap 
scan):



SELECT ordered_products.product_id
FROM to_ship INNER JOIN ordered_products
ON (to_ship.ordered_product_id = ordered_products.id
AND ordered_products.paid = TRUE  AND 
ordered_products.suspended_sub = FALSE); 




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Simple Join

2005-12-14 Thread Mark Kirkwood

Kevin Brown wrote:




I only had two explicit indexes.  One was on to_ship.ordered_product_id and 
the other was on ordered_products.paid.  ordered_products.id is a primary 
key.  This is on your query with an index added on suspended_sub:


dli=# explain analyze SELECT ordered_products.product_id
dli-# FROM to_ship INNER JOIN ordered_products
dli-# ON (to_ship.ordered_product_id = ordered_products.id
dli(#  AND ordered_products.paid = TRUE  AND 
dli(# ordered_products.suspended_sub = FALSE);
   QUERY PLAN
-
 Hash Join  (cost=5126.19..31528.40 rows=20591 width=8) (actual 
time=4554.190..23519.618 rows=14367 loops=1)

   Hash Cond: ("outer".ordered_product_id = "inner".id)
   ->  Seq Scan on to_ship  (cost=0.00..11529.12 rows=611612 width=8) (actual 
time=11.254..15192.042 rows=611612 loops=1)
   ->  Hash  (cost=4954.79..4954.79 rows=21759 width=16) (actual 
time=4494.900..4494.900 rows=18042 loops=1)
 ->  Index Scan using paid_index on ordered_products  
(cost=0.00..4954.79 rows=21759 width=16) (actual time=72.431..4414.697 
rows=18042 loops=1)

   Index Cond: (paid = true)
   Filter: (paid AND (NOT suspended_sub))
 Total runtime: 23532.785 ms
(8 rows)



Well - that had no effect at all :-) You don't have an index on 
to_ship.ordered_product_id do you? - try adding one (ANALYZE again), and 
let us know what happens (you may want to play with SET 
enable_seqscan=off as well).
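
To be concrete, the sort of thing I mean is (the index name is just an 
example):

CREATE INDEX to_ship_ordered_product_id_idx
  ON to_ship (ordered_product_id);
ANALYZE to_ship;

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT ordered_products.product_id
FROM to_ship INNER JOIN ordered_products
  ON (to_ship.ordered_product_id = ordered_products.id
      AND ordered_products.paid = TRUE
      AND ordered_products.suspended_sub = FALSE);
SET enable_seqscan = on;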


And also, if you are only ever interested in paid = true and 
suspended_sub = false, then you can recreate these indexes as partials - 
e.g:


CREATE INDEX paid_index ON ordered_products (paid) WHERE paid = true;
CREATE INDEX suspended_sub_index ON ordered_products (suspended_sub) 
WHERE suspended_sub = false;


So what's the best way to performance wiggle this info out of the db?  The 
list of values is only about 30 tuples long out of this query, so I was 
figuring I could trigger on insert to to_ship to place the value into another 
table if it didn't already exist.  I'd rather the writing be slow than the 
reading.


Yeah - all sorts of horrible denormalizations are possible :-), hopefully 
we can get the original query to work ok, and avoid the need to add code 
or triggers to your app.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Simple Join

2005-12-15 Thread Mark Kirkwood

Kevin Brown wrote:

On Wednesday 14 December 2005 18:36, you wrote:


Well - that had no effect at all :-) You don't have an index on
to_ship.ordered_product_id do you? - try adding one (ANALYZE again), and
let us know what happens (you may want to play with SET
enable_seqscan=off as well).



I _DO_ have an index on to_ship.ordered_product_id.  It's a btree.



Sorry - read right past it!

Did you try out enable_seqscan=off? I'm interested to see if we can get 
8.1 bitmap anding the three possibly useful columns together on 
ordered_products and *then* doing the join to to_ship.


Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Simple Join

2005-12-15 Thread Mark Kirkwood

Mitch Skinner wrote:

I saw that; what I'm suggesting is that that you try creating a 3-column
index on ordered_products using the paid, suspended_sub, and id columns.
In that order, I think, although you could also try the reverse.  It may
or may not help, but it's worth a shot--the fact that all of those
columns are used together in the query suggests that you might do better
with a three-column index on those. 


With all three columns indexed individually, you're apparently not
getting the bitmap plan that Mark is hoping for.  I imagine this has to
do with the lack of multi-column statistics in postgres, though you
could also try raising the statistics target on the columns of interest.

Setting enable_seqscan to off, as others have suggested, is also a
worthwhile experiment, just to see what you get.




Right on. Some of these "coerced" plans may perform much better. If so, 
we can look at tweaking your runtime config: e.g.


effective_cache_size
random_page_cost
default_statistics_target

to see if said plans can be chosen "naturally".
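
All of these can be experimented with per-session (no restart needed) - 
e.g. something like the following, where the values are purely 
illustrative and need adjusting for the memory on your box:

SET effective_cache_size = 100000;     -- in 8K pages, i.e. about 800MB
SET random_page_cost = 2;
SET default_statistics_target = 100;   -- affects subsequent ANALYZEs
ANALYZE ordered_products;
-- then re-run EXPLAIN ANALYZE on the query and compare the plans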

cheers

Mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Mark Kirkwood

Craig A. James wrote:
I asked a while back if there were any plans to allow developers to 
override the optimizer's plan and force certain plans, and received a 
fairly resounding "No".  The general feeling I get is that a lot of work 
has gone into the optimizer, and by God we're going to use it!


I think this is just wrong, and I'm curious whether I'm alone in this 
opinion.


Over and over, I see questions posted to this mailing list about 
execution plans that don't work out well.  Many times there are good 
answers - add an index, refactor the design, etc. - that yield good 
results.  But, all too often the answer comes down to something like 
this recent one:


  > Right on. Some of these "coerced" plans may perform much better.
  > If so, we can look at tweaking your runtime config: e.g.
  >
  > effective_cache_size
  > random_page_cost
  > default_statistics_target
  >
  > to see if said plans can be chosen "naturally".

I see this over and over.  Tweak the parameters to "force" a certain 
plan, because there's no formal way for a developer to say, "I know the 
best plan."




I hear what you are saying, but to use this fine example - I don't know 
what the best plan is - these experiments are part of an investigation to 
find *if* there is a better plan, and if so, why Postgres is not finding it.


There isn't a database in the world that is as smart as a developer, or 
that can have insight into things that only a developer can possibly 
know.


That is often true - but the aim is to get Postgres's optimizer closer 
to developer smartness.


After years of using several other database products (some supporting 
hint type constructs and some not), I have come to believe that hinting 
(or similar) actually *hinders* the development of a great optimizer.



Best wishes

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Mark Kirkwood

Craig A. James wrote:



What would be cool would be some way the developer could alter the plan, 
but the way of doing so would strongly encourage the developer to send 
the information to this mailing list.  Postgres would essentially say, 
"Ok, you can do that, but we want to know why!"




Yeah it would - an implementation I have seen that I like is where the 
developer can supply the *entire* execution plan with a query. This is 
complex enough to make casual use unlikely :-), but provides the ability 
to try out other plans, and also fix that vital query that must run 
today.


cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Simple Join

2005-12-16 Thread Mark Kirkwood

David Lang wrote:

On Fri, 16 Dec 2005, Mark Kirkwood wrote:



Right on. Some of these "coerced" plans may perform much better. If 
so, we can look at tweaking your runtime config: e.g.


effective_cache_size
random_page_cost
default_statistics_target

to see if said plans can be chosen "naturally".



Mark, I've seen these config options listed as tweaking targets fairly 
frequently, has anyone put any thought or effort into creating a test 
program that could analyse the actual system and set the defaults based 
on the measured performance?




I am sure this has been discussed before, I found this thread -

http://archives.postgresql.org/pgsql-performance/2004-07/msg00189.php

but I seem to recall others (but offhand can't find any of them).



I think that the real difficulty here is that the construction of the 
test program is non-trivial - for instance, the best test program for 
tuning *my* workload is my application with its collection of data, but 
it is probably not a good test program for *anyone else's* workload.


cheers

Mark



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] SAN/NAS options

2005-12-16 Thread Mark Kirkwood

Jim C. Nasby wrote:

On Wed, Dec 14, 2005 at 08:28:56PM +1300, Mark Kirkwood wrote:

Another interesting thing to try is rebuilding the database ufs 
filesystem(s) with 32K blocks and 4K frags (as opposed to 8K/1K or 
16K/2K - can't recall the default on 4.x). I found this to give a factor 
of 2 speedup on random disk access (specifically queries doing indexed 
joins).



Even if you're doing a lot of random IO? I would think that random IO
would perform better if you use smaller (8K) blocks, since there's less
data being read in and then just thrown away that way.




Yeah, that's what I would have expected too! But the particular queries 
I tested do a ton of random IO (correlation of 0.013 on the join column 
for the big table). I did wonder if the gain has something to do with 
the underlying RAID stripe size (64K or 256K in my case), as I have only 
tested the 32K vs 8K/16K on RAIDed systems.
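
For reference, the correlation figure comes from pg_stats - the table 
and column names below are just placeholders for the big table and its 
join column:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'bigtable'
  AND attname = 'joincol';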


I guess for a system where the number of concurrent users gives rise to 
memory pressure, it will cause more thrashing of the file buffer cache, 
and could well be a net loss.


Still worth trying out I think, you will know soon enough if it is a win 
or lose!


Note that I did *not* alter Postgres page/block size (BLCKSZ) from 8K, 
so no dump/reload is required to test this out.


cheers

Mark


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-16 Thread Mark Kirkwood

Bruce Momjian wrote:

How are star joins different from what we do now?

---



Recall that a "star" query with n tables means a query where there are 
(n - 1) supposedly small tables (dimensions) and 1 large table (fact) - 
which has foreign keys to each dimension.


As I understand it, the classic "star join" is planned like this:

1) The result of the restriction clauses on each of the (small) 
dimension tables is computed.

2) The cartesian product of all the results of 1) is formed.
3) The fact (big) table is joined to the pseudo relation formed by 2).

From what I have seen most vendor implementations do not (always) 
perform the full cartesian product of the dimensions, but do some of 
them, join to the fact, then join to the remaining dimensions afterwards.


There is another join strategy called the "star transformation" where 
some of the dimension joins get rewritten as subqueries, then the above 
method is used again! This tends to be useful when the cartesion 
products would be stupidly large (e.g. "sparse" facts, or few 
restriction clauses)


regards

Mark

P.s : Luke or Simon might have a better definition... but thought I'd 
chuck in my 2c... :-)



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-17 Thread Mark Kirkwood

Tom Lane wrote:

Simon Riggs <[EMAIL PROTECTED]> writes:


On Fri, 2005-12-16 at 23:28 -0500, Bruce Momjian wrote:


How are star joins different from what we do now?




Methods:
1. join all N small tables together in a cartesian product, then join to
main Large table once (rather than N times)



Of course, the reason the current planner does not think of this is that
it does not consider clauseless joins unless there is no alternative.

However, I submit that it wouldn't pick such a plan anyway, and should
not, because the idea is utterly stupid.  The plan you currently get for
this sort of scenario is typically a nest of hash joins:

   QUERY PLAN   


 Hash Join  (cost=2.25..4652.25 rows=102400 width=16)
   Hash Cond: ("outer".f1 = "inner".f1)
   ->  Hash Join  (cost=1.12..3115.12 rows=102400 width=12)
 Hash Cond: ("outer".f2 = "inner".f1)
 ->  Seq Scan on fact  (cost=0.00..1578.00 rows=102400 width=8)
 ->  Hash  (cost=1.10..1.10 rows=10 width=4)
   ->  Seq Scan on d2  (cost=0.00..1.10 rows=10 width=4)
   ->  Hash  (cost=1.10..1.10 rows=10 width=4)
 ->  Seq Scan on d1  (cost=0.00..1.10 rows=10 width=4)
(9 rows)

This involves only one scan of the fact table.  As each row is pulled up
through the nest of hash joins, we hash one dimension key and join to
one small table at each level.  This is at worst the same amount of work
as hashing all the keys at once and probing a single cartesian-product
hashtable, probably less work (fewer wasted key-comparisons).  And
definitely less memory.  You do have to keep your eye on the ball that
you don't waste a lot of overhead propagating the row up through
multiple join levels, but we got rid of most of the problem there in
8.1 via the introduction of "virtual tuple slots".  If this isn't fast
enough yet, it'd make more sense to invest effort in further cutting the
executor's join overhead (which of course benefits *all* plan types)
than in trying to make the planner choose a star join.



2. transform joins into subselects, then return subselect rows via an
index bitmap. Joins are performed via a bitmap addition process.



This one might be interesting but it's not clear what you are talking
about.  "Bitmap addition"?


Yeah - the quoted method of "make a cartesian product of the dimensions 
and then join to the fact all at once" is not actually used (as written) 
in many implementations - probably for the reasons you are pointing out. 
I found these two papers whilst browsing:



http://www.cs.brown.edu/courses/cs227/Papers/Indexing/O'NeilGraefe.pdf
http://www.dama.upc.edu/downloads/jaguilar-2005-4.pdf


They seem to be describing a more subtle method making use of join 
indexes and bitmapped indexes.


If I understand it correctly, the idea is to successively build up a 
list (hash / bitmap) of fact RIDs that will satisfy the query, and when 
complete actually perform the join and construct tuples. The goal is 
similar in intent to the star join method (i.e. access the fact table 
as little and as "late" as possible), but avoids the cost of actually 
constructing the dimension cartesian product.


cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-17 Thread Mark Kirkwood

Tom Lane wrote:


2. transform joins into subselects, then return subselect rows via an
index bitmap. Joins are performed via a bitmap addition process.


Looks like 8.1 pretty much does this right now:

First the basic star:

EXPLAIN ANALYZE
SELECT
d0.dmth,
d1.dat,
count(f.fval )
FROM
dim0 AS d0,
dim1 AS d1,
fact0 AS f
WHERE   d0.d0key = f.d0key
AND d1.d1key = f.d1key
AND d0.dyr BETWEEN 2010 AND 2015
AND d1.dattyp BETWEEN '10th measure type' AND '14th measure type'
GROUP BY
d0.dmth,
d1.dat
;

QUERY PLAN 



 HashAggregate  (cost=334842.41..334846.53 rows=329 width=37) (actual 
time=144317.960..144318.814 rows=120 loops=1)
   ->  Hash Join  (cost=145.72..334636.91 rows=27400 width=37) (actual 
time=1586.363..142831.025 rows=201600 loops=1)

 Hash Cond: ("outer".d0key = "inner".d0key)
 ->  Hash Join  (cost=89.72..333279.41 rows=137001 width=37) 
(actual time=1467.322..135585.317 rows=100 loops=1)

   Hash Cond: ("outer".d1key = "inner".d1key)
   ->  Seq Scan on fact0 f  (cost=0.00..281819.45 
rows=1045 width=12) (actual time=120.881..70364.473 rows=1000 
loops=1)
   ->  Hash  (cost=89.38..89.38 rows=137 width=33) (actual 
time=24.822..24.822 rows=660 loops=1)
 ->  Index Scan using dim1_dattyp on dim1 d1 
(cost=0.00..89.38 rows=137 width=33) (actual time=0.502..19.374 rows=660 
loops=1)
   Index Cond: (((dattyp)::text >= '10th 
measure type'::text) AND ((dattyp)::text <= '14th measure type'::text))
 ->  Hash  (cost=51.00..51.00 rows=2000 width=8) (actual 
time=31.620..31.620 rows=2016 loops=1)
   ->  Index Scan using dim0_dyr on dim0 d0 
(cost=0.00..51.00 rows=2000 width=8) (actual time=0.379..17.377 
rows=2016 loops=1)

 Index Cond: ((dyr >= 2010) AND (dyr <= 2015))
 Total runtime: 144320.588 ms
(13 rows)


Now apply the star transformation:

EXPLAIN ANALYZE
SELECT
d0.dmth,
d1.dat,
count(f.fval )
FROM
dim0 AS d0,
dim1 AS d1,
fact0 AS f
WHERE   d0.d0key = f.d0key
AND d1.d1key = f.d1key
AND d0.dyr BETWEEN 2010 AND 2015
AND d1.dattyp BETWEEN '10th measure type' AND '14th measure type'
AND f.d0key IN (SELECT cd0.d0key FROM dim0 cd0
WHERE cd0.dyr BETWEEN 2010 AND 2015)
AND f.d1key IN (SELECT cd1.d1key FROM dim1 cd1
WHERE cd1.dattyp BETWEEN '10th measure type'
 AND '14th measure type')
GROUP BY
d0.dmth,
d1.dat
;

   QUERY PLAN 


--
 HashAggregate  (cost=129230.89..129231.83 rows=75 width=37) (actual 
time=39798.192..39799.015 rows=120 loops=1)
   ->  Nested Loop IN Join  (cost=149.44..129230.33 rows=75 width=37) 
(actual time=269.919..38125.520 rows=201600 loops=1)
 ->  Hash Join  (cost=147.43..128171.03 rows=375 width=45) 
(actual time=269.516..27342.866 rows=201600 loops=1)

   Hash Cond: ("outer".d0key = "inner".d0key)
   ->  Nested Loop  (cost=91.43..128096.03 rows=2000 
width=37) (actual time=152.084..19869.365 rows=100 loops=1)
 ->  Hash Join  (cost=91.43..181.52 rows=2 
width=37) (actual time=29.931..46.339 rows=660 loops=1)

   Hash Cond: ("outer".d1key = "inner".d1key)
   ->  Index Scan using dim1_dattyp on dim1 d1 
 (cost=0.00..89.38 rows=137 width=33) (actual time=0.516..7.683 
rows=660 loops=1)
 Index Cond: (((dattyp)::text >= '10th 
measure type'::text) AND ((dattyp)::text <= '14th measure type'::text))
   ->  Hash  (cost=91.09..91.09 rows=137 
width=4) (actual time=29.238..29.238 rows=660 loops=1)
 ->  HashAggregate  (cost=89.72..91.09 
rows=137 width=4) (actual time=20.940..24.900 rows=660 loops=1)
   ->  Index Scan using dim1_dattyp 
on dim1 cd1  (cost=0.00..89.38 rows=137 width=4) (actual 
time=0.042..14.841 rows=660 loops=1)
 Index Cond: 
(((dattyp)::text >= '10th measure type'::text) AND ((dattyp)::text <= 
'14th measure type'::text))
 ->  Index Scan using fact0_d1key on fact0 f 
(cost=0.00..62707.26 rows=10 width=12) (actual time=0.205..12.691 
rows=1515 loops=660)

   Index Cond: ("outer".d1key = f.d1key)
   ->  Hash  (cost=51.00..51.00 rows=2000 width=8) (actual 
time=31.264..31.264 rows=2016 loops=1)
 ->  Index Scan using dim0_dyr on dim0 d0 
(cost=0.00..51

Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-18 Thread Mark Kirkwood

Simon Riggs wrote:

On Sun, 2005-12-18 at 17:07 +1300, Mark Kirkwood wrote:


Tom Lane wrote:



2. transform joins into subselects, then return subselect rows via an
index bitmap. Joins are performed via a bitmap addition process.


Looks like 8.1 pretty much does this right now:



Good analysis.

8.1 doesn't do:
- the transforms sufficiently well (you just performed them manually)


Absolutely - I was intending to note that very point, but it got lost 
somewhere between brain and fingers :-)



- doesn't AND together multiple bitmaps to assist with N-way joins



Ah yes - I had overlooked that, good point!

Cheers

Mark



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-18 Thread Mark Kirkwood

Simon Riggs wrote:

On Sun, 2005-12-18 at 15:02 +1300, Mark Kirkwood wrote:


Yeah - the quoted method of "make a cartesian product of the dimensions 
and then join to the fact all at once" is not actually used (as written) 
in many implementations 



But it is used in some, which is why I mentioned it.

I gave two implementations, that is just (1)




Sorry Simon, didn't mean to imply you shouldn't have mentioned it - was 
merely opining about its effectiveness


- probably for the reasons you are pointing out. 
I found these two papers whilst browsing:



http://www.cs.brown.edu/courses/cs227/Papers/Indexing/O'NeilGraefe.pdf
http://www.dama.upc.edu/downloads/jaguilar-2005-4.pdf


They seem to be describing a more subtle method making use of join 
indexes and bitmapped indexes.



Which is the option (2) I described.



Ok - I misunderstood you on this one, and thought you were describing 
the "star transformation" - upon re-reading, I see that yes, it's more 
or less a description of the O'Neil Graefe method.


best wishes

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Should Oracle outperform PostgreSQL on a complex

2005-12-18 Thread Mark Kirkwood

Simon Riggs wrote:

On Sat, 2005-12-17 at 13:13 -0500, Tom Lane wrote:


Simon Riggs <[EMAIL PROTECTED]> writes:


On Fri, 2005-12-16 at 23:28 -0500, Bruce Momjian wrote:


How are star joins different from what we do now?



Methods:
1. join all N small tables together in a cartesian product, then join to
main Large table once (rather than N times)


Of course, the reason the current planner does not think of this is that
it does not consider clauseless joins unless there is no alternative.



Understood



The plan you currently get for
this sort of scenario is typically a nest of hash joins:

  QUERY PLAN   


Hash Join  (cost=2.25..4652.25 rows=102400 width=16)
  Hash Cond: ("outer".f1 = "inner".f1)
  ->  Hash Join  (cost=1.12..3115.12 rows=102400 width=12)
Hash Cond: ("outer".f2 = "inner".f1)
->  Seq Scan on fact  (cost=0.00..1578.00 rows=102400 width=8)
->  Hash  (cost=1.10..1.10 rows=10 width=4)
  ->  Seq Scan on d2  (cost=0.00..1.10 rows=10 width=4)
  ->  Hash  (cost=1.10..1.10 rows=10 width=4)
->  Seq Scan on d1  (cost=0.00..1.10 rows=10 width=4)
(9 rows)

This involves only one scan of the fact table.  As each row is pulled up
through the nest of hash joins, we hash one dimension key and join to
one small table at each level.  



Understood



This is at worst the same amount of work
as hashing all the keys at once and probing a single cartesian-product
hashtable, probably less work (fewer wasted key-comparisons).  And
definitely less memory.  You do have to keep your eye on the ball that
you don't waste a lot of overhead propagating the row up through
multiple join levels, but we got rid of most of the problem there in
8.1 via the introduction of "virtual tuple slots".  If this isn't fast
enough yet, it'd make more sense to invest effort in further cutting the
executor's join overhead (which of course benefits *all* plan types)
than in trying to make the planner choose a star join.



That join type is used when an index-organised table is available, so
that a SeqScan of the larger table can be avoided.

I'd say the plan would make sense if the columns of the cartesian
product match a multi-column index on the larger table that would not
ever be used unless sufficient columns are restricted in each lookup.
That way you are able to avoid the SeqScan that occurs for the multiple
nested Hash Join case. (Clearly, normal selectivity rules apply on the
use of the index in this way).

So I think that plan type still can be effective in some circumstances.
Mind you: building an N-way index on a large table isn't such a good
idea, unless you can partition the tables and still use a join. Which is
why I've not centred on this case as being important before now.

My understanding: Teradata and DB2 use this.



FWIW - I think DB2 uses the successive fact RID buildup (i.e. method 2), 
unfortunately I haven't got a working copy of DB2 in front of me to test.


Cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Simple Join

2005-12-20 Thread Mark Kirkwood

Mark Kirkwood wrote:

Kevin Brown wrote:


I'll just start by warning that I'm new-ish to postgresql.

I'm running 8.1 installed from source on a Debian Sarge server.  I 
have a simple query that I believe I've placed the indexes correctly 
for, and I still end up with a seq scan.  It makes sense, kinda, but 
it should be able to use the index to gather the right values.  I do 
have a production set of data inserted into the tables, so this is 
running realistically:


dli=# explain analyze SELECT ordered_products.product_id
dli-# FROM to_ship, ordered_products
dli-# WHERE to_ship.ordered_product_id = ordered_products.id AND
dli-# ordered_products.paid = TRUE AND
dli-# ordered_products.suspended_sub = FALSE;



You scan 60 rows from to_ship to get about 25000 - so some way to 
cut this down would help.


Try out an explicit INNER JOIN which includes the filter info for paid 
and suspended_sub in the join condition (you may need indexes on each of 
id, paid and suspended_sub, so that the 8.1 optimizer can use a bitmap 
scan):



SELECT ordered_products.product_id
FROM to_ship INNER JOIN ordered_products
ON (to_ship.ordered_product_id = ordered_products.id
AND ordered_products.paid = TRUE  AND 
ordered_products.suspended_sub = FALSE);



It has been a quiet day today, so I took another look at this. If the 
selectivity of these clauses:


paid = TRUE
suspended_sub = FALSE

is fairly high, then rewriting as a subquery might help:

SELECT o.product_id
FROM ordered_products o
WHERE o.paid = TRUE
AND o.suspended_sub = FALSE
AND  EXISTS (
  SELECT 1
  FROM to_ship s
  WHERE s.ordered_product_id = o.id
);


However it depends on you not needing anything from to_ship in the 
SELECT list...


Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-23 Thread Mark Kirkwood

Vivek Khera wrote:

and only the 
opteron boxes needed to come from sun. add a zero return  policy and you 
wonder how they expect to keep in business


sorry, i had to vent.



Just out of interest - why did the opterons need to come from Sun?





---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: RES: [PERFORM] Priority to a mission critical transaction

2006-11-28 Thread Mark Kirkwood

Josh Berkus wrote:

All,

The Bizgres project is working on resource management for PostgreSQL.  So far, 
however, they have been able to come up with schemes that work for BI/DW at 
the expense of OLTP.  Because of O^N lock checking issues, resource 
management for OLTP which doesn't greatly reduce overall performance seems a 
near-impossible task.




Right - I guess it is probably more correct to say that the 
implementation used in Bizgres is specifically targeted at BI/DW 
workloads rather than OLTP.


At this point we have not measured its impact on concurrency in anything 
other than a handwaving manner - e.g. pgbench on an older SMP system 
showed what looked like about a 10% hit. However the noise level for 
pgbench is typically >10%, so a better benchmark on better hardware is 
needed.


Cheers

Mark

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] RES: Priority to a mission critical transaction

2006-11-28 Thread Mark Kirkwood

Ron Mayer wrote:

Short summary:
  * Papers studying priority inversion issues with
databases including PostgreSQL and realistic workloads
conclude setpriority() helps even in the presence of
priority inversion issues for TPC-C and TPC-W like
workloads.
  * Avoiding priority inversion with priority inheritance
will further help some workloads (TPC-C) more than
others (TPC-W) but even without such schedulers
priority inversion does not cause as much harm
as the benefit you get from indirectly scheduling
I/O through setpriority() in any paper I've seen.

Andreas Kostyrka wrote:

* Carlos H. Reimer <[EMAIL PROTECTED]> [061128 20:02]:

Will the setpriority() system call affect i/o queue too?

Nope, and in fact the article shows the way not to do it.


Actually *YES* setpriority() does have an indirect effect
on the I/O queue.



While I was at Greenplum a related point was made to me:

For a TPC-H/BI type workload on a well configured box the IO subsystem 
can be fast enough so that CPU is the bottleneck for much of the time - 
so being able to use setpriority() as a resource controller makes sense.


Also, with such a workload being mainly SELECT type queries, the dangers 
connected with priority inversion are considerably reduced.


Cheers

Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Bad iostat numbers

2006-11-30 Thread Mark Kirkwood

Carlos H. Reimer wrote:
While collecting performance data I discovered very bad numbers in the 
I/O subsystem and I would like to know if I'm thinking correctly.
 
Here is a typical iostat -x:
 


avg-cpu:  %user   %nice %system %iowait   %idle
          50.40    0.00    0.50     1.10   48.00

Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s  rkB/s  wkB/s avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00   7.80  0.40  6.40   41.60  113.60  20.80  56.80    22.82 570697.50  10.59 147.06 100.00
sdb        0.20   7.80  0.60  6.40   40.00  113.60  20.00  56.80    21.94 570697.50   9.83 142.86 100.00
md1        0.00   0.00  1.20 13.40   81.60  107.20  40.80  53.60    12.93      0.00   0.00   0.00   0.00
md0        0.00   0.00  0.00  0.00    0.00    0.00   0.00   0.00     0.00      0.00   0.00   0.00   0.00

 


Are they not saturated?



They look it (if I'm reading your typical numbers correctly) - %util at 100 
and svctm in the region of 100 ms!


On the face of it, looks like you need something better than a RAID1 
setup - probably RAID10 (RAID5 is probably no good as you are writing 
more than you are reading it seems). However read on...


If this is a sudden change in system behavior, then it is probably worth 
trying to figure out what is causing it (i.e which queries) - for 
instance it might be that you have some new queries that are doing disk 
based sorts (this would mean you really need more memory rather than 
better disk...)


Cheers

Mark



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Bad iostat numbers

2006-11-30 Thread Mark Kirkwood

David Boreham wrote:



These numbers look a bit strange. I am wondering if there is a hardware 
problem on one of the drives
or on the controller. Check in syslog for messages about disk timeouts 
etc. 100% util but 6 writes/s

is just wrong (unless the drive is a 1980's vintage floppy).



Agreed - good call, I was misreading the wkB/s as wMB/s...

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] SQL_CALC_FOUND_ROWS in POSTGRESQL / Some one can

2006-12-10 Thread Mark Kirkwood

Chris wrote:

It's the same as doing a select count(*) type query using the same 
clauses, but all in one query instead of two.


It doesn't return any extra rows on top of the limit query so it's 
better than using pg_numrows which runs the whole query and returns it 
to php (in this example).



Their docs explain it:

http://dev.mysql.com/doc/refman/4.1/en/information-functions.html

See "FOUND_ROWS()"



Note that from the same page:

"If you are using SELECT SQL_CALC_FOUND_ROWS, MySQL must calculate how 
many rows are in the full result set. However, this is faster than 
running the query again without LIMIT, because the result set need not 
be sent to the client."


So it is not as cost-free as it would seem - the CALC step is 
essentially doing "SELECT count(*) FROM (your-query)" in addition to 
your-query-with-the-limit.
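
In Postgres terms the equivalent is just two statements (table and 
column names are invented for the example):

SELECT * FROM items
WHERE category = 42
ORDER BY added DESC
LIMIT 20;

SELECT count(*) FROM items
WHERE category = 42;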


I don't buy the "it's cheap 'cause nothing is returned to the client" 
bit, because 'SELECT count(*) ...' returns only 1 tuple of 1 element to 
the client anyway. On the face of it, it *looks* like you save an extra 
set of parse, execute, construct (trivially small) resultset calls - but 
'SELECT FOUND_ROWS()' involves that set of steps too, so I'm not 
entirely convinced that doing a 2nd 'SELECT count(*)...' is really any 
different in impact.


Cheers

Mark





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Mark Kirkwood

Daniel van Ham Colchete wrote:

On 12/11/06, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote:

On Mon, Dec 11, 2006 at 11:31:48AM -0200, Daniel van Ham Colchete wrote:
> What PostgreSQL benchmark software should I use???

Look up the list archives; search for "TPC".

> I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
> the same version FC6 uses and install it at my Gentoo. I'll use the
> same hardware (diferent partitions to each).

Why do you want to compare FC6 and Gentoo? Wasn't your point that the 
-march=
was supposed to be the relevant factor here? In that case, you want to 
keep

all other things equal; so use the same distribution, only with -O2
-march=i686 vs. -march=athlon-xp (or whatever).

/* Steinar */

Using Gentoo is just an easy way to make cflag optimizations to all the
other libs as well: glibc, ...

I can also measure performance on Gentoo with cflag optimized
PostgreSQL and plain PostgreSQL as well.



I can certainly recall that when I switched from Fedora Core (2 or 3, 
can't recall now) to Gentoo the machine was "faster" for many 
activities. Of course I can't recall precisely what now :-(.


To actually track down *why* and *what* makes it faster is another story, 
and custom CFLAGS is only one of the possible factors: others could be:


- different kernel versions (Gentoo would have possibly been later)
- different kernel patches (both RedHat and Gentoo patch 'em)
- different versions of glibc (Gentoo possibly later again).
- different config options for glibc (not sure if they in fact are, but 
it's possible...)
- kernel and glibc built with different versions of gcc (again I suspect 
Gentoo may have used a later version)


So there are a lot of variables to consider if you want to settle this 
debate once and for all :-)!


Best wishes

Mark

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] EXPLAIN ANALYZE on 8.2

2006-12-14 Thread Mark Kirkwood

Kelly Burkhart wrote:


I hope this isn't a "crummy mainboard" but I can't seem to affect
things by changing clock source (kernel 2.6.16 SLES10).  I tried
kernel command option clock=XXX where XXX in
(cyclone,hpet,pmtmr,tsc,pit), no option was significantly better than
the default.

Anyone know how this might be improved (short of replacing hardware)?



Updating the BIOS might be worth investigating, and then bugging your 
Linux distro mailing list/support etc for more help. (What sort of 
motherboard is it?)


Cheers

Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] slow result

2007-01-23 Thread Mark Kirkwood

Laurent Manchon wrote:

Hi,

I have a slow response of my PostgreSQL database 7.4 using this query below
on a table with 80 rows:

select count(*) from tbl;

PostgreSQL return result in 28 sec every time.



Can you post the results of:

analyze verbose tbl;
explain analyze select count(*) from tbl;

The first will give us some info about how many pages tbl has (in 7.4 
ISTR it does not state the # of dead rows... but anyway), the second 
should help us deduce why it is so slow.


Also as others have pointed out, later versions are quite a bit faster 
for sequential scans...


Cheers

Mark

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] extract(field from timestamp) vs date dimension

2007-01-23 Thread Mark Kirkwood

Tobias Brox wrote:



I suppose the strongest argument for introducing date dimensions already
now is that I probably will benefit from having conformed and
well-designed dimensions when I will be introducing more data marts.  As
for now I have only one fact table and some few dimensions in the
system.



Another factor to consider is that end user tools (and end users) may 
find a date/time dimension helpful.
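
For what it's worth, a minimal date dimension is cheap to generate - 
e.g. (the names and date range are just an example):

CREATE TABLE date_dim AS
SELECT d::date                     AS date_key,
       extract(year  FROM d)::int  AS year,
       extract(month FROM d)::int  AS month,
       extract(dow   FROM d)::int  AS day_of_week
FROM generate_series('2000-01-01'::timestamp,
                     '2020-12-31'::timestamp,
                     '1 day') AS g(d);

ALTER TABLE date_dim ADD PRIMARY KEY (date_key);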


Best wishes

Mark

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Mark Kirkwood

Karen Hill wrote:



The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
inserting to an updatable view (basically two tables).

CREATE TABLE foo1
(


) ;

CREATE TABLE foo2
(

);

CREATE VIEW viewfoo AS
(

);
CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(

);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..20 LOOP
INSERT INTO viewfoo (x) VALUES (x);
END LOOP;
END;
$$ LANGUAGE plpgsql;




Sorry - but we probably need *still* more detail! - the definition of 
viewfoo is likely to be critical. For instance a simplified variant of 
your setup does 20 inserts in 5s on my PIII tualatin machine:


CREATE TABLE foo1 (x INTEGER);

CREATE VIEW viewfoo AS SELECT * FROM foo1;

CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(
 INSERT INTO foo1 VALUES (new.x);
);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..20 LOOP
INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;


postgres=# \timing
postgres=# SELECT functionFoo() ;
 functionfoo
-

(1 row)

Time: 4659.477 ms

postgres=# SELECT count(*) FROM viewfoo;
 count

 20
(1 row)

Cheers

Mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] quad or dual core Intel CPUs

2007-02-13 Thread Mark Kirkwood

Kenji Morishige wrote:


Please comment on any issues you may see with this box and my assumptions.
Also any FreeBSD kernel issues or tweaks you could recommend.



I would recommend posting to freebsd-hardware or freebsd-stable and 
asking if there are any gotchas with the X7DBE+ and 6.2 (for instance 
X7DBR-8+ suffers from an intermittent hang at boot... so can't hurt to ask!)


Cheers

Mark

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Postgres performance Linux vs FreeBSD

2007-02-21 Thread Mark Kirkwood

Jacek Zarêba wrote:

Hello, I've set up 2 identical machines, hp server 1ghz p3,
768mb ram, 18gb scsi3 drive. On the first one I've installed
Debian/GNU 4.0 Linux, on the second FreeBSD 6.2. On both
machines I've installed Postgresql 8.2.3 from sources.
Now the point :)) According to my tests postgres on Linux
box run much faster then on FreeBSD, here are my results:



With respect to 'select count(*) from ...' being slower on FreeBSD, 
there are a number of things to try to make FreeBSD faster for this sort 
of query. Two I'm currently using are:


- setting sysctl vfs.read_max to 16 or 32
- rebuilding the relevant filesystem with 32K blocks and 4K frags

I have two (almost) identical systems - one running Gentoo, one running 
FreeBSD 6.2. With the indicated changes the FreeBSD system performs 
pretty much the same as the Gentoo one.


With respect to the 'explain analyze' times, FreeBSD has a more accurate 
and more expensive gettimeofday call - which hammers its 'explain 
analyze' times compared to Linux.


Cheers

Mark


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] increasing database connections

2007-02-28 Thread Mark Kirkwood

Joshua D. Drake wrote:

Jonah H. Harris wrote:

On 3/1/07, Shiva Sarna <[EMAIL PROTECTED]> wrote:

I am sorry if it is a repeat question but I want to know if database
performance will decrease if I increase the max-connections to 2000. At
present it is 100.

Most certainly.  Adding connections over 200 will degrade performance
dramatically.  You should look into pgpool or connection pooling from
the application.


huh? That is certainly not my experience. I have systems that show no
appreciable performance hit on even 1000+ connections. To be fair to the
discussion, these are on systems with 4+ cores. Usually 8+ and
significant ram, 16/32 gig of ram.



Yeah - I thought that somewhere closer to 1 connections is where you 
get hit with socket management related performance issues.


Cheers

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] SCSI vs SATA

2007-04-06 Thread Mark Kirkwood

Ron wrote:
I read them as soon as they were available.  Then I shrugged and noted 
YMMV to myself.



1= Those studies are valid for =those= users under =those= users' 
circumstances in =those= users' environments.

 How well do those circumstances and environments mimic anyone else's?


Exactly, understanding whether the studies are applicable to you is the 
critical step - before acting on their conclusions! Thanks Ron, for the 
thoughtful analysis on this topic!


Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] index structure for 114-dimension vector

2007-04-20 Thread Mark Kirkwood

Jeff Davis wrote:

On Fri, 2007-04-20 at 12:07 -0700, Andrew Lazarus wrote:

I have a table with 2.5 million real[] arrays. (They are points in a
time series.) Given a new array X, I'd like to find, say, the 25
closest to X in some sense--for simplification, let's just say in the
usual vector norm. Speed is critical here, and everything I have tried
has been too slow.

I imported the cube contrib package, and I tried creating an index on
a cube of the last 6 elements, which are the most important. Then I


Have you tried just making normal indexes on each of the last 6 elements
and see if postgresql will use a BitmapAnd to combine them? 





I don't think that will work for the vector norm i.e:

|x - y| = sqrt(sum over j ((x[j] - y[j])^2))


Cheers

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] index structure for 114-dimension vector

2007-04-20 Thread Mark Kirkwood

Andrew Lazarus wrote:

Because I know the 25 closest are going to be fairly close in each
coordinate, I did try a multicolumn index on the last 6 columns and
used a +/- 0.1 or 0.2 tolerance on each. (The 25 best are very probably inside
that hypercube on the distribution of data in question.)

This hypercube tended to have 10-20K records, and took at least 4
seconds to retrieve. I was a little surprised by how long that took.
So I'm wondering if my data representation is off the wall.

I should mention I also tried a cube index using gist on all 114
elements, but CREATE INDEX hadn't finished in 36 hours, when I killed
it, and I wasn't in retrospect sure an index that took something like
6GB by itself would be helpful on a 2GB of RAM box.

MK> I don't think that will work for the vector norm i.e:

MK> |x - y| = sqrt(sum over j ((x[j] - y[j])^2))




Sorry, in that case it probably *is* worth trying out 6 single column 
indexes and seeing if they get bitmap and'ed together...
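
Something like this is what I have in mind - the table and column names 
are made up, substitute your own and repeat for the 6 elements that 
matter:

CREATE INDEX series_e109_idx ON series ((vec[109]));
CREATE INDEX series_e110_idx ON series ((vec[110]));
-- ... and so on for the other 4 elements ...
ANALYZE series;

EXPLAIN ANALYZE
SELECT * FROM series
WHERE vec[109] BETWEEN 0.5 AND 0.7
  AND vec[110] BETWEEN 0.1 AND 0.3;
-- hoping to see a BitmapAnd of the individual index scans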


Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] postgres: 100% CPU utilization

2007-04-23 Thread Mark Kirkwood

Scott Marlowe wrote:


(snippage) that's the kinda hardware I was running 7.0.2 on, so you
might as well get retro in your hardware department while you're at it.



Notice he's running FreeBSD 4.4(!), so it could well be a very old 
machine...


Cheers

Mark


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] postgres: 100% CPU utilization

2007-04-26 Thread Mark Kirkwood

Sergey Tsukinovsky wrote:


Just for the record - the hardware that was used for the test has the
following parameters:
AMD Opteron 2GHZ
2GB RAM
LSI Logic SCSI



And you ran FreeBSD 4.4 on it right? This may be a source of high cpu 
utilization in itself if the box is SMP or dual core, as multi-cpu 
support was pretty primitive in that release (4.12 would be better if 
you are required to stick to the 4.x branch, if not then 6.2 is recommended)!


Cheers

Mark



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning

2007-05-05 Thread Mark Kirkwood

Josh Berkus wrote:

Sebastian,


Before inventing a hyper tool, we might consider providing 3-5 example
scenarios for common hardware configurations. This consumes less time
and can be discussed and defined in a couple of days. This is of course not
the correct option for a brand-new 20 spindle Sata 10.000 Raid 10 system
but these are probably not the target for default configurations.


That's been suggested a number of times, but some GUCs are really tied to the 
*exact* amount of RAM you have available.  So I've never seen how "example 
configurations" could help.




I'm not convinced about this objection - having samples gives a bit of a 
heads up on *what* knobs you should at least look at changing.


Also it might be helpful on the -general or -perf lists to be able to 
say "try config 3 (or whatever we call 'em) and see what changes..."


I've certainly found the sample config files supplied with that database 
whose name begins with 'M' a useful *start* when I want something better 
than default...


Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Disk Fills Up and fsck "Compresses" it

2007-05-16 Thread Mark Kirkwood

Jim C. Nasby wrote:

No, it's part of FreeBSD's UFS. google FreeBSD softupdates and you
should get plenty of info.

As I said, it's probably not worth worrying about.

On Wed, May 16, 2007 at 08:21:23AM -0700, Y Sidhu wrote:

What do you mean by "softupdates?" Is that a parameter in what I am guessing
is the conf file?


Here is quite a good article on the interaction of fsck and softupdates 
in FreeBSD:


http://www.usenix.org/publications/library/proceedings/bsdcon02/mckusick/mckusick_html/index.html

Having said that, it seems to talk about space lost to unreferenced 
blocks and inodes in the context of a panic or power outage, as opposed to 
normal softupdate operation (unless I'm missing something...)


Cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Weird 8.2.4 performance

2007-06-06 Thread Mark Kirkwood

Kurt Overberg wrote:


Explain Outputs:

-- 8.2




   ->  Bitmap Heap Scan on taskinstance  (cost=20.71..2143.26 rows=556 
width=8) (actual time=421.423..5655.745 rows=98 loops=9)

 Recheck Cond: (taskinstance.taskid = task.id)
 ->  Bitmap Index Scan on taskid_taskinstance_key  
(cost=0.00..20.57 rows=556 width=0) (actual time=54.709..54.709 rows=196 
loops=9)



-- 8.0



   ->  Index Scan using taskid_taskinstance_key on taskinstance  
(cost=0.00..2152.28 rows=563 width=8) (actual time=0.012..0.832 rows=145 
loops=11)



8.2 is deciding to use a bitmap index scan on taskid_taskinstance_key, 
which seems to be slower (!) than a plain old index scan that 8.0 is 
using. A dirty work around is to disable bitmap scans via:


SET enable_bitmapscan=off

but it is probably worthwhile to try to find out *why* the bitmap scan 
is 1) slow and 2) chosen at all given 1).


One thought that comes to mind - is work_mem smaller on your 8.2 system 
than the 8.0 one? (or in fact is it very small on both?). Also it might 
be interesting to see your non-default postgresql.conf settings for both 
systems.
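
An easy way to get the non-default settings from each box is something 
like (pg_settings should be available on both versions):

SELECT name, setting
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;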


Cheers

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Weird 8.2.4 performance

2007-06-07 Thread Mark Kirkwood

Steinar H. Gunderson wrote:

On Thu, Jun 07, 2007 at 11:35:27AM +0200, Steinar H. Gunderson wrote:

If that table
doesn't fit in the cache on your Mac, you pretty much lose. From the EXPLAIN
output, it looks like it fits very nicely in cache on your server. Thus, I
don't think the difference is between 8.0 and 8.2, but rather your production
server and your test machine.



That's a good point, however it's not immediately obvious that the 
production server is *not* running MacOSX Tiger (or has any more 
memory)... Kurt can you post the relevant specs for the 8.0 and 8.2 
boxes?


Cheers

Mark


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Weird 8.2.4 performance

2007-06-07 Thread Mark Kirkwood

Kurt Overberg wrote:
work_mem = 100MB# when I ran the original query, 
this was set to 1MB, increased on Mark Kirkwood's advice, seemed to help 
a bit but not really




For future reference, be careful with this parameter, as *every* 
connection will use this much memory for each sort or hash (i.e. it's not 
shared and can be allocated several times by each connection!)... yeah, I 
know I suggested increasing it to see what effect it would have :-).
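
One way to get the benefit without the global risk is to raise it only 
around the query that needs it - e.g. (the '100MB' units syntax is 8.2, 
on 8.0 use an integer number of KB):

SET work_mem = '100MB';    -- affects only this session
-- ... run the expensive query here ...
RESET work_mem;            -- back to the postgresql.conf value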


And I'd agree with Steinar and others, looks like caching effects are 
the cause of the timing difference between production and the mac!


Cheers

Mark




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] disk filling up

2007-07-26 Thread Mark Kirkwood

Brandon Shalton wrote:

Hello all,

My hard disk is filling up in the /base  directory  to where it has 
consumed all 200gig of that drive.


All the posts that i see keep saying move to a bigger drive, but at 
some point a bigger drive would just get consumed.


How can i keep the disk from filling up other than get like a half TB 
setup just to hold the ./base/*  folder




Two things come to mind:

- Vacuum as already mentioned by others.
- Temporary sort files from queries needing to sort massive amounts of data.

But you need to help us out by supplying more info...

Cheers

Mark

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] When/if to Reindex

2007-08-23 Thread Mark Kirkwood

Tom Lane wrote:


The fly in the ointment is that after collecting the pg_index definition
of the index, plancat.c also wants to know how big it is --- it calls
RelationGetNumberOfBlocks.  And that absolutely does look at the
physical storage, which means it absolutely is unsafe to do in parallel
with a REINDEX that will be dropping the old physical storage at some
point.

So maybe we are stuck and we have to say "that doesn't work anymore".
But it feels like we might not be too far away from letting it still
work.  Thoughts, ideas?
  


A suggestion that seems a bit like a leap backwards in time - maybe just 
use the pg_class.relpages entry for the index size?


I'm punting that with autovacuum being enabled by default now, the 
relpages entries for all relations will be more representative than they 
used to be in previous releases.
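
i.e. something like the following would be enough for the size estimate 
(the index name is just a placeholder):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'my_index';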


Cheers

Mark


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] File system choice for Red Hat systems

2010-06-01 Thread Mark Kirkwood
I'm helping set up a Red Hat 5.5 system for Postgres. I was going to 
recommend xfs for the filesystem - however it seems that xfs is 
supported as a technology preview "layered product" for 5.5. This 
apparently means that the xfs tools are only available via special 
channels.


What are Red Hat using people choosing for a good performing filesystem?

regards

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] File system choice for Red Hat systems

2010-06-01 Thread Mark Kirkwood

On 02/06/10 15:26, Tom Lane wrote:


What's your time horizon?  RHEL6 will have full support for xfs.
On RHEL5 I really wouldn't consider anything except ext3.

   
Yeah, RHEL6 seems like the version we would prefer - unfortunately the time 
frame is the next few days. Awesome - thanks for the quick reply!


regards

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] File system choice for Red Hat systems

2010-06-01 Thread Mark Kirkwood

On 02/06/10 17:17, Devrim GÜNDÜZ wrote:


For xfs, you may want to read this:

http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html


   


Thanks - yes RHEL6 is the version we would have liked to use I suspect!

Regards

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] File system choice for Red Hat systems

2010-06-02 Thread Mark Kirkwood

On 03/06/10 02:53, Alan Hodgson wrote:

On Tuesday 01 June 2010, Mark Kirkwood
wrote:
   

I'm helping set up a Red Hat 5.5 system for Postgres. I was going to
recommend xfs for the filesystem - however it seems that xfs is
supported as a technology preview "layered product" for 5.5. This
apparently means that the xfs tools are only available via special
channels.

What are Red Hat using people choosing for a good performing filesystem?

 

I've run PostgreSQL on XFS on CentOS for years. It works well. Make sure you
have a good battery-backed RAID controller under it (true for all
filesystems).

   


Thanks - yes, left to myself I would consider using Centos instead. 
However OS choice is prescribed in this case I believe.


Cheers

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Weird XFS WAL problem

2010-06-02 Thread Mark Kirkwood

On 03/06/10 11:30, Craig James wrote:
I'm testing/tuning a new midsize server and ran into an inexplicable 
problem.  With a RAID10 drive, when I move the WAL to a separate 
RAID1 drive, TPS drops from over 1200 to less than 90!   I've checked 
everything and can't find a reason.





Are the 2 new RAID1 disks the same make and model as the 12 RAID10 ones?

Also, are barriers *on* on the RAID1 mount and off on the RAID10 one?

Cheers

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-16 Thread Mark Kirkwood

On 16/06/10 18:30, jgard...@jonathangardner.net wrote:

On Jun 15, 4:18 pm, j...@agliodbs.com (Josh Berkus) wrote:
   

On 6/15/10 10:37 AM, Chris Browne wrote:

I'd like to see some figures about WAL on RAMfs vs. simply turning off
fsync and full_page_writes.  Per Gavin's tests, PostgreSQL is already
close to TokyoCabinet/MongoDB performance just with those turned off; I
wonder if actually having the WAL on a memory partition would make any
real difference in throughput.

I've seen a lot of call for this recently, especially since PostgreSQL
seems to be increasingly in use as a reporting server for Hadoop.  Might
be worth experimenting with just making wal writing a no-op.  We'd also
want to disable checkpointing, of course.

 

My back-of-the-envelope experiment: Inserting single integers into a
table without indexes using a prepared query via psycopg2.

Python Script:
import psycopg2
from time import time
conn = psycopg2.connect(database='jgardner')
cursor = conn.cursor()
cursor.execute("CREATE TABLE test (data int not null)")
conn.commit()
cursor.execute("PREPARE ins AS INSERT INTO test VALUES ($1)")
conn.commit()
start = time()
tx = 0
while time() - start < 1.0:
 cursor.execute("EXECUTE ins(%s)", (tx,));
 conn.commit()
 tx += 1
print tx
cursor.execute("DROP TABLE test");
conn.commit();

Local disk, WAL on same FS:
* Default config =>  90
* full_page_writes=off =>  90
* synchronous_commit=off =>  4,500
* fsync=off =>  5,100
* fsync=off and synchronous_commit=off =>  5,500
* fsync=off and full_page_writes=off =>  5,150
* fsync=off, synchronous_commit=off and full_page_writes=off =>  5,500

tmpfs, WAL on same tmpfs:
* Default config: 5,200
* full_page_writes=off =>  5,200
* fsync=off =>  5,250
* synchronous_commit=off =>  5,200
* fsync=off and synchronous_commit=off =>  5,450
* fsync=off and full_page_writes=off =>  5,250
* fsync=off, synchronous_commit=off and full_page_writes=off =>  5,500

NOTE: If I do one giant commit instead of lots of littler ones, I get
much better speeds for the slower cases, but I never exceed 5,500
which appears to be some kind of wall I can't break through.

If there's anything else I should tinker with, I'm all ears.

   


Seeing some profiler output (e.g. oprofile) for the fastest case (and 
maybe all of them later) might be informative about what limit is being hit 
here.
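Something along these lines would do it (a sketch using oprofile's legacy 
opcontrol interface; the path to the postgres binary is just an example):

# as root, while the benchmark is running:
opcontrol --init
opcontrol --no-vmlinux --start
# ... let the test run for a while ...
opcontrol --dump
opreport --symbols /usr/lib/postgresql/8.4/bin/postgres | head -30
opcontrol --shutdown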


regards

Mark



[PERFORM] wal_synch_method = open_sync safe on RHEL 5.5?

2010-06-17 Thread Mark Kirkwood
Some more on the RHEL 5.5 system I'm helping to set up. Some benchmarking 
using pgbench appeared to suggest that wal_sync_method=open_sync was a 
little faster than fdatasync [1]. Now, I recall some discussion about 
this enabling direct I/O and the general flakiness of that on Linux, so 
is the option regarded as safe?


[1] The workout:

$ pgbench -i -s 1000 bench
$ pgbench -c [1,2,4,8,32,64,128] -t 1

Performance peaked around 2500 tps @32 clients using open_sync and 2200 
with fdatasync. However the disk arrays are on a SAN and I suspect that 
when testing with fdatasync later in the day there may have been 
workload 'leakage' from other hosts hitting the SAN.
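(The setting under test, for reference - a postgresql.conf sketch; on Linux 
the default for this era of PostgreSQL is normally fdatasync:

wal_sync_method = open_sync

)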


Re: [PERFORM] wal_synch_method = open_sync safe on RHEL 5.5?

2010-06-17 Thread Mark Kirkwood

On 18/06/10 15:29, Greg Smith wrote:


P.S. Be wary of expecting pgbench to give you useful numbers on a 
single run.  For the default write-heavy test, I recommend three runs 
of 10 minutes each (-T 600 on recent PostgreSQL versions) before I 
trust any results it gives.  You can get useful data from the 
select-only test in only a few seconds, but not the one that writes a 
bunch.




Yeah, I did several runs of each, and a couple with -c 128 and -t 10 
to give the setup a good workout (also 2000-2400 tps - nice to see a 
well-behaved SAN).



Cheers

Mark



Re: [PERFORM] [Slony1-general] WAL partition overloaded--by autovacuum?

2010-07-07 Thread Mark Kirkwood

On 07/07/10 13:10, Richard Yen wrote:


This leads me to believe that there was a sudden flurry of write activity that 
occurred, and the process that would flush WAL files to /db/data/ couldn't keep 
up, thereby filling up the disk.  I'm wondering if anyone else out there might 
be able to give me some insight or comments to my assessment--is it accurate?  
Any input would be helpful, and I'll try to make necessary architectural 
changes to keep this from happening again.



Do you have WAL archiving enabled? (If so, let's see your archive_command.)
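For instance, with archiving on, postgresql.conf would have something like the 
following (the archive path here is made up) - a slow or failing 
archive_command will cause WAL segments to accumulate in pg_xlog until the 
partition fills:

archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'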

Cheers

Mark



Re: [PERFORM] Need help in performance tuning.

2010-07-09 Thread Mark Kirkwood

On 10/07/10 00:56, Brad Nicholson wrote:

On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote:
   


Perhaps not, but there's no obvious benefit either.  Since there's
More Than One Way To Do It, it seems more practical to keep that as a
separate problem that can be solved by a choice of add-on packages.
 

This sounds similar to the approach to taken with Replication for years
before being moved into core.

Just like replication, pooling has different approaches.  I do think
that in both cases, having a solution that works, easily, out of the
"box" will meet the needs of most users.

There is also the issue of perception/adoption here as well.  One of my
colleagues mentioned that at PG East that he repeatedly heard people
talking (negatively) about the over reliance on add-on packages to deal
with core DB functionality.

   


It would be interesting to know more about what they thought an 'over 
reliance' was and which packages they meant.


While clearly in the case of replication something needed to be done to 
make it better and easier, it is not obvious that the situation with 
connection pools is analogous. For instance, we make extensive use of 
PgBouncer, and it seems to do the job fine and is ridiculously easy to 
install and set up. So would having (something like) this in core be an 
improvement? Clearly if the 'in core' product is better then it is 
desirable... similarly if the packaged product is better... well let's 
have that then!
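To illustrate the 'ridiculously easy' claim, a minimal pgbouncer.ini sketch 
(database name, paths and pool sizes here are just examples):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20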


I've certainly observed a 'fear of package installation' on the part of 
some folk, which is often a hangover from the 'Big IT shop' mentality 
where it requires blood signatures and child sacrifice to get anything 
new installed.


regards

Mark

P.S. Also note that database vendors like pooling integrated into the core 
of *their* product because it is another thing to charge a license for. 
Unfortunately this can also become an entrenched 'must be in core' 
mentality on the part of consultants etc.




[PERFORM] Linux Filesystems again - Ubuntu this time

2010-07-26 Thread Mark Kirkwood
I'm involved with the setup of replacement hardware for one of our 
systems. It is going to be using Ubuntu Lucid Server (kernel 2.6.32 I 
believe). The issue of filesystems has raised its head again.


I note that ext4 is now the default for Lucid - what do folks think about 
using it: stable enough now? Also, xfs has seen quite a bit of 
development in these later kernels - any thoughts on that?


Cheers

Mark

P.S.: We are quite keen to move away from ext3, as we have encountered 
its tendency to hit a wall under heavy load and leave us waiting for 
kjournald and pdflush to catch up.




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Mark Kirkwood

On 06/08/10 06:28, Kenneth Cox wrote:
I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM 
running CentOS 5.4 x86_64.  I have a ServeRAID 8k controller with 6 
SATA 7500RPM disks in RAID 6, and for the OLAP workload it feels* 
slow.  I have 6 more disks to add, and the RAID has to be rebuilt in 
any case, but first I would like to solicit general advice.  I know 
that's little data to go on, and I believe in the scientific method, 
but in this case I don't have the time to make many iterations.


My questions are simple, but in my reading I have not been able to 
find definitive answers:


1) Should I switch to RAID 10 for performance?  I see things like 
"RAID 5 is bad for a DB" and "RAID 5 is slow with <= 6 drives" but I 
see little on RAID 6.  RAID 6 was the original choice for more usable 
space with good redundancy.  My current performance is 85MB/s write, 
151 MB/s reads (using dd of 2xRAM per 
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).




Normally I'd agree with the others and recommend RAID10 - but you say 
you have an OLAP workload, and if it is *heavily* read-biased you may get 
better performance with RAID5 (more effective disks to read from). 
Having said that, your sequential read performance right now is pretty 
low (151 MB/s - it should be double this), which may point to an issue 
with this controller. Unfortunately this *may* be important for an OLAP 
workload (seq scans of big tables).
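As a rough check, something like the usual dd workout (sized at about 2x your 
24GB of RAM; the file path is only illustrative):

$ dd if=/dev/zero of=/data/ddtest bs=1M count=49152 conv=fdatasync   # write test
# echo 3 > /proc/sys/vm/drop_caches                                  # as root, clear the page cache
$ dd if=/data/ddtest of=/dev/null bs=1M                              # read test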




2) Should I configure the ext3 file system with noatime and/or 
data=writeback or data=ordered?  My controller has a battery, the 
logical drive has write cache enabled (write-back), and the physical 
devices have write cache disabled (write-through).




Probably wise to use noatime. If you have a heavy write workload (i.e. so 
what I just wrote above does *not* apply), then you might find that adjusting 
the ext3 commit interval upwards from its default of 5 seconds can help 
(I'm doing some testing at the moment and commit=20 seemed to improve 
performance by about 5-10%).
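In /etc/fstab that would look something like this (device and mount point are 
only examples):

/dev/sdb1  /var/lib/pgsql  ext3  noatime,commit=20  0  2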


3) Do I just need to spend more time configuring postgresql?  My 
non-default settings were largely generated by pgtune-0.9.3:


max_locks_per_transaction = 128 # manual; avoiding "out of shared 
memory"

default_statistics_target = 100
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 16GB
work_mem = 352MB
wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 2316MB
max_connections = 32



Possibly higher checkpoint_segments and lower wal_buffers (I recall 
someone - maybe Greg - suggesting that there was no benefit in having the 
latter > 10MB). I also wonder about setting shared_buffers higher - how large 
is the database?
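i.e. something along these lines might be worth trying (just a sketch - 
benchmark before and after, and size shared_buffers with the database size in 
mind):

checkpoint_segments = 128
wal_buffers = 8MB
shared_buffers = 4GB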


Cheers

Mark




Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Mark Kirkwood

On 06/08/10 11:58, Alan Hodgson wrote:

On Thursday, August 05, 2010, Mark Kirkwood
wrote:
   

Normally I'd agree with the others and recommend RAID10 - but you say
you have an OLAP workload - if it is *heavily* read biased you may get
better performance with RAID5 (more effective disks to read from).
Having said that, your sequential read performance right now is pretty
low (151 MB/s  - should be double this), which may point to an issue
with this controller. Unfortunately this *may* be important for an OLAP
workload (seq scans of big tables).
 

Probably a low (default) readahead limitation. ext3 doesn't help but it can
usually get up over 400MB/sec. Doubt it's the controller.

   


Yeah - good suggestion, so cranking up readahead (man blockdev) and 
retesting is recommended.
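e.g. (device name is illustrative; the value is in 512-byte sectors, so 4096 
corresponds to 2MB of readahead):

$ blockdev --getra /dev/sdb          # show current readahead
# blockdev --setra 4096 /dev/sdb     # as root; re-run the dd test afterwards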


Cheers

Mark



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Mark Kirkwood

On 06/08/10 12:31, Mark Kirkwood wrote:

On 06/08/10 11:58, Alan Hodgson wrote:
On Thursday, August 05, 2010, Mark Kirkwood wrote:

Normally I'd agree with the others and recommend RAID10 - but you say
you have an OLAP workload - if it is *heavily* read biased you may get
better performance with RAID5 (more effective disks to read from).
Having said that, your sequential read performance right now is pretty
low (151 MB/s  - should be double this), which may point to an issue
with this controller. Unfortunately this *may* be important for an OLAP
workload (seq scans of big tables).
Probably a low (default) readahead limitation. ext3 doesn't help but it can 
usually get up over 400MB/sec. Doubt it's the controller.



Yeah - good suggestion, so cranking up readahead (man blockdev) and 
retesting is recommended.





... sorry, it just occurred to me to wonder about the stripe or chunk size 
used in the array, as making this too small can also severely hamper 
sequential performance.
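On Linux software RAID you can see the chunk size with mdadm (the ServeRAID 8k 
keeps this in its own management utility, so the command below only applies if 
md were in use; the device name is illustrative):

$ mdadm --detail /dev/md0 | grep -i chunk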


Cheers

Mark



Re: [PERFORM] Very poor performance

2010-08-17 Thread Mark Kirkwood

On 18/08/10 06:19, Kevin Grittner wrote:



Since you haven't set effective_cache_size, you're discouraging some
types of plans which might be worth considering.  This should
normally be set to the sum of your shared_buffers setting and
whatever is cached by the OS; try setting effective_cache_size to
15MB.  Speaking of shared_buffers, are you really at the default for
that, too?  If so, try setting it to somewhere between 1GB and 4GB.
(I would test at 1, 2, and 4 if possible, since the best setting is
dependent on workload.)


   


Kevin - I'm guessing you meant to suggest setting effective_cache_size 
to 15GB (not 15MB).


Cheers

Mark



Re: [PERFORM] Memory usage - indexes

2010-09-23 Thread Mark Kirkwood

On 24/09/10 09:50, Tobias Brox wrote:

We've come to a tipping point with one of our database servers, it's
generally quite loaded but up until recently it was handling the load
well - but now we're seeing that it struggles to process all the
selects fast enough.  Sometimes we're observing some weird lock-like
behaviour (see my other post on that), but most of the time the
database server is just not capable of handling the load fast enough
(causing the queries to pile up in the pg_stat_activity-view).

My main hypothesis is that all the important indexes would fit snuggly
into the memory before, and now they don't.  We'll eventually get the
server moved over to new and improved hardware, but while waiting for
that to happen we need to do focus on reducing the memory footprint of
the database.  I have some general questions now ...

1) Are there any good ways to verify my hypothesis?  Some months ago I
thought of running some small memory-gobbling program on the database
server just to see how much memory I could remove before we would see
indications of the database being overloaded.  It seems a bit radical,
but I think the information learned from such an experiment would be
very useful ... and we never managed to set up any testing environment
that faithfully replicates production traffic.  Anyway, it's sort of
too late now that we're already observing performance problems even
without the memory gobbling script running.

2) I've seen it discussed earlier on this list ... shared_buffers vs
OS caches.  Some claims that it has very little effect to adjust the
size of the shared buffers.  Anyway, isn't it a risk that memory is
wasted because important data is stored both in the OS cache and the
shared buffers?  What would happen if using almost all the available
memory for shared buffers?  Or turn it down to a bare minimum and let
the OS do almost all the cache handling?

3) We're discussing to drop some overlapping indexes ... i.e. to drop
one out of two indexes looking like this:

some_table(a)
some_table(a,b)

Would the query "select * from some_table where a=?" run slower if we
drop the first index?  Significantly?

(in our situation I found that the number of distinct b's for each a
is low and that the usage stats on the second index is quite low
compared with the first one, so I think we'll drop the second index).

4) We're discussing to drop other indexes.  Does it make sense at all
as long as we're not experiencing problems with inserts/updates?  I
suppose that if the index isn't used it will remain on disk and won't
affect the memory usage ... but what if the index is rarely used ...
wouldn't it be better to do a seqscan on a table that is frequently
accessed and mostly in memory than to consult an index that is stored
on the disk?

Sorry for all the stupid questions ;-)

   



All good questions! Before (or maybe as well as) looking at index sizes 
vs. memory, I'd check to see if any of your commonly run queries have 
suddenly started to use different plans due to data growth, e.g.:


- index scan to seq scan (perhaps because effective_cache_size is too 
small now)

- hash agg to sort (work_mem too small now)

We had a case of the 1st point happen here a while ago, symptoms looked 
very like what you are describing.
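i.e. something like the following, using a purely hypothetical query - compare 
the current plan with what you get after bumping the settings for the session:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- if this now shows a Seq Scan (or a Sort where a HashAggregate used to be):
SET effective_cache_size = '12GB';
SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;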


Re index size, you could try indexes like:

some_table(a)
some_table(b)

which may occupy less space, and the optimizer can bitmap and/or them to 
work like the compound index some_table(a,b).
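i.e. (a sketch - the index names are made up):

CREATE INDEX some_table_a_idx ON some_table (a);
CREATE INDEX some_table_b_idx ON some_table (b);
-- for WHERE a = ? AND b = ? the planner can BitmapAnd these two,
-- behaving much like the compound index some_table(a, b)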


regards

Mark



Re: [PERFORM] Memory usage - indexes

2010-09-29 Thread Mark Kirkwood

On 29/09/10 19:41, Tobias Brox wrote:

I just got this crazy, stupid or maybe genius idea :-)


Now, my idea is to drop that fat index and replace it with conditional
indexes for a dozen of heavy users - like those:

   acc_trans(trans_type, created) where customer_id=224885;
   acc_trans(trans_type, created) where customer_id=643112;
   acc_trans(trans_type, created) where customer_id=15;

or maybe like this:

   acc_trans(customer_id, trans_type, created) where customer_id in ( ... );

Any comments?

My sysadmin is worried that it would be a too big hit on performance
when doing inserts.  It may also cause more overhead when planning the
queries.  Is that significant?  Is this idea genius or stupid or just
somewhere in between?

   


Yeah, I think the idea of trying to have a few smaller indexes for the 
'hot' customers is a good one. However, I am wondering whether just using 
single-column indexes and seeing if the bitmap scan/merge of smaller 
indexes is actually more efficient is worth testing - i.e.:


acc_trans(trans_type);
acc_trans(created);
acc_trans(customer_id);

It may mean that you have to scrutinize your effective_cache_size and 
work_mem parameters, but it could possibly be simpler and more flexible.
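In other words (the index names are made up):

CREATE INDEX acc_trans_trans_type_idx  ON acc_trans (trans_type);
CREATE INDEX acc_trans_created_idx     ON acc_trans (created);
CREATE INDEX acc_trans_customer_id_idx ON acc_trans (customer_id);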


regards

Mark






Re: [PERFORM] Memory usage - indexes

2010-09-30 Thread Mark Kirkwood

On 30/09/10 01:09, Tobias Brox wrote:

With the most popular trans type it chose another plan and it took
more than 3s (totally unacceptable):

   


Try tweaking effective_cache_size up a bit and see what happens - I've 
found these bitmap plans to be sensitive to it sometimes.


regards

Mark




Re: [PERFORM] Slow count(*) again...

2010-10-13 Thread Mark Kirkwood

On 13/10/10 19:47, Neil Whelchel wrote:


Nope...
So, possible conclusions are:
1. Even with VACUUM database table speed degrades as tables are updated.
2. Time testing on a freshly INSERTed table gives results that are not real-
world.
3. Filesystem defragmentation helps (some).
4. Cache only makes a small difference once a table has been UPDATEd.

I am going to leave this configuration running for the next day or so. This
way I can try any suggestions and play with any more ideas that I have.
I will try these same tests on ext4 later, along with any good suggested
tests.
I will try MySQL with the same data with both XFS and ext4.
-Neil-

   


I think the major effect you are seeing here is that the UPDATE has 
made the table twice as big on disk (even after VACUUM etc.), and it has 
gone from fitting in RAM to not fitting in RAM - so it cannot be 
effectively cached anymore.


This would not normally happen in real life (assuming UPDATEs only 
modify a small part of a table per transaction). However, administrative 
updates (e.g. 'oh! - ref 1 should now be ref 2, please update 
everything') *will* cause the table size to double.


This is an artifact of Postgres's non-overwriting storage manager - 
MySQL will update in place and you will not see this.


Try VACUUM FULL on the table and retest.
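e.g. (the table name is illustrative - note that VACUUM FULL takes an 
exclusive lock and can take a while on a big table):

SELECT pg_size_pretty(pg_relation_size('my_table'));   -- size before
VACUUM FULL my_table;
SELECT pg_size_pretty(pg_relation_size('my_table'));   -- size after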

regards

Mark


