[PERFORM] planner with index scan cost way off actual cost, advice on tweaking cost constants?

2006-03-17 Thread Guillaume Cottenceau
Hi,

I have a problem with the postgres planner: it assigns a cost to an
index scan which is much higher than the actual cost (worst case
considered, i.e. without any previous disk cache), and I am posting
here for advice on tweaking cost constants. Because of this
problem, the planner typically chooses a seq scan when an index
scan would be more efficient, and I would like to correct this if
possible.

Following the documentation and the postgresql list archives, I have
run ANALYZE right before my tests and have increased the statistics
target to 50 for the table in question; my problem is that the index
scan cost reported by EXPLAIN seems to be around 12.7 times higher
than it should be, a figure I suppose is too large to be explained by
random_page_cost and effective_cache_size tweaks alone.
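For reference, cost constants can be overridden per session while
experimenting, along these lines (illustrative values only, not
recommendations):

SET random_page_cost = 2;          -- planner default is 4
SET effective_cache_size = 65536;  -- in 8 kB pages, i.e. ~512 MB; set to the actual cache size
EXPLAIN SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';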


Structure of the table:

\d sent_messages
                                Table "public.sent_messages"
  Column  |           Type           |                           Modifiers
----------+--------------------------+-----------------------------------------------------------------
 uid      | integer                  | not null default nextval('public.sent_messages_uid_seq'::text)
 sender   | character varying(25)    |
 receiver | character varying(25)    |
 action   | character varying(25)    |
 cost     | integer                  |
 date     | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 status   | character varying(128)   |
 theme    | character varying(25)    |
 operator | character varying(15)    |
Indexes:
    "sent_messages_pkey" primary key, btree (uid)
    "idx_sent_msgs_date_theme_status" btree (date, theme, status)


What I did:

- SET default_statistics_target = 50

- VACUUM FULL ANALYZE VERBOSE sent_messages - output copied below so that
  you can have a look at the rows and pages taken up by relations

INFO:  vacuuming public.sent_messages
INFO:  sent_messages: found 0 removable, 3692284 nonremovable row versions in 
55207 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 103 to 177 bytes long.
There were 150468 unused item pointers.
Total free space (including removable row versions) is 2507320 bytes.
0 pages are or will become empty, including 0 at the end of the table.
2469 pages containing 262732 free bytes are potential move destinations.
CPU 0.57s/0.20u sec elapsed 11.27 sec.
INFO:  index sent_messages_pkey now contains 3692284 row versions in 57473 
pages
DETAIL:  0 index row versions were removed.
318 index pages have been deleted, 318 are currently reusable.
CPU 2.80s/1.27u sec elapsed 112.69 sec.
INFO:  index idx_sent_msgs_date_theme_status now contains 3692284 row 
versions in 88057 pages
DETAIL:  0 index row versions were removed.
979 index pages have been deleted, 979 are currently reusable.
CPU 4.22s/1.51u sec elapsed 246.88 sec.
INFO:  sent_messages: moved 0 row versions, truncated 55207 to 55207 pages
DETAIL:  CPU 1.87s/3.18u sec elapsed 42.71 sec.
INFO:  vacuuming pg_toast.pg_toast_77852470
INFO:  pg_toast_77852470: found 0 removable, 0 nonremovable row versions in 0 
pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 0 to 0 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 0 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index pg_toast_77852470_index now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  analyzing public.sent_messages
INFO:  sent_messages: 55207 pages, 15000 rows sampled, 3666236 estimated 
total rows

- select rows of the table with a range condition on date, and find
  a range for which seq scan and index scan runtimes are very close
  (I use Linux; I cat a 2G file to /dev/null between each request to
  flush the disk cache, on a machine with 1G of real RAM and 1G of
  swap, so this is the worst case tested for the index scan), then
  notice that the cost used by the planner is 12.67 times higher for
  the index scan, when it should be around 1 so that the planner can
  make sensible choices:

EXPLAIN ANALYZE SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on sent_messages  (cost=0.00..110591.26 rows=392066 width=78) (actual time=7513.205..13095.147 rows=393074 loops=1)
   Filter: ((date > '2005-09-01 00:00:00+00'::timestamp with time zone) AND (date < '2005-09-19 00:00:00+00'::timestamp with time zone))
 Total runtime: 14272.522 ms
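The index-scan plan itself is missing from the quoted output; it would
typically be produced by disabling sequential scans for the session,
along these lines (a sketch, since the original commands are truncated
here):

SET enable_seqscan = off;  -- force the planner to price the index scan
EXPLAIN ANALYZE SELECT * FROM sent_messages WHERE date > '2005-09-01' AND date < '2005-09-19';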



Re: [PERFORM] Background writer configuration

2006-03-17 Thread Evgeny Gridasov
Yesterday we received a new server: 2x AMD64 (2 cores x 2 chips = 4 cores),
8GB RAM and RAID-1 (LSI MegaRAID).
I've made some tests with pgbench (scaling 1000, database size ~16GB).

First of all, I'd like to mention that it was strange to see the
server performance degrade by 1-2% when we changed kernel/userland to
x86_64 from the default installed i386 userland/amd64 kernel. The
operating system was Debian Linux, filesystem ext3.

The bgwriter_*_percent/maxpages settings did not dramatically increase
performance, but setting bgwriter_delay to 10-20x the original setting
(2000-4000) increased the transaction rate by 4-7 times.
I've tried shared_buffers at 32768 and 65536; performance was almost equal.

for all tests:
checkpoint_segments = 16 
checkpoint_timeout = 900
shared_buffers=65536
wal_buffers=128:


bgwriter_delay = 200
bgwriter_lru_percent = 10.0
bgwriter_lru_maxpages = 100
bgwriter_all_percent = 5.0
bgwriter_all_maxpages = 50

result:
./pgbench -c 32 -t 500 -U postgres regression
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1000
number of clients: 32
number of transactions per client: 500
number of transactions actually processed: 16000/16000
tps = 112.740903 (including connections establishing)
tps = 112.814327 (excluding connections establishing)

(disk activity about 2-4mb/sec writing)


bgwriter_delay = 4000
bgwriter_lru_percent = 10.0
bgwriter_lru_maxpages = 100
bgwriter_all_percent = 5.0
bgwriter_all_maxpages = 50

result:
./pgbench -c 32 -t 500 -U postgres regression
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1000
number of clients: 32
number of transactions per client: 500
number of transactions actually processed: 16000/16000
tps = 508.637831 (including connections establishing)
tps = 510.107981 (excluding connections establishing)

(disk activity about 20-40 mb/sec writing)

Setting bgwriter_delay to higher values leads to slower postgresql shutdown times
(I see the postgresql writer process writing to disk). Sometimes postgresql didn't
shut down correctly (perhaps it doesn't complete the background writing?).

I've found some settings with which the system behaves strangely:

./pgbench -c 32 -t 3000 -U postgres regression

vmstat 1:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1 25528  14992  22884 787673600   457   383   7783  1  0 94  5
 0  7632  14728  22892 78757800   88  4412  9456 1594 21623  9  5  8 78
 1 19796  16904  22928 78727120   16  3536  9053 1559 19717  9  4 12 75
 0  4872  14928  22936 78742080   36  3036  9092 1574 20874  9  4  2 85
 0 24912  16292  22964 78720680   44  3020  9316 1581 19922  9  4  9 78
 0  1912  17800  22980 786987600  2596  8700 1560 19926  9  4  4 84
 4 23996  18284  22996 7868292   320  3396 11048 1657 22802 11  5  3 81
 0 22960  14728  23020 7871448   520  3020  9648 1613 21641  9  4  5 82
 0 28   1008  15440  23028 78696240   48  2992 10052 1608 21430  9  5  5 82
 1 16   1088  17328  23044 786719600  2460  7884 1530 16536  8  3  9 79
 0 23   1088  18440  23052 786555600  3256 10128 1635 22587 10  4  4 81
 1 29   1076  14728  23076 786860400  2968  9860 1597 21518 10  5  7 79
 1 24   1136  15952  23084 78667000   40  2696  8900 1560 19311  9  4  5 81
 0 14   1208  17200  23112 78647360   16  2888  9508 1603 20634 10  4  6 80
 0 21   1220  18520  23120 78628280   72  2816  9487 1572 19888 10  4  7 79
 1 21   1220  14792  23144 786600000  2960  9536 1599 20331  9  5  5 81
 1 24   1220  16392  23152 786408800  2860  8932 1583 19288  9  4  3 84
 0 18   1276  18000  23168 786204800  2792  8592 1553 18843  9  4  9 78
 1 17   1348  19144  23176 78601320   16  2840  9604 1583 20654 10  4  6 80
 0 22 64  15112  23200 7864264  5280  3280  8785 1582 19339  9  4  7 80
 0 25 16  16008  23212 786266440  2764  8964 1605 18471  9  4  8 79
 0 26 16  17544  23236 786087200  3008  9848 1590 20527 10  4  7 79
 1  7 16  18704  23244 785896000  2756  8760 1564 19875  9  4  4 84
 1 25 16  15120  23268 786199600  2768  8512 1550 18518  9  3 12 75
 1 25 16  18076  23276 785981200  2484  8580 1536 18391  8  4  8 80
 0  3 16  17832  23300 786291600  2888  8864 1586 21450  9  4  4 83
 0 14 16  24280  23308 786603600  2816  9140 1537 20655  9  4  7 81
 1  1 16  54452  23348 786796800  1808  6988 1440 14235  6  9 24 61
 0  1 16  51988  23348 78680360060  4180 1344   885  1 10 72 16
 0  2 16  51988  23348 786803600 0  3560 143350  0  0 75 25
 0  2 16  51988  23348 786803600 0  2848 136446  0  0 75 25
 0  2 16  51988  23348 786803600 0  2560 135044  0  0 75 25
 0  4 16  51996  23360 786809200 0  2603 132860  0  

Re: [PERFORM] 1 TB of memory

2006-03-17 Thread Luke Lonergan
Josh,

On 3/16/06 9:43 PM, Josh Berkus josh@agliodbs.com wrote:

 With a single 3 Gbyte/second infiniband connection to the device?
 
 Hey, take it easy!  Jim's post was tongue-in-cheek.

You're right - I insulted his bandwidth, sorry :-)

- Luke





Re: [PERFORM] Background writer configuration

2006-03-17 Thread PFC



First of all, I'd like to mention that it was strange to see the
server performance degrade by 1-2% when we changed kernel/userland
to x86_64 from the default installed i386 userland/amd64 kernel.
The operating system was Debian Linux, filesystem ext3.


	Did you use postgres compiled for AMD64 with the 64-bit kernel, or did you
use a 32-bit postgres in emulation mode?




Re: [PERFORM] 1 TB of memory

2006-03-17 Thread Rodrigo Madera
For God's sake buy a mainframe! =o)

On 3/17/06, Michael Stone [EMAIL PROTECTED] wrote:
 On Thu, Mar 16, 2006 at 10:44:25PM -0800, Luke Lonergan wrote:
 You'd be better off with 4 x $10K servers that do 800MB/s from disk each and
 a Bizgres MPP - then you'd do 3.2GB/s (faster than the SSD) at a price 1/10
 of the SSD, and you'd have 24TB of RAID5 disk under you.

 Except, of course, that your solution doesn't have a seek time of zero.
 That approach is great for applications that are limited by their
 sequential scan speed, not so good for applications with random access.
 At 3.2 GB/s it would still take over 5 minutes to seqscan a TB, so you'd
 probably want some indices--and you're not going to be getting 800MB/s
 per system doing random index scans from rotating disk (but you might
 with SSD). Try not to beat your product drum quite so loud...

 Mike Stone





Re: [PERFORM] 1 TB of memory

2006-03-17 Thread Alex Stapleton
We got a quote for one of these (entirely for comedy value of course)  
and it was in the region of £1,500,000 give or take a few thousand.


On 16 Mar 2006, at 18:33, Jim Nasby wrote:

PostgreSQL tuned to the max and still too slow? Database too big to
fit into memory? Here's the solution!
http://www.superssd.com/products/tera-ramsan/


Anyone purchasing one will be expected to post benchmarks! :)
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461








Re: [PERFORM] 1 TB of memory

2006-03-17 Thread Merlin Moncure
On 3/16/06, Jim Nasby [EMAIL PROTECTED] wrote:
 PostgreSQL tuned to the max and still too slow? Database too big to
 fit into memory? Here's the solution!
 http://www.superssd.com/products/tera-ramsan/

 Anyone purchasing one will be expected to post benchmarks! :)

I like their approach... ddr ram + raid sanity backup + super reliable
power system.  Their prices are on jupiter (and i don't mean jupiter,
fl) but hopefully there will be some competition and the inevitable
decline in prices.  When prices drop from the current 1-2k$/GB to a
more realistic 250$/GB there will be no reason not to throw one into a
server.  You could already make a case for an entry level one to
handle the WAL and perhaps a few key tables/indexes, particularly ones
that are frequent vacuum targets.

The ddr approach is inherently much faster than flash nvram and has a
virtually unlimited duty cycle.  My prediction is that by 2010 SSD
will be relatively commonplace in the server market, barring some
ridiculous government intervention (patents, etc).

merlin



Re: [PERFORM] Background writer configuration

2006-03-17 Thread Evgeny Gridasov
template1=# select version();
                                           version
----------------------------------------------------------------------------------------------
 PostgreSQL 8.1.3 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.5 (Debian 1:3.3.5-13)
(1 row)


On Fri, 17 Mar 2006 14:35:15 +0100
PFC [EMAIL PROTECTED] wrote:

 
  First of all, I'd like to mention that it was strange to see the
  server performance degrade by 1-2% when we changed kernel/userland
  to x86_64 from the default installed i386 userland/amd64 kernel.
  The operating system was Debian Linux, filesystem ext3.
 
  Did you use postgres compiled for AMD64 with the 64-bit kernel, or did you
  use a 32-bit postgres in emulation mode?
 

-- 
Evgeny Gridasov
Software Engineer 
I-Free, Russia



Re: [PERFORM] Background writer configuration

2006-03-17 Thread PFC

I got this:

template1=# select version();
                                                version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.2 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (GCC) 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)
(1 row)

Normally you should get a noticeable performance boost by using userland
executables compiled for the 64-bit platform... strange...



On Fri, 17 Mar 2006 15:50:17 +0100, Evgeny Gridasov [EMAIL PROTECTED]  
wrote:



template1=# select version();
                                           version
----------------------------------------------------------------------------------------------
 PostgreSQL 8.1.3 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.5 (Debian 1:3.3.5-13)
(1 row)


On Fri, 17 Mar 2006 14:35:15 +0100
PFC [EMAIL PROTECTED] wrote:



 First of all, I'd like to mention that it was strange to see the
 server performance degrade by 1-2% when we changed kernel/userland
 to x86_64 from the default installed i386 userland/amd64 kernel.
 The operating system was Debian Linux, filesystem ext3.

Did you use postgres compiled for AMD64 with the 64-bit kernel, or did you
use a 32-bit postgres in emulation mode?









Re: [PERFORM] 1 TB of memory

2006-03-17 Thread Merlin Moncure
On 3/17/06, Rodrigo Madera [EMAIL PROTECTED] wrote:
 I don't know about you databasers that crunch in some selects, updates
 and deletes, but my personal developer workstation is planned to be a
 4x 300GB SATA300 with a dedicated RAID striping controller (no
 checksums, just speedup) and 4x AMD64 CPUs... not to mention 2GB for
 each processor... all this in a nice server motherboard...

no doubt, that will handle quite a lot of data.  in fact, most
databases (contrary to popular opinion) are cpu bound, not i/o bound.
However, at some point a different set of rules comes into play.  This
point is constantly changing due to the relentless march of hardware,
but I'd suggest that at around 1TB you can no longer count on things
to run quickly just depending on o/s file caching to bail you out.
Or, you may have a single table + indexes that's 50GB and takes 6
hours to vacuum, sucking up all your i/o.

another useful aspect of SSD is that the relative value of using system
memory is much less, so you can reduce swappiness and tune postgres to
rely more on the filesystem, giving more of your memory to work_mem and
such.

merlin



Re: [PERFORM] Background writer configuration

2006-03-17 Thread Kevin Grittner
 On Fri, Mar 17, 2006 at  6:24 am, in message
[EMAIL PROTECTED], Evgeny Gridasov
[EMAIL PROTECTED] wrote: 
 
 I've made some tests with pgbench


If possible, tune the background writer with your actual application
code under normal load.  Optimal tuning is going to vary based on usage
patterns.  You can change these settings on the fly by editing the
postgresql.conf file and running pg_ctl reload.  This is very nice, as
it allowed us to try various settings in our production environment
while two machines dealt with normal update and web traffic and another
was in a saturated update process.
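As a quick check that a reload actually took effect, the active values
can be read from any session:

SELECT name, setting FROM pg_settings WHERE name LIKE 'bgwriter%';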

For us, the key seems to be to get the dirty blocks pushed out to the
OS level cache as soon as possible, so that the OS can deal with them
before the checkpoint comes along.

 for all tests:
 checkpoint_segments = 16 
 checkpoint_timeout = 900
 shared_buffers=65536
 wal_buffers=128:

 ./pgbench -c 32 -t 500 -U postgres regression

Unless you are going to be running in short bursts of activity, be sure
that the testing is sustained long enough to get through several
checkpoints and settle into a steady state with any caching
controller, etc.  On the face of it, it doesn't seem like this test
shows anything except how it would behave with a relatively short burst
of activity sandwiched between big blocks of idle time.  I think your
second test may look so good because it is just timing how fast it can
push a few rows into cache space.

 Setting bgwriter_delay to higher values leads to slower postgresql
 shutdown times (I see the postgresql writer process writing to disk).
 Sometimes postgresql didn't shut down correctly (perhaps it doesn't
 complete the background writing?).

Yeah, here's where it gets to trying to finish all the work you avoided
measuring in your benchmark.

-Kevin




Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Dan Harris

Markus Bertheau wrote:

Have you tried using a GIST index on lat & long? These things are
meant for two-dimensional data, whereas btree doesn't handle
two-dimensional data that well. How many rows satisfy either of the
long / lat conditions?

  

According to the analyze, less than 500 rows matched.  I'll look into 
GIST indexes, thanks for the feedback.


-Dan



Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Dan Harris

Dan Harris wrote:

Markus Bertheau wrote:

Have you tried using a GIST index on lat & long? These things are
meant for two-dimensional data, whereas btree doesn't handle
two-dimensional data that well. How many rows satisfy either of the
long / lat conditions?

 

According to the analyze, less than 500 rows matched.  I'll look into 
GIST indexes, thanks for the feedback.


-Dan



When I try to create a GIST index, I get the following error:

create index eventgeo_lat_idx on eventgeo using GIST (lat);

ERROR:  data type double precision has no default operator class for 
access method gist
HINT:  You must specify an operator class for the index or define a 
default operator class for the data type.


I'm not sure what a default operator class is, exactly..

-Dan



Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Merlin Moncure
On 3/16/06, Dan Harris [EMAIL PROTECTED] wrote:
 explain analyze
 select distinct eventmain.incidentid, eventmain.entrydate,
 eventgeo.long, eventgeo.lat, eventgeo.geox, eventgeo.geoy
 from eventmain, eventgeo
 where
 eventmain.incidentid = eventgeo.incidentid and
 ( long > -104.998027962962 and long < -104.985957781349 ) and
 ( lat > 39.7075542720006 and lat < 39.7186195832938 ) and
 eventmain.entrydate > '2006-1-1 00:00' and
 eventmain.entrydate <= '2006-3-17 00:00'
 order by
 eventmain.entrydate;

As others will probably mention, effective queries on lat/long, which
is a spatial problem, will require r-tree or gist.  I don't have a lot
of experience with exotic indexes but this may be the way to go.

One easy optimization to consider is to make an index on either
(incidentid, entrydate) or (incidentid, long), whichever is more
selective.
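A sketch of those two candidates (names taken from the quoted query;
which table each index lives on follows from where the columns are):

CREATE INDEX eventmain_incid_entrydate_idx ON eventmain (incidentid, entrydate);
-- or, if the long bounds are more selective:
CREATE INDEX eventgeo_incid_long_idx ON eventgeo (incidentid, long);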

This is 'yet another query' that would be fun to try out and tweak
using the 8.2 upcoming row-wise comparison.

merlin



Re: [PERFORM] Background writer configuration

2006-03-17 Thread Steve Atkins


On Mar 17, 2006, at 4:24 AM, Evgeny Gridasov wrote:


Yesterday we received a new server: 2x AMD64 (2 cores x 2 chips = 4 cores),
8GB RAM and RAID-1 (LSI MegaRAID).
I've made some tests with pgbench (scaling 1000, database size ~16GB).

First of all, I'd like to mention that it was strange to see the
server performance degrade by 1-2% when we changed kernel/userland
to x86_64 from the default installed i386 userland/amd64 kernel.
The operating system was Debian Linux, filesystem ext3.


64 bit binaries usually run marginally slower than 32 bit binaries.
AIUI the main reason is that they're marginally bigger, so fit less
well in cache, have to haul themselves over the memory channels
and so on. They're couch potato binaries. I've seen over 10% performance
loss in compute-intensive code, so a couple of percent isn't too
bad at all.

If that 64 bit addressing gets you cheap access to lots of RAM, and
your main applications can make good use of it, then that can easily
outweigh the overall loss in performance.

Cheers,
  Steve




Re: [PERFORM] Background writer configuration

2006-03-17 Thread Steinar H. Gunderson
On Fri, Mar 17, 2006 at 08:56:58AM -0800, Steve Atkins wrote:
 64 bit binaries usually run marginally slower than 32 bit binaries.

This depends a bit on the application, and what you mean by 64 bit (ie.
what architecture). Some specialized applications actually benefit from
having a 64-bit native data type (especially stuff working with a small
amount of bitfields -- think an anagram program), but Postgres is probably
not among them unless you do lots of arithmetic on bigints. amd64 has the
added benefit that you get twice as many registers available in 64-bit mode
(16 vs. 8 -- the benefit gets even bigger when you consider that a few of
those go to stack pointers etc.), so in some code you might get a few percent
extra from that, too.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] 1 TB of memory

2006-03-17 Thread Jim C. Nasby
On Thu, Mar 16, 2006 at 10:44:25PM -0800, Luke Lonergan wrote:
 Jim,
 
  PostgreSQL tuned to the max and still too slow? Database too big to
  fit into memory? Here's the solution!
  http://www.superssd.com/products/tera-ramsan/
 
 With a single 3 Gbyte/second infiniband connection to the device?
 
 You'd be better off with 4 x $10K servers that do 800MB/s from disk each and
 a Bizgres MPP - then you'd do 3.2GB/s (faster than the SSD) at a price 1/10
 of the SSD, and you'd have 24TB of RAID5 disk under you.
 
 Plus - need more speed?  Add 12 more servers, and you'd run at 12.8GB/s and
 have 96TB of disk to work with, and you'd *still* spend less on HW and SW
 than the SSD.

Now what happens as soon as you start doing random I/O? :)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Bruno Wolff III
On Fri, Mar 17, 2006 at 08:34:26 -0700,
  Dan Harris [EMAIL PROTECTED] wrote:
 Markus Bertheau wrote:
 Have you tried using a GIST index on lat & long? These things are
 meant for two-dimensional data, whereas btree doesn't handle
 two-dimensional data that well. How many rows satisfy either of the
 long / lat conditions?
 
   
 
 According to the analyze, less than 500 rows matched.  I'll look into 
 GIST indexes, thanks for the feedback.

Have you looked at using the Earth Distance contrib module? If a spherical
model of the earth is suitable for your application, then it may work for you
and might be easier than trying to create Gist indexes yourself.
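A minimal sketch of that route, assuming the cube and earthdistance
contrib modules are installed (table and column names are borrowed
from this thread):

CREATE INDEX eventgeo_earth_idx ON eventgeo USING gist (ll_to_earth(lat, long));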



Re: [PERFORM] Background writer configuration

2006-03-17 Thread 11

On 2006-03-17, at 15:50, Evgeny Gridasov wrote:


template1=# select version();
                                           version
----------------------------------------------------------------------------------------------
 PostgreSQL 8.1.3 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.5 (Debian 1:3.3.5-13)
(1 row)


How about something like:
$ file /usr/lib/postgresql/bin/postgres
(or whatever directory postmaster binary is in) instead?

--
11.




Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Merlin Moncure
On 3/17/06, Bruno Wolff III [EMAIL PROTECTED] wrote:
 Have you looked at using the Earth Distance contrib module? If a spherical
 model of the earth is suitable for your application, then it may work for you
 and might be easier than trying to create Gist indexes yourself.

earth distance = great stuff.  If the maximum error is known then you
can just pad the distance and filter the result on the client if exact
precision is needed.
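The padding idea, sketched in SQL (illustrative center point and a
padded radius; the exact filter could equally run on the client as
suggested; older contrib versions spell the cube containment operator
@ rather than @>):

SELECT * FROM eventgeo
 WHERE earth_box(ll_to_earth(39.713, -104.992), 1100) @> ll_to_earth(lat, long)      -- padded radius, uses the GiST index
   AND earth_distance(ll_to_earth(39.713, -104.992), ll_to_earth(lat, long)) < 1000; -- exact cutoff in meters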

Merlin



Re: [PERFORM] Background writer configuration

2006-03-17 Thread Evgeny Gridasov
[EMAIL PROTECTED]:~$ file /usr/lib/postgresql/8.1/bin/postgres 
/usr/lib/postgresql/8.1/bin/postgres: ELF 64-bit LSB executable, AMD x86-64, 
version 1 (SYSV), for GNU/Linux 2.6.0, dynamically linked (uses shared libs), 
stripped

On Fri, 17 Mar 2006 18:56:32 +0100
11 [EMAIL PROTECTED] wrote:

 On 2006-03-17, at 15:50, Evgeny Gridasov wrote:
 
  template1=# select version();
                                             version
  ----------------------------------------------------------------------------------------------
   PostgreSQL 8.1.3 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.5 (Debian 1:3.3.5-13)
  (1 row)
 
 How about something like:
 $ file /usr/lib/postgresql/bin/postgres
 (or whatever directory postmaster binary is in) instead?



-- 
Evgeny Gridasov
Software Engineer 
I-Free, Russia



Re: [PERFORM] 1 TB of memory

2006-03-17 Thread Luke Lonergan
Jim,

On 3/17/06 9:36 AM, Jim C. Nasby [EMAIL PROTECTED] wrote:

 Now what happens as soon as you start doing random I/O? :)

Well - given that we've divided the data into 32 separate segments, and that
seeking is done in parallel over all 256 disk drives, random I/O rocks hard
and scales.  Of course, the parallelizing planner is designed to minimize
seeking as much as possible, as is the normal Postgres planner, but with
more segments and more parallel platters, seeking is faster.

The biggest problem with this "put huge amounts of data on your SSD and
everything is infinitely fast" idea is that it ignores several critical
scaling factors:
- How much bandwidth is available in and out of the device?
- Does that bandwidth scale as you grow the data?
- As you grow the data, how long does it take to use the data?
- Can more than 1 CPU use the data at once?  Do they share the path to the
data?

If you are accessing 3 rows at a time from among billions, the problem you
have is mostly access time - so an SSD might be very good for some OLTP
applications.  However - the idea of putting Terabytes of data into an SSD
through a thin straw of a channel is silly.

Note that SSDs have been around for a *long* time.  I was using them on Cray
X/MP and Cray-2 supercomputers back in 1987-92, when we had a 4 Million Word
SSD connected over a 2GB/s channel.  In fact, some people I worked with built
a machine with 4 Cray-2 computers that shared an SSD between them for parallel
computing, and it was very effective, and also ungodly expensive and special
purpose.

- Luke





Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Dan Harris

Merlin Moncure wrote:


As others will probably mention, effective queries on lat/long, which
is a spatial problem, will require r-tree or gist.  I don't have a lot
of experience with exotic indexes but this may be the way to go.

One easy optimization to consider is to make an index on either
(incidentid, entrydate) or (incidentid, long), whichever is more
selective.

This is 'yet another query' that would be fun to try out and tweak
using the 8.2 upcoming row-wise comparison.

merlin
  
Thanks to everyone for your suggestions.  One problem I ran into is that
apparently my version doesn't support the GIST index that was
mentioned (function 'box' doesn't exist), so I'm guessing that both
this as well as the Earth Distance contrib require me to add on some
more pieces that aren't there.


Furthermore, by doing so, I am tying my queries directly to
postgres-isms.  One of the long term goals of this project is to be
able to fairly transparently support any ANSI SQL-compliant back end
with the same code base.  If I had full control over the query designs,
I could make stored procedures to abstract this.  However, I have to
deal with a gray box third-party reporting library that isn't so
flexible.  I'll certainly consider going with something
postgres-specific, but only as a last resort.


I tried the multi-column index as mentioned above but didn't see any 
noticeable improvement in elapsed time, although the planner did use the 
new index.


What is the real reason for the index not being very effective on these 
columns?  Although the numbers are in a very limited range, it seems 
that the records would be very selective as it's not terribly common for 
multiple rows to share the same coords.


Is the 8.2 upcoming row-wise comparison something that would be
likely to help me?


Thanks again for your input



Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Merlin Moncure
On 3/17/06, Dan Harris [EMAIL PROTECTED] wrote:
 Merlin Moncure wrote:
 Thanks to everyone for your suggestions.  One problem I ran into is that
 apparently my version doesn't support the GIST index that was
 mentioned (function 'box' doesn't exist), so I'm guessing that both
 this as well as the Earth Distance contrib require me to add on some
 more pieces that aren't there.

earth distance is a contrib module that has to be built and installed;
it does use some pg-isms so I guess that can be ruled out.  GIST is a
bit more complex; I would suggest reading the documentation very
carefully regarding it and making your own determination.

 Furthermore, by doing so, I am tying my queries directly to
 postgres-isms.  [snip]

 I tried the multi-column index as mentioned above but didn't see any
 noticeable improvement in elapsed time, although the planner did use the
 new index.

did you try both flavors of the multiple key index I suggested? (there
were other possibilities, please experiment)

 Is the 8.2. upcoming row-wise comparison something that would be
 likely to help me?

possibly. good news is that rwc is ansi sql.  you can see my blog
about it here: http://people.planetpostgresql.org/merlin/

Specifically, if you can order your table with an order by statement
such that the records you want are contiguous, then yes.  However,
even though it's ansi sql, various commercial databases implement rwc
improperly or not at all (mysql, to their credit, gets it right), and I
still feel like an exotic index or some other nifty pg trick might be
the best performance approach here.
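For reference, row-wise comparison in its ANSI form looks like this
(hypothetical key values; it pays off when the ORDER BY matches an
index so the wanted rows are contiguous):

SELECT * FROM eventmain
 WHERE (incidentid, entrydate) > ('ABC123', '2006-01-01')
 ORDER BY incidentid, entrydate
 LIMIT 50;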

Merlin



Re: [PERFORM] 1 TB of memory

2006-03-17 Thread Scott Marlowe
On Fri, 2006-03-17 at 15:28, Merlin Moncure wrote:
 On 3/17/06, Luke Lonergan [EMAIL PROTECTED] wrote:
   Now what happens as soon as you start doing random I/O? :)
  If you are accessing 3 rows at a time from among billions, the problem you
  have is mostly access time - so an SSD might be very good for some OLTP
  applications.  However - the idea of putting Terabytes of data into an SSD
  through a thin straw of a channel is silly.
 
 I'll 'byte' on this... right now the price for a gigabyte of ddr ram is
 hovering around 60$/gigabyte.  If you conveniently leave aside the
 problem of making ddr ram fault tolerant vs making disks tolerant, you
 are getting orders of magnitude faster seek time and unlimited
 bandwidth... at least from the physical device.  While SANs are getting
 cheaper they are still fairly expensive at 1-5$/gigabyte depending on
 various factors.  You can do the same tricks on SSD storage as with
 disks.

 SSD storage is 1-2k$/gigabyte currently, but I think there is huge
 room to maneuver price-wise after the major players recoup their
 investments and market forces kick in.  IMO this process is already in
 play and the next cycle of hardware upgrades in the enterprise will be
 updating critical servers with SSD storage.  I'm guessing that by as
 early as 2010 a significant percentage of enterprise storage will be
 SSD of some flavor.

Now I'm envisioning building something with commodity 1U servers holding 4
to 16 gigs of ram, interconnected with 1G or 10G ethernet.

Open Source SSD via iSCSI with commodity hardware...  hmmm.  sounds like
a useful project.



[PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec RAID 2200S

2006-03-17 Thread Kenji Morishige
About a year ago we decided to migrate our central database, which powers
various intranet tools, from MySQL to PostgreSQL. We have about 130 tables and
about 10GB of data that stores various status information for a variety of
services on our intranet.  We generally have somewhere between 150-200
connections to the database at any given time, probably anywhere between 5-10
new connections being made every second, and about 100 queries per second. Most
of the queries and transactions are very small due to the fact that the tools
were designed to work around the limited functionality of the MySQL 3.23 DB.
Our company primarily uses FreeBSD and we are stuck on the FreeBSD 4.X series
due to IT support issues, but I believe I may be able to get more performance
out of our server by reconfiguring and setting up the postgresql.conf file
better.  The performance is not as good as I was hoping at the moment, and
it seems as if the database is not making use of the available RAM.

snapshot of active server:
last pid:  5788;  load averages:  0.32,  0.31,  0.28
 up 127+15:16:08 13:59:24
169 processes: 1 running, 168 sleeping
CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
Swap: 4096M Total, 216K Used, 4096M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
14501 pgsql   2   0   254M   242M select 2  76:26  1.95%  1.95% postgre
 5720 root   28   0  2164K  1360K CPU0   0   0:00  1.84%  0.88% top
 5785 pgsql   2   0   255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
 5782 pgsql   2   0   255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
 5772 pgsql   2   0   255M 11708K sbwait 2   0:00  1.54%  0.15% postgre


Here is my current configuration:

Dual Xeon 3.06Ghz 4GB RAM
Adaptec 2200S 48MB cache  4 disks configured in RAID5
FreeBSD 4.11 w/kernel options:
options SHMMAXPGS=65536
options SEMMNI=256
options SEMMNS=512
options SEMUME=256
options SEMMNU=256
options SMP # Symmetric MultiProcessor Kernel
options APIC_IO # Symmetric (APIC) I/O

The OS is installed on the local single disk and postgres data directory
is on the RAID5 partition.  Maybe the Adaptec 2200S RAID5 performance is not as
good as the vendor claimed.  It was my impression that RAID controllers
these days are optimized for RAID5 and that going RAID10 would not benefit me much.

Also, I may be overlooking a postgresql.conf setting.  I have attached the 
config file.

In summary, my questions:

1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

2. Should I change SCSI controller config to use RAID 10 instead of 5?

3. Why isn't postgres using all 4GB of ram, at least for caching tables for reads?

4. Are there any other settings in the conf file I could try to tweak?
# -
# PostgreSQL configuration file
# -
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Please note that re-commenting a setting is NOT sufficient to revert it
# to the default value, unless you restart the postmaster.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have 
# to SIGHUP the postmaster for the changes to take effect, or use 
# pg_ctl reload. Some settings, such as listen_addresses, require
# a postmaster shutdown and restart to take effect.


#---
# FILE LOCATIONS
#---

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.
# data_directory = 'ConfigDir'  # use data in another directory
# hba_file = 'ConfigDir/pg_hba.conf'# the host-based authentication file
# ident_file = 'ConfigDir/pg_ident.conf'  # the IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
# external_pid_file = '(none)'  # write an extra pid file


#---
# CONNECTIONS AND AUTHENTICATION
#---

# - Connection Settings -


Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB

2006-03-17 Thread Scott Marlowe
On Fri, 2006-03-17 at 16:11, Kenji Morishige wrote:
 About a year ago we decided to migrate our central database that powers 
 various
 intranet tools from MySQL to PostgreSQL. We have about 130 tables and about
 10GB of data that stores various status information for a variety of services
 for our intranet.  We generally have somewhere between 150-200 connections to
 the database at any given time and probably anywhere between 5-10 new 
 connections being made every second and about 100 queries per second. Most
 of the queries and transactions are very small due to the fact that the tools
 were designed to work around the small functionality of MySQL 3.23 DB.  
 Our company primarily uses FreeBSD and we are stuck on FreeBSD 4.X series due
 to IT support issues,

There were a LOT of performance enhancements to FreeBSD with the 5.x
series release.  I'd recommend fast tracking the database server to the
5.x branch.  4-stable was released 6 years ago.  5-stable was released
two years ago.

 but I believe I may be able to get more performance out
 of our server by reconfiguring and setting up the postgresql.conf file up 
 better.

Can't hurt.  But if your OS isn't doing the job, postgresql.conf can
only do so much, nee?

   The performance is not as good as I was hoping at the moment and 
 it seems as if the database is not making use of the available ram.
 snapshot of active server:
 last pid:  5788;  load averages:  0.32,  0.31,  0.28  
up 127+15:16:08 13:59:24
 169 processes: 1 running, 168 sleeping
 CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
 Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
 Swap: 4096M Total, 216K Used, 4096M Free
 
   PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
 14501 pgsql       2    0   254M   242M select 2  76:26  1.95%  1.95% postgre
  5720 root       28    0  2164K  1360K CPU0   0   0:00  1.84%  0.88% top
  5785 pgsql       2    0   255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
  5782 pgsql       2    0   255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
  5772 pgsql       2    0   255M 11708K sbwait 2   0:00  1.54%  0.15% postgre

That doesn't look good.  Is this machine freshly rebooted, or has it
been running postgres for a while?  179M cache and 199M buffer with 2.6
gig inactive is horrible for a machine running a 10 gig database.

For comparison, here's what my production linux boxes show in top:
 16:42:27  up 272 days, 14:49,  1 user,  load average: 1.02, 1.04, 1.00
162 processes: 161 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
   total    0.2%    0.0%    0.4%   0.0%     0.0%    0.4%   98.7%
   cpu00    0.4%    0.0%    0.4%   0.0%     0.0%    0.0%   99.0%
   cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.9%   98.5%
Mem: 6096912k av, 4529208k used, 1567704k free, 0k shrd,  306884k buff
     2398948k actv, 1772072k in_d,   78060k in_c
Swap: 4192880k av,  157480k used, 4035400k free    3939332k cached

  
PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
24000 postgres  15 0  752  524  456 S   0.0  0.0   0:00   1 rotatelogs
24012 postgres  15 0 1536 1420 1324 S   0.0  0.0   7:11   0 postmaster
24015 postgres  15 0 2196 2032  996 S   0.0  0.0  56:07   0 postmaster
24016 postgres  15 0 1496 1352 1004 S   0.0  0.0 233:46   1 postmaster

Note that the kernel here is caching ~3.9 gigs of data, so postgresql
doesn't have to.  Also, the disk buffers are sitting at ~300 Megs.

If FreeBSD 4.x can't or won't cache more than that, there's an OS issue
here, either endemic to FreeBSD 4.x, or your configuration of it.


 Dual Xeon 3.06Ghz 4GB RAM

Make sure hyperthreading is disabled, it's generally a performance loss
for pgsql.

 Adaptec 2200S 48MB cache  4 disks configured in RAID5

I'm not a huge fan of adaptec RAID controllers, and 48 Megs ain't much. 
But for what you're doing, I'd expect it to run well enough.  Have you
tested this array with bonnie++ to see what kind of performance it gets
in general?  There could be some kind of hardware issue going on you're
not seeing in the logs.

Is that memory cache set to write back not through, and does it have
battery backup (the cache, not the machine)?

 The OS is installed on the local single disk and postgres data directory
 is on the RAID5 partition.  Maybe Adaptec 2200S RAID5 performance is not as
 good as the vendor claimed.  It was my impression that the raid controller 
 these days are optimized for RAID5 and going RAID10 would not benefit me much.

You have to be careful about RAID 10, since many controllers serialize
access through multiple levels of RAID, and therefore wind up being
slower in RAID 10 or 50 than in 

Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec RAID 2200S

2006-03-17 Thread Claus Guttesen
 Here is my current configuration:

 Dual Xeon 3.06Ghz 4GB RAM
 Adaptec 2200S 48MB cache  4 disks configured in RAID5
 FreeBSD 4.11 w/kernel options:
 options SHMMAXPGS=65536
 options SEMMNI=256
 options SEMMNS=512
 options SEMUME=256
 options SEMMNU=256
 options SMP # Symmetric MultiProcessor Kernel
 options APIC_IO # Symmetric (APIC) I/O

 The OS is installed on the local single disk and postgres data directory
 is on the RAID5 partition.  Maybe Adaptec 2200S RAID5 performance is not as
 good as the vendor claimed.  It was my impression that the raid controller
 these days are optimized for RAID5 and going RAID10 would not benefit me much.

I don't know whether 'systat -vmstat' is available on 4.x; if so, try
issuing 'systat -vmstat 1' for 1-second updates. This will (amongst
much other info) show how much disk transfer you have.

 Also, I may be overlooking a postgresql.conf setting.  I have attached the
 config file.

You could try to lower shared_buffers from 30000 to 16384. Setting
this value too high can in some cases be counterproductive according
to docs I've read.

Also try to lower work_mem from 16384 to 8192 or 4096. This setting is
for each sort, so it does become expensive in terms of memory when
many sorts are being carried out. It does depend on the complexity of
your sorts of course.
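work_mem can also be tried per session before committing a change to
postgresql.conf (the value is in kB in these releases; a sketch, not a
recommendation):

SET work_mem = 8192;   -- 8 MB per sort/hash operation, session-local
SHOW work_mem;         -- confirm the active value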

Try to do a vacuum analyse in your crontab. If your aliases file is
set up correctly, mails generated by crontab will be forwarded to a
human being. I have the following in my (root) crontab (with mail to
root forwarded to me):

time /usr/local/bin/psql -d dbname -h dbhost -U username -c "vacuum
analyse verbose;"

 In summary, my questions:

 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

Going to 6.x would probably increase overall performance, but you have
to try it out first. Many people report increased performance just by
upgrading, some report that it grinds to a halt. But SMP-wise 6.x is a
more mature release than 4.x is. The move from the giant-locked kernel
in 4.x to fine-grained locking started in 5.x and has improved in 6.x.
The disk and network layers should behave better.

Linux, don't know. If your expertise is in FreeBSD try this first and
then move to Linux (or Solaris 10) if 6.x does not meet your
expectations.

 3. Why isn't postgres using all 4GB of ram for at least caching table for 
 reads?

I guess it's related to the usage of the i386 architecture in general.
If the Xeons are the newer Noconas you can try the amd64 port
instead. This can utilize more memory (without going through PAE).

 4. Are there any other settings in the conf file I could try to tweak?

max_fsm_pages and max_fsm_relations. You can look at the bottom of a
vacuum analyze verbose run and increase the values accordingly:

INFO:  free space map: 153 relations, 43445 pages stored; 45328 total
pages needed

Raise max_fsm_pages so it meets or exceeds 'total pages needed', and
max_fsm_relations so it meets or exceeds the number of relations.

This is fine-tuning though. It's more important to set work_mem and
maintenance_work_mem correctly.

hth
Claus



Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec RAID 2200S

2006-03-17 Thread Tom Lane
Kenji Morishige [EMAIL PROTECTED] writes:
 ...  We generally have somewhere between 150-200 connections to
 the database at any given time and probably anywhere between 5-10 new 
 connections being made every second and about 100 queries per second. Most
 of the queries and transactions are very small due to the fact that the tools
 were designed to work around the small functionality of MySQL 3.23 DB.

You should think seriously about putting in some sort of
connection-pooling facility.  Postgres backends aren't especially
lightweight things; the overhead involved in forking a process and then
getting its internal caches populated etc. is significant.  You don't
want to be doing that for one small query, at least not if you're doing
so many times a second.

 it seems as if the database is not making use of the available ram.

Postgres generally relies on the kernel to do the bulk of the disk
caching.  Your shared_buffers setting of 30000 seems quite reasonable to
me; I don't think you want to bump it up (not much anyway).  I'm not too
familiar with FreeBSD and so I'm not clear on what Inact is:

 Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
 Swap: 4096M Total, 216K Used, 4096M Free

If Inact covers disk pages cached by the kernel then this is looking
reasonably good.  If it's something else then you got a problem, but
fixing it is a kernel issue not a database issue.

 #max_fsm_pages = 20000  # min max_fsm_relations*16, 6 bytes each

You almost certainly need to bump this way up.  20000 is enough to cover
dirty pages in about 200MB of database, which is only a fiftieth of
what you say your disk footprint is.  Unless most of your data is
static, you're going to be suffering severe table bloat over time due
to inability to recycle free space properly.
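As a back-of-envelope restatement of that arithmetic (8 kB pages
assumed), runnable as SQL:

SELECT 20000 * 8 / 1024 AS mb_covered_by_default_fsm,        -- ~156 MB tracked
       10 * 1024 * 1024 / 8 AS pages_in_a_10gb_footprint;    -- ~1.3 million pages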

regards, tom lane



Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB

2006-03-17 Thread Scott Marlowe
On Fri, 2006-03-17 at 17:03, Claus Guttesen wrote:
  Here is my current configuration:

  Also, I may be overlooking a postgresql.conf setting.  I have attached the
  config file.
 
 You could try to lower shared_buffers from 30000 to 16384. Setting
 this value too high can in some cases be counterproductive according
 to docs I've read.

FYI, that was very true before 8.0, but since the introduction of better
cache management algorithms, you can have pretty big shared_buffers
settings.

 Also try to lower work_mem from 16384 to 8192 or 4096. This setting is
 for each sort, so it does become expensive in terms of memory when
 many sorts are being carried out. It does depend on the complexity of
 your sorts of course.

But looking at the usage of RAM on his box, that doesn't look like a
problem at the time that snapshot was taken.  Assuming the box was busy
then, he's OK.  Otherwise, he'd show some swap usage, which he doesn't.



Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec RAID 2200S

2006-03-17 Thread Claus Guttesen
 4. Are there any other settings in the conf file I could try to tweak?

One more thing :-)

I stumbled over this setting; it made the db (PG 7.4.9) use the
index rather than doing a sequential scan, and it reduced a query
from several minutes to some 20 seconds.

random_page_cost = 2 (original value was 4).

Another thing you ought to do is to get the four or five most used
queries and do an explain analyze on them. Since our website wasn't
prepared for this type of statistics I simply did a tcpdump, grep'ed
all the selects, sorted them, and then sorted them unique so I could
see which queries were used most.

regards
Claus



Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB

2006-03-17 Thread Mark Kirkwood

Scott Marlowe wrote:

On Fri, 2006-03-17 at 16:11, Kenji Morishige wrote:


About a year ago we decided to migrate our central database that powers various
intranet tools from MySQL to PostgreSQL. We have about 130 tables and about
10GB of data that stores various status information for a variety of services
for our intranet.  We generally have somewhere between 150-200 connections to
the database at any given time and probably anywhere between 5-10 new 
connections being made every second and about 100 queries per second. Most

of the queries and transactions are very small due to the fact that the tools
were designed to work around the small functionality of MySQL 3.23 DB.  
Our company primarily uses FreeBSD and we are stuck on FreeBSD 4.X series due

to IT support issues,



There were a LOT of performance enhancements to FreeBSD with the 5.x
series release.  I'd recommend fast tracking the database server to the
5.x branch.  4-stable was released 6 years ago.  5-stable was released
two years ago.




I would recommend skipping 5.x and using 6.0 - as it performs measurably 
better than 5.x. In particular the vfs layer is no longer under the 
GIANT lock, so you will get considerably improved concurrent filesystem 
access on your dual Xeon.


Regards

Mark



Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec RAID 2200S

2006-03-17 Thread Kenji Morishige
Thanks guys, I'm studying each of your responses and am going to start to
experiment. Unfortunately, I don't have another box with similar specs to
do a perfect experiment with, but I think I'm going to go ahead and open a
service window to upgrade the box to FreeBSD 6.0 and apply some other changes.
It also gives me the chance to go from 8.0.1 to the 8.1 series, which I've
been wanting to do as well.  Thanks guys, and I will see if any of your
suggestions make a noticeable difference.  I have also been looking at the
log of slow queries and making the necessary indexes to make those go faster.

-Kenji

On Sat, Mar 18, 2006 at 12:29:17AM +0100, Claus Guttesen wrote:
  4. Are there any other settings in the conf file I could try to tweak?
 
 One more thing :-)
 
 I stumbled over this setting, this made the db (PG 7.4.9) make use of
 the index rather than doing a sequential scan and it reduced a query
 from several minutes to some 20 seconds.
 
 random_page_cost = 2 (original value was 4).
 
 Another thing you ought to do is to to get the four-five most used
 queries and do an explain analyze in these. Since our website wasn't
 prepared for this type of statistics I simply did a tcpdump, grep'ed
 all select's, sorted them and sorted them unique so I could see which
 queries were used most.
 
 regards
 Claus



Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Tom Lane
Dan Harris [EMAIL PROTECTED] writes:
 Furthermore, by doing so, I am tying my queries directly to 
 postgres-isms.  One of the long term goals of this project is to be 
 able to fairly transparently support any ANSI SQL-compliant back end 
 with the same code base.

Unfortunately, there isn't any portable or standard (not exactly the
same thing ;-)) SQL functionality for dealing gracefully with
two-dimensional searches, which is what your lat/long queries are.
You should accept right now that you can have portability or you can
have good performance, not both.

Merlin's enthusiasm for row-comparison queries is understandable because
that fix definitely helped a common problem.  But row comparison has
nothing to do with searches in two independent dimensions.  Row
comparison basically makes it easier to exploit the natural behavior of
multicolumn btree indexes ... but a multicolumn btree index does not
efficiently support queries that involve separate range limitations on
each index column.  (If you think about the index storage order you'll
see why: the answer entries are not contiguous in the index.)
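To make that concrete with this thread's columns (a hypothetical index;
note that both range conditions may show up in EXPLAIN as index
conditions, but only the leading column's bounds limit how much of the
btree is scanned, so entries matching lat but not long are still
visited):

CREATE INDEX eventgeo_lat_long_idx ON eventgeo (lat, long);
EXPLAIN SELECT * FROM eventgeo
 WHERE lat > 39.7075 AND lat < 39.7187
   AND long > -104.9981 AND long < -104.9859;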

To support two-dimensional searches you really need a non-btree index
structure, such as GIST.  Since this isn't standard, demanding a
portable answer won't get you anywhere.  (I don't mean to suggest that
Postgres is the only database that has such functionality, just that
the DBs that do have it don't agree on any common API.)

regards, tom lane



Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Michael Fuhr
On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote:
 Dan Harris [EMAIL PROTECTED] writes:
  Furthermore, by doing so, I am tying my queries directly to 
  postgres-isms.  One of the long term goals of this project is to be 
  able to fairly transparently support any ANSI SQL-compliant back end 
  with the same code base.
 
 Unfortunately, there isn't any portable or standard (not exactly the
 same thing ;-)) SQL functionality for dealing gracefully with
 two-dimensional searches, which is what your lat/long queries are.

The OpenGIS Simple Features Specification[1] is a step in that
direction, no?  PostGIS[2], MySQL[3], and Oracle Spatial[4] implement it
to varying degrees.  With PostGIS you do have to add non-standard
operators to a query's predicate to benefit from GiST indexes on
spatial columns, but the rest of the query can be straight out of
the SQL and OGC standards.

[1] http://www.opengeospatial.org/docs/99-049.pdf
[2] http://www.postgis.org/
[3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
[4] http://www.oracle.com/technology/products/spatial/index.html

-- 
Michael Fuhr
