[PERFORM] Extreme high load averages

2003-07-05 Thread Martin Foster
The only time that I have ever seen load averages of 30 or more under 
OpenBSD is when one of my scripts goes wild. However, I can say that 
I am also seeing these load averages under PostgreSQL 7.3.2 after a 
migration to it from MySQL.

MySQL Statistics:
Uptime: 1055352  Threads: 178  Questions: 75161710  Slow queries: 46 
Opens: 1084  Flush tables: 1  Open tables: 206 Queries per second avg: 
71.220

The above are statistics from the older generation of scripts that made 
use of MySQL, to give an idea of what's going on.   That generation of 
scripts handled the referential integrity itself, since foreign key 
constraints are not enforced under that system.   However, the system 
handled 250 concurrent users without a single problem, while under 
Postgres with new scripts using functions, referential integrity, 
transactions and lighter code, the system starts to buckle at even fewer 
than 70 users.

What I would like to know is: why?   The kernel has been compiled to 
handle the number of concurrent connections, and while the server may 
not be the best, it should be able to handle the requests: PIII 1GHz, 
1GB SDRAM, 2 IDE 20GB drives.

I have changed settings to take advantage of the memory.  So the 
following settings are of interest:
	shared_buffers = 16384
	wal_buffers = 256
	sort_mem = 16384
	vacuum_mem = 32768

Statistics gathering has now been disabled, and logging is done through 
syslog.   I do not expect those settings to cripple system performance, 
however.

The scripts are heavy SELECTs with a fair dose of UPDATEs and INSERTs. 
To get a concept of what these scripts do, you can look at Ethereal 
Realms (http://www.ethereal-realms.org), which is running the PostgreSQL 
script variants, or consider that this is a chat site.

Anyone have ideas?   Is the use of connection pooling considered bad? 
Should flush be run more than once a day? I have no intention of going 
back to MySQL, and would like to make this new solution work.

    Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [PERFORM] Extreme high load averages

2003-07-06 Thread Martin Foster
Shridhar Daithankar wrote:

On 5 Jul 2003 at 22:54, Martin Foster wrote:

What I would like to know is.   Why?   The kernel has been compiled to 
handle the number of concurrent connections, the server may not be the 
best, but it should be able to handle the requests: PIII 1Ghz, 1GB 
SDRAM, 2 IDE 20GB drives.

I have changed settings to take advantage of the memory.  So the 
following settings are of interest:
	shared_buffers = 16384
	wal_buffers = 256
	sort_mem = 16384
	vacuum_mem = 32768


As somebody else has already pointed out, your sort_mem is a bit higher
than required. Try lowering it.
Secondly did you tune effective_cache_size?

HTH
Bye
 Shridhar
--
Power, n.:  The only narcotic regulated by the SEC instead of the FDA.
I dropped the size of sort_mem down to 8 megs.   Since I am not 
swapping to cache at all, this should not pose much of a problem at that 
value.
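
For reference, a rough sketch of the arithmetic involved (sort_mem is 
measured in kB and applies per sort, so it can also be lowered per 
session rather than globally; the numbers here are illustrative only):

	-- sort_mem = 16384 means each sort may claim 16 MB:
	--   70 connections * 16 MB = 1120 MB, more than this box's 1 GB of RAM
	-- at 8 megs the worst case becomes 70 * 8 MB = 560 MB
	SET sort_mem = 8192;  -- per-session override, 8 MB per sort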

effective_cache_size seems interesting, though the description is 
somewhat lacking.  Is this related to the swap partition and how much of 
it will be used by PostgreSQL?   If I am correct, this should be fairly low?
	Martin Foster
	Creator/Designer Ethereal Realms
	[EMAIL PROTECTED]





Re: [PERFORM] Extreme high load averages

2003-07-06 Thread Martin Foster
Richard Huxton wrote:

On Sunday 06 Jul 2003 5:54 am, Martin Foster wrote:

The only time that I have ever seen load averages of 30 or more under
OpenBSD is when one of my scripts goes wild. However, I can say that
I am also seeing these load averages under PostgreSQL 7.3.2 after a
migration to it from MySQL.
[snip]

However, the system
handled 250 concurrent users without a single problem, while under
Postgres with new scripts using functions, referential integrity,
transactions and lighter code, the system starts to buckle at even fewer
than 70 users.
[snip]

PIII 1Ghz, 1GB
SDRAM, 2 IDE 20GB drives.
I have changed settings to take advantage of the memory.  So the
following settings are of interest:
shared_buffers = 16384
wal_buffers = 256
sort_mem = 16384
vacuum_mem = 32768


You do know that sort_mem is in kB per sort (not per connection, but per sort 
being done by a connection). That's 16MB per sort you've allowed in main 
memory, or for 70 concurrent sorts up to 1.1GB of memory allocated to 
sorting. You're not going into swap by any chance?

Might want to try halving shared_buffers too and see what happens.

I don't know the *BSDs myself, but do you have the equivalent of iostat/vmstat 
output you could get for us? Also a snapshot of "top" output? People are 
going to want to see:
 - overall memory usage (free/buffers/cache/swap)
 - memory usage per process
 - disk activity (blocks in/out)

From that lot, someone will be able to point towards the issue, I'm sure.
Actually, no I did not.   Which is probably why it was as high as it 
was.   When looking at the PostgreSQL Hardware Performance Tuning page, 
it seems to imply that you should calculate based on RAM to give it an 
appropriate value.

 http://www.postgresql.org/docs/aw_pgsql_book/hw_performance/node8.html

I dropped that value, and will see if that helps.   The thing is, the 
system always indicated plenty of memory available.   Even at a load 
level of 30, the free memory was still roughly 170MB.

Tomorrow will be a good gauge to see if the changes actually help 
matters.   And if they do not, I will include vmstat, iostat, and top 
as requested.

Thanks!

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [PERFORM] Extreme high load averages

2003-07-06 Thread Martin Foster
Shridhar Daithankar wrote:
It gives PostgreSQL a hint as to how much file system cache is available in 
the system. 

You have 1GB memory and your application requirement does not exceed 400MB. So 
the OS can use roughly 600MB for file system cache. In that case you can set 
this parameter to around 400MB, leaving room for other applications in the FS 
cache.

IIRC, the BSDs need sysctl tuning to make more memory available for the FS 
cache; otherwise they max out at 300MB.

Roughly, this setting should be (total memory - application requirement) 
* 0.7 to 0.8.
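
A hedged worked example of that formula for the box in this thread (1GB 
RAM, roughly 400MB of application requirement; effective_cache_size is 
counted in 8kB pages, and the figures are illustrative only):

	-- (1024 MB - 400 MB) * 0.7 is roughly 435 MB expected in the FS cache
	-- 435 MB / 8 kB per page is roughly 55000 pages
	SET effective_cache_size = 55000;  -- or set it in postgresql.conf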

I guess the high kernel load you are seeing is due to increased interaction 
between PostgreSQL and the OS when data is swapped to/from shared memory. If 
the OS cache does well, PostgreSQL should reduce this interaction as well.

BTW, since you have IDE disks, heavy disk activity can eat CPU as well. Is 
your disk bandwidth totally maxed out? Check with vmstat or whatever 
equivalent you have on BSD.

 Shridhar

I changed the value of effective_cache_size to 512.   The database 
restarted without any problems, and load averages seem to be a bit lower 
as a result.

Since people have been asking for it, I added most of the stat command 
outputs that I could think of.   All located below my signature block, 
this will show you what roughly 127 client connections with Postgres 
will generate.   The numbers are a lot nicer to see than a 30 load 
level.

Note that the high number of connections is a side effect of connection 
pooling under Apache using Apache::DBI.   This means that for every 
client on the http server there is a connection to Postgres, even if the 
connection is idle.

The above may be a factor in performance as well.   I had noticed that 
with an idle child setting that was too high, the server would show very 
high load averages as well.   Probably an indication that the system is 
continually forking new children just trying to keep the idle child 
count at the right level.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
uptime:
 2:09PM  up 16:45, 1 user, load averages: 0.36, 0.30, 0.35
vmstat:
 procs   memory          page                     faults           cpu
 r b w   avm     fre     flt  re  pi  po  fr  sr  in   sy   cs   us sy id
 1 0 0   234036  687548  621  0   0   0   0   0   364  396  88   19  1 79
iostat:
  ttywd0 wd1 cpu
 tin tout  KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
   0 1023  4.53   1 0.01   9.72  11 0.10  19  0  1  0 79
pstat -s:
 Device  512-blocks UsedAvail Capacity  Priority
 swap_device41942880  4194288 0%0
top header:
  load averages:  0.31,  0.35,  0.42 

  147 processes: 2 running, 145 idle
  CPU states: 32.9% user, 0.0% nice, 0.9% system, 0.0% interrupt, 66.2% idle
  Memory: Real: 263M/377M act/tot  Free: 630M  Swap: 0K/2048M used/tot

ps -uax:
USER   PID %CPU %MEM   VSZ   RSS TT   STAT STARTED   TIME COMMAND
postgres  1561  0.0  0.5  2120  4812 p0  I  1:48PM0:00.10 
/usr/local/bin/postmaster (postgres)
postgres  9935  0.0  2.8  3832 29744 p0  I  1:48PM0:00.74 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres  7436  0.0  0.6  3640  6636 p0  S  1:48PM0:00.92 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 18814  0.0  7.0  3876 72904 p0  I  1:48PM0:04.53 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 15346  0.0  4.1  3820 42468 p0  I  1:48PM0:00.93 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 13621  0.0  6.9  3832 71824 p0  I  1:48PM0:02.66 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 20916  0.0  4.7  3812 49164 p0  I  1:48PM0:00.59 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 21684  0.0  2.2  3688 23356 p0  S  1:48PM0:01.27 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 19472  0.0  6.9  3824 72452 p0  I  1:48PM0:02.61 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 27361  0.0  0.7  3664  6976 p0  S  1:48PM0:00.91 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 28925  0.0  2.8  3840 29528 p0  I  1:48PM0:00.46 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 12790  0.0  2.7  3800 28080 p0  I  1:48PM0:01.11 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 13957  0.0  6.8  3820 71476 p0  I  1:48PM0:02.26 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 29129  0.0  2.8  3828 29096 p0  I  1:48PM0:01.50 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 24279  0.0  2.7  3824 27992 p0  S  1:48PM0:01.08 
postmaster: ethereal ethereal 192.168.1.6 idle in tra
postgres 20382  0.0  0.6  3640  6748 p0  S  1:48PM0:00.91 
postmaster: ethereal ethereal

Re: [PERFORM] Extreme high load averages

2003-07-06 Thread Martin Foster
Tom Lane wrote:

Martin Foster <[EMAIL PROTECTED]> writes:

The only time that I have ever seen load averages of 30 or more under
OpenBSD is when one of my scripts goes wild.


Note also that "high load average" is not per se an indication that
anything is wrong.  In Postgres, if you have thirty queries waiting
for disk I/O, that's thirty processes --- so if that's the average
state then the kernel will report a load average of thirty.  While
I'm no MySQL expert, I believe that the equivalent condition in MySQL
would be thirty threads blocked for I/O within one process.  Depending
on how your kernel is written, that might show as a load average of
one ... but the difference is completely illusory, because what counts
is the number of disk I/Os in flight, and that's the same.
You didn't say whether you were seeing any real performance problems,
like slow queries or performance dropping when query load rises, but
that is the aspect to concentrate on.
I concur with the nearby recommendations to drop your resource settings.
The thing you have to keep in mind about Postgres is that it likes to
have a lot of physical RAM available for kernel disk buffers (disk
cache).  In a correctly tuned system that's been up for any length of
time, "free memory" should be nearly nada, and the amount of RAM used
for disk buffers should be sizable (50% or more of RAM would be good
IMHO).
			regards, tom lane
Under a circumstance where we have 250 concurrent users, MySQL would 
report a load average of 0.5, sometimes 0.8, depending on the tasks 
being performed.

This would translate to wait times averaging less than a second, and 
under a heavy resource script four seconds.   That system had less RAM, 
however.

This new system, when showing a load average of 30, produced average 
wait times of 12 seconds, and about 30 seconds for the heavy resource 
script.   The web server itself showed a load average of 0.5, showing 
that it was not heavy client interaction slowing things down.

So there is a very noticeable loss of performance when the system 
skyrockets like that.   All of the load as indicated by top is at user 
level, and swap is not even touched.

This may help show why I was slightly concerned.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-06 Thread Martin Foster
Michael Pohl wrote:
On Sun, 6 Jul 2003, Matthew Nuzum wrote:


At the very least, if there is good documentation for these parameters,
maybe the conf file should provide a link to this info. 


I believe that is what Josh is proposing:

http://archives.postgresql.org/pgsql-performance/2003-07/msg00102.php


[Apache httpd] uses a three phase (if not more) documentation level.  
The .conf file contains detailed instructions in an easy to read and
not-to-jargon-ish structure.  The docs provide detailed tutorials and
papers that expand on configuration params in an easy to read format.  
Both of these refer to the thorough reference manual that breaks each
possible option down into its nitty-gritty details so that a user can
get more information if they so desire.


I agree that Apache's approach is primo.  Often the .conf comments are
enough to jog my memory about a directive I haven't used for a while.  Or
the comments are enough to let me know I don't need a directive, or that I
need to go to the manual and read more.  I appreciate that.
michael



One thing that may also help is to include more sample .conf files. 
For example, you could include settings commonly seen for dedicated 
databases with generic specs, and another set for a machine with fewer 
resources that is not dedicated to Postgres.

This would allow users to see how certain setting changes will work. 
The default .conf is great if you want to set up a small test bed, but 
for a real-life example chances are it won't exactly be what you're 
looking for.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Extreme high load averages

2003-07-06 Thread Martin Foster
Richard Huxton wrote:
I don't know the *BSDs myself, but do you have the equivalent of iostat/vmstat 
output you could get for us? Also a snapshot of "top" output? People are 
going to want to see:
 - overall memory usage (free/buffers/cache/swap)
 - memory usage per process
 - disk activity (blocks in/out)

I changed a bit of the scripting code to cut down on the weight of a 
query being run.   This was the only thing in the entire system that 
would cause scripts to run at high processor times for extended lengths. 
With the corrections, postgres processes average closer to < 1% than 
before.

This is not stopping the system from getting high load averages. 
Attached is an example of the site running at 160 users with very slow 
response rates (30 seconds for some scripts).   According to top and ps, 
nothing is taking up all that processing time.

The processor seems to be purposely sitting there twiddling its thumbs, 
which leads me to believe that perhaps the nice levels have to be 
changed on the server itself?   And perhaps increase the file system 
buffer to cache files in memory instead of always fetching/writing them?

Anyone have more ideas?

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
--- top ---
load averages:  5.00,  4.72,  3.75 
21:45:56
134 processes: 6 running, 128 idle
CPU states: 91.7% user,  0.0% nice,  6.6% system,  1.6% interrupt,  0.2% idle
Memory: Real: 279M/390M act/tot  Free: 617M  Swap: 0K/2048M used/tot

  PID USERNAME PRI NICE  SIZE   RES STATE WAIT TIMECPU COMMAND
23235 postgres  640   12M   95M run   -0:02  9.42% postgres
 5299 postgres  640 3872K   57M run   -0:01  5.86% postgres
 8933 postgres  640 3408K   55M run   -0:01  5.47% postgres
16398 postgres   20 3776K   17M sleep netio0:02  0.05% postgres
14007 named  20 2528K 2572K sleep select   0:06  0.00% named
 3684 postgres   20 2120K 4812K sleep select   0:07  0.00% postgres
23518 postgres   20 3664K   36M sleep netio0:03  0.00% postgres
  571 postgres   20 3776K   51M sleep netio0:03  0.00% postgres
11159 postgres   20 3664K   35M sleep netio0:03  0.00% postgres
19184 postgres   20 3776K   16M sleep netio0:03  0.00% postgres
28931 postgres   20 3712K   16M sleep netio0:02  0.00% postgres
17523 postgres   20 3712K   14M sleep netio0:02  0.00% postgres
 8272 postgres   20 3712K   14M sleep netio0:02  0.00% postgres
12034 postgres   20 3712K   14M sleep netio0:02  0.00% postgres
30825 postgres   20 3776K   17M sleep netio0:02  0.00% postgres
29173 postgres   20 3712K   15M sleep netio0:02  0.00% postgres
 9472 postgres   20 3664K   34M sleep netio0:02  0.00% postgres
11542 postgres   20 3776K   16M sleep netio0:02  0.00% postgres


--- vmstat --- 
 procs   memory          page                     disks    faults           cpu
 r b w   avm     fre     flt   re  pi  po  fr  sr wd0 wd1  in   sy   cs   us sy id
 1 0 0   275352  642800  8173  0   0   0   0   0  18  21   534  853  243  50  5 45

--- iostat --- 
  ttywd0 wd1 cpu
 tin tout  KB/t t/s MB/s   KB/t t/s MB/s  us ni sy in id
   0   74 13.00  18 0.23  10.08  21 0.21  50  0  4  1 45

--- pstat -s --- 
 Device  512-blocks UsedAvail Capacity  Priority
 swap_device41942880  4194288 0%0

--- dmesg --- 
 OpenBSD 3.3-stable (compile) #2: Sat Jul  5 15:17:30 MDT 2003
[EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile
 cpu0: Intel Pentium III (Coppermine) ("GenuineIntel" 686-class) 1 GHz
 cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SIMD
 real mem  = 1073250304 (1048096K)
 avail mem = 992940032 (969668K)
 using 4278 buffers containing 53764096 bytes (52504K) of memory
 mainbus0 (root)
 bios0 at mainbus0: AT/286+(a1) BIOS, date 07/20/01, BIOS32 rev. 0 @ 0xf0b20
 apm0 at bios0: Power Management spec V1.2
 apm0: AC on, battery charge unknown
 pcibios0 at bios0: rev. 2.1 @ 0xf/0x1382
 pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xf12d0/176 (9 entries)
 pcibios0: PCI Interrupt Router at 000:04:0 ("VIA VT82C586 PCI-ISA" rev 0x00)
 pcibios0: PCI bus #1 is the last bus
 bios0: ROM list: 0xc/0x8000 0xc8000/0x1800
 pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
 pchb0 at pci0 dev 0 function 0 "VIA VT82C691 Host-PCI" rev 0xc4
 ppb0 at pci0 dev 1 function 0 "VIA VT82C598 PCI-AGP" rev 0x00
 pci1 at ppb0 bus 1
 vga1 at pci1 dev 0 function 0 "NVidia/SGS-Thomson Velocity128" rev 0x22
 wsdisplay0 at vga1: console (80x25, vt100 emulation)
 wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
 pcib0 at pci0 dev 4 function 0 "VIA VT82C686 PCI-ISA" rev 0x40
 pciide0 at pci0 dev 4 function 1 "VIA VT82C571 IDE" rev 0x0

Re: [PERFORM] Extreme high load averages

2003-07-07 Thread Martin Foster
scott.marlowe wrote:


I would try a few things.  First off, effective_cache_size is the size 
measured in 8k blocks, so 512 would be a setting of 4 Megs.  Probably a 
little low.  If you average 512Meg free, that would be a setting of 65536.

Note that the higher the effective_cache_size, the more the planner will 
favor index scans, and the lower, the more it will favor sequential scans.

Generally speaking, index scans cost in CPU terms, while seq scans cost in 
I/O time.

Since you're reporting low CPU usage, I'm guessing you're getting a lot of 
seq scans.

Do you have any type mismatches anywhere that could be the culprit?  
Running vacuum and analyze regularly?  Any tables that are good 
candidates for clustering?

A common problem is a table like this:

create table test (info text, id int8 primary key);
insert into test values ('ted',1);
.. a few thousand more inserts;
vacuum full;
analyze;
select * from test where id=1;
will result in a seq scan, always, because the 1 by itself is 
autoconverted to int4, which doesn't match int8 automatically.  This 
query:

select * from test where id=1::int8

will cast the 1 to an int8 so the index can be used.

That last trick listed actually seems to have solved the larger 
slowdowns I had.   It would seem that a view was making use of INTERVAL 
and CURRENT_TIMESTAMP.   However, the column's datatype did not make use 
of time zones, and that caused significant slowdowns.

By using ::TIMESTAMP, it essentially dropped the access time from 4.98+ 
to 0.98 seconds.   This alone makes my day, as it shows that Postgres is 
performing well, but is just a bit more picky about the queries.
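
A minimal sketch of the kind of cast being described, using the Post 
table that appears later in this thread (the actual view text is not 
shown here, so treat this as illustrative only):

	-- Before: CURRENT_TIMESTAMP is timestamp WITH time zone, so comparing
	-- it to a timestamp WITHOUT time zone column can defeat the index:
	SELECT * FROM Post
	 WHERE PostTimestamp > CURRENT_TIMESTAMP - INTERVAL '10 minutes';
	-- After: cast so both sides share a type and the index is considered:
	SELECT * FROM Post
	 WHERE PostTimestamp > CURRENT_TIMESTAMP::TIMESTAMP - INTERVAL '10 minutes';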

I changed the settings as you recommended, locked the memory to 768 megs 
so that PostgreSQL cannot go beyond that, and made the database priority 
higher.   All of those changes seem to have increased overall performance.

I do have a side question:

  ENABLE_HASHJOIN (boolean)
  ENABLE_INDEXSCAN (boolean)
  ENABLE_MERGEJOIN (boolean)
  ENABLE_TIDSCAN (boolean)

All of the above state that they are for debugging the query planner. 
Does this mean that disabling these reduces debugging overhead and 
streamlines things?   The documentation is rather lacking for 
information on these.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




[PERFORM] Efficiency of timestamps

2003-07-08 Thread Martin Foster
Ignore='single'
 AND PuppetIgnore.PuppeteerLogin='root'
 AND PuppetIgnore.PuppetName=Post.PuppetName)
 OR Post.PuppetName IS NULL)
 ORDER BY Post.PostIDNumber LIMIT 100
-- Explain of Above--
Limit  (cost=0.00..87101.38 rows=100 width=48)
  InitPlan
->  Aggregate  (cost=12412.82..12412.82 rows=1 width=4)
  ->  Index Scan using idxpost_timestamp on post 
(cost=0.00..12282.42 rows=52160 width=4)
Index Cond: (posttimestamp > 
(('now'::text)::timestamp(6) without time zone - '00:10'::interval))
  ->  Index Scan using pkpost on post  (cost=0.00..1010992.25 rows=1161 
width=48)
Index Cond: (postidnumber > $0)
Filter: ((realmname = 'Amalgam'::character varying) AND 
((postto = 'all'::character varying) OR (postto = 'root'::character 
varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS NULL)) AND ((NOT 
(subplan)) OR (puppetname IS NULL)))
SubPlan
  ->  Index Scan using pkpuppetignore on puppetignore 
(cost=0.00..13.31 rows=1 width=10)
Index Cond: (puppeteerlogin = 'root'::character varying)
Filter: ((puppetignore = 'global'::character varying) 
AND (puppetlogin = $1))
  ->  Index Scan using pkpuppetignore on puppetignore 
(cost=0.00..5.84 rows=1 width=15)
Index Cond: ((puppeteerlogin = 'root'::character 
varying) AND (puppetname = $2))
Filter: (puppetignore = 'single'::character varying)

Result : 18 rows fetched ( 0.04 sec)

Both PostIDNumber and PostTimestamp are indexed, so that should not be a 
bottleneck in itself.   However, as you can see in the third example, 
the use of a sub-query actually accelerates the process considerably, 
meaning that integer-based searching is much, much faster.

Under MySQL, timestamps were in Unix time, which is why I may never have 
noticed such an extreme slowdown when doing something similar in that 
script.   Of course, to boggle the mind, here is a view that works very 
well:

CREATE VIEW ethereal.Who AS
  SELECT
Po.PuppetName AS PuppetName,
Po.PuppeteerLogin AS PuppeteerLogin,
Po.RealmName  AS RealmName,
Re.RealmPublicAS RealmPublic,
Re.RealmVerified  AS RealmVerified
  FROM ethereal.Post Po, ethereal.Puppet Ch, ethereal.Realm Re
  WHERE Po.PuppeteerLogin = Ch.PuppeteerLogin
AND Po.RealmName = Re.RealmName
AND Po.PostTimestamp > (LOCALTIMESTAMP - INTERVAL '10 minutes')
AND Po.PuppetName IS NOT NULL
  GROUP BY Po.PuppeteerLogin, Po.PuppetName, Po.RealmName, 
Re.RealmPublic, Re.RealmVerified
  ORDER BY Po.RealmName, Po.PuppetName;

Sort  (cost=309259.89..309629.34 rows=147780 width=79)
  Sort Key: po.realmname, po.puppetname
  ->  Group  (cost=270648.27..292815.19 rows=147780 width=79)
->  Sort  (cost=270648.27..274342.75 rows=1477795 width=79)
  Sort Key: po.puppeteerlogin, po.puppetname, po.realmname, 
re.realmpublic, re.realmverified
  ->  Merge Join  (cost=22181.60..41087.65 rows=1477795 
width=79)
Merge Cond: ("outer".puppeteerlogin = 
"inner".puppeteerlogin)
->  Sort  (cost=17172.82..17300.26 rows=50978 width=69)
  Sort Key: po.puppeteerlogin
  ->  Hash Join  (cost=12.41..13186.95 
rows=50978 width=69)
Hash Cond: ("outer".realmname = 
"inner".realmname)
->  Index Scan using idxpost_timestamp 
on post po  (cost=0.00..12282.42 rows=50978 width=42)
  Index Cond: (posttimestamp > 
(('now'::text)::timestamp(6) without time zone - '00:10'::interval))
  Filter: (puppetname IS NOT NULL)
->  Hash  (cost=11.93..11.93 rows=193 
width=27)
  ->  Seq Scan on realm re 
(cost=0.00..11.93 rows=193 width=27)
->  Sort  (cost=5008.78..5100.22 rows=36574 width=10)
  Sort Key: ch.puppeteerlogin
  ->  Seq Scan on puppet ch 
(cost=0.00..2236.74 rows=36574 width=10)

Result : 48 rows fetched ( 0.55 sec)

It uses the exact same time constraint as the first three examples, 
looks through the same table, does a triple join and still finishes 
faster.   This seems to indicate that timestamps are actually efficient, 
which contradicts the examples above.

Any ideas?   Code for the table creation is below signature:

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
--
--
-- NAME: Post
-- REFERENCES  : Realm*
--   Puppet*
--   PuppeteerLogin*
--
-- DESCRIPTION : Post is the hive of activity for all realms. 
Associated with all

Re: [PERFORM] Efficiency of timestamps

2003-07-08 Thread Martin Foster
Stephan Szabo wrote:


I think you might get better results with some kind of multi-column index.
It's using the index to avoid a sort it looks like, but it's not helping
to find the conditions. I can't remember the correct ordering, but maybe
(posttimestamp, realmname, postidnumber).  Having separate indexes on the
fields won't help currently since only one index will get chosen for the
scan.  Also, what does explain analyze show?
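
A concrete form of Stephan's suggestion might look like the sketch 
below; the index name is hypothetical and the column ordering is, as he 
says, a guess that would need testing:

	CREATE INDEX idxpost_ts_realm_id
	    ON Post (PostTimestamp, RealmName, PostIDNumber);
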
Hope that sheds light on the matter.

 Limit  (cost=0.00..260237.32 rows=100 width=48) (actual 
time=68810.26..68820.83 rows=55 loops=1)
   ->  Index Scan using pkpost on post  (cost=0.00..3020594.00 
rows=1161 width=48) (actual time=68810.25..68820.72 rows=55 loops=1)
 Filter: ((posttimestamp > (('now'::text)::timestamp(6) without 
time zone - '00:10'::interval)) AND (realmname = 'Amalgam'::character 
varying) AND ((postto = 'all'::character varying) OR (postto = 
'root'::character varying)) AND ((NOT (subplan)) OR (puppeteerlogin IS 
NULL)) AND ((NOT (subplan)) OR (puppetname IS NULL)))
 SubPlan
   ->  Index Scan using pkpuppetignore on puppetignore 
(cost=0.00..13.31 rows=1 width=10) (actual time=0.02..0.02 rows=0 loops=55)
 Index Cond: (puppeteerlogin = 'root'::character varying)
 Filter: ((puppetignore = 'global'::character varying) 
AND (puppetlogin = $0))
   ->  Index Scan using pkpuppetignore on puppetignore 
(cost=0.00..5.84 rows=1 width=15) (actual time=0.01..0.01 rows=0 loops=55)
 Index Cond: ((puppeteerlogin = 'root'::character 
varying) AND (puppetname = $1))
     Filter: (puppetignore = 'single'::character varying)
 Total runtime: 68821.11 msec

--
Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [PERFORM] Efficiency of timestamps

2003-07-08 Thread Martin Foster
Stephan Szabo wrote:


The row estimate is high. How many rows meet the various conditions and
some of the combinations? And how many rows does it estimate if you do a
simpler query on those with explain?
I still think some variety of multi-column index to make the above index
conditions would help, but you'd probably need to play with which ones
help, and with the cost cut for the limit, I don't know if it'd actually
get a better plan, but it may be worth trying a bunch and seeing which
ones are useful and then dropping the rest.

At any given point in time you would not expect to see much more than 30 
posts applying to a time-based search.   That is primarily a result of 
having more than one room to which posts are attached, and then some 
posts exist just to show people are there, et cetera.

Simpler queries seem to do quite well.   That view makes use of the same 
table and seems to suffer no performance impact from doing so, and the 
position-based search is considerably faster.

I can show EXPLAIN ANALYSE for all of those if you wish.

    Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Efficiency of timestamps

2003-07-08 Thread Martin Foster
Stephan Szabo wrote:
Well, the reason I asked is to see both whether the estimates for the
various columns were somewhere near reality (if not, then you may need to
raise the statistics target for the column) which might affect whether
it'd consider using a multi-column index for the conditions and sort
rather than the index scan it was using.
I'm going to have to pull out the 'Practical PostgreSQL' book and brush 
up on optimizing.   This level of optimization is not something I have 
had to deal with in the past.

Also, to make this interesting: the sub-query method is faster at times 
and slower at others.   But doing two separate queries and working on 
the PostIDNumber field exclusively is always blazingly fast...
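
A sketch of that two-step approach; the :min_id placeholder stands for 
client-side substitution of the value fetched by the first query:

	-- Step 1: resolve the time window to an integer position
	SELECT MIN(PostIDNumber) AS min_id FROM Post
	 WHERE PostTimestamp > (LOCALTIMESTAMP - INTERVAL '10 minutes');
	-- Step 2: search on the integer key alone
	SELECT * FROM Post
	 WHERE PostIDNumber >= :min_id
	   AND RealmName = 'Amalgam'
	 ORDER BY PostIDNumber LIMIT 100;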

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Martin Foster
Scott Marlowe  wrote:
It would be nice to have a program that could run on any OS postgresql 
runs on and could report on the current limits of the kernel, and make 
recommendations for changes the admin might want to make.

One could probably make a good stab at effective cache size during 
install.  Anything reasonably close would probably help.

Report what % of said resources could be consumed by postgresql under 
various circumstances...

One of the issues that automating the process would encounter is limits 
in the kernel that are too low for PostgreSQL to handle.   The BSDs come 
to mind, where values need to be manually increased in the kernel before 
you can reach a reasonable maximum connection count.

Another example is how OpenBSD will outright crash when trying to test 
the database at install time.   It seems that most of the tests fail 
because the maximum number of processes allowed is too low for the test 
to succeed, while FreeBSD will work just fine on those same tests.

If PostgreSQL automates the configuration, that would be a plus.   But 
it should also detect the platform and inform the person which changes 
should be made to the kernel, sysctl or whatever in order to have that 
configuration run.

Perl may be useful here for a few reasons.   It's portable enough to run 
on multiple Unix variants, and the tools are fairly standard, so the 
code would require fewer special cases for the more exotic 
implementations.

    Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-07-09 Thread Martin Foster
Sean Chittenden wrote:
I looked through the src/doc/runtime.sgml for a good place to stick
this and couldn't find a place that this seemed appropriate, but on
FreeBSD, this can be determined with a great deal of precision in a
programmatic manner:
echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

The same OID is available via C too.  It'd be slick if PostgreSQL
could tune itself (on FreeBSD) at initdb time with the above code.  If
Linux exports this info via /proc and can whip out the appropriate
magic, even better.  An uncommented out good guess that shows up in
postgresql.conf would be stellar and quite possible with the use of
sed.
Maybe an initdb switch could be added to have initdb tune the config
it generates?  If a -n is added, have it generate a config and toss it
to stdout?
case `uname` in
"FreeBSD")
echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
;;
*)
echo "Unable to automatically determine the effective cache size" >> 
/dev/stderr
;;
esac
-sc

The simplest way may be to create an 'auto-tune' directory with scripts 
for configured platforms.   When postgres installs the databases, it 
checks for 'tune.xxx' and, if found, uses that to generate the settings 
itself?

This would allow for defaults on platforms that do not have them and 
optimization for those that do.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [NOVICE] [PERFORM] Extreme high load averages

2003-07-09 Thread Martin Foster
Dennis Björklund wrote:

On Sun, 6 Jul 2003, Martin Foster wrote:


The processor seems to be purposely sitting there twiddling its thumbs, 
which leads me to believe that perhaps the nice levels have to be 
changed on the server itself?


It could also be all the usual things that affect performance. Are your 
queries using indexes where they should? Do you vacuum analyze after you 
have updated/inserted a lot of data?

It could be that some of your queries are not as efficient as they 
should be, like doing a sequential scan over a table instead of an index 
scan. That translates into more IO needed and slower response times, 
especially when you have more connections fighting for the available IO.

I actually got a bit more respect for PostgreSQL tonight.  It seems that 
one of my scripts was not committing changes after maintenance was 
conducted, meaning that rows that would normally be removed after 
offline archiving completed were in fact still around.

Normally at any given point in time this table would grow by 50K rows 
during a day, be archived that night, and then lose the rows that were 
no longer needed.   This process is what allowed MySQL to maintain any 
stability, as the size of this table can balloon significantly.

PostgreSQL with tweaking was handling a table with nearly 300K rows. 
That size alone would have dragged the MySQL system down to a near 
grind, and since most of those rows were not needed, one can imagine 
that queries were needlessly processing rows that should have been 
outright ignored.

This probably explains why row numbering based searches greatly 
accelerated the overall process.

By fixing the script and doing the appropriate full vacuum and re-index, 
the system is behaving much more like it should.  Even if the process 
may seem a bit odd to some.

The reason for removing rows on a daily basis is the perishable nature 
of the information.  Since this is a chat site, posts over a day old are 
rarely needed for any reason.   That is why they are archived into dumps 
in case we really need to retrieve the information, and this gives us 
the added bonus of smaller backup sizes and smaller database sizes.

    Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [NOVICE] [PERFORM] Extreme high load averages

2003-07-10 Thread Martin Foster
Shridhar Daithankar wrote:


I have an idea.

How about creating a table for each day?  Use it for a while and rename 
it.  Since you can rename a table in a transaction, it should not be a 
problem.

You can use inheritance if you want to query all of them. Using indexes 
and foreign keys on inherited tables is a problem though.

That way deletion would be avoided, and so would vacuum. It should be 
much lighter on the system overall as well.

Tell us if it works.

Bye
 Shridhar


Generally I won't be pulling 250K rows from that table.   It's 
maintained nightly during the general cleanup process, where stale 
users, rooms and posts are removed from the system.   Then the system 
runs a normal VACUUM ANALYSE to get things going again smoothly.

Once a week a more detailed archiving takes place, which runs an all-out 
vacuum and re-index.   That's the so-called plan at least.
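
A hedged sketch of that maintenance schedule as SQL, using the Post 
table from elsewhere in the thread (the schedule itself would live in 
cron or a maintenance script):

	-- Nightly, after the cleanup deletes:
	VACUUM ANALYZE Post;
	-- Weekly, during the archiving window (both take exclusive locks):
	VACUUM FULL Post;
	REINDEX TABLE Post;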

As for creating a new table, that in itself is a nice idea.   But it 
would cause issues for people currently in the realm.   Their posts 
would essentially disappear from the site and cause more confusion than 
it's worth.

Inheritance would work, but the database would essentially just grow and 
grow and grow, right?

BTW, I can't thank you all enough for this general advice.   It's 
helping me get this thing running very smoothly.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [NOVICE] [PERFORM] Extreme high load averages

2003-07-11 Thread Martin Foster
Shridhar Daithankar wrote:

On 10 Jul 2003 at 0:43, Martin Foster wrote:

As for creating a new table, that in itself is a nice idea.   But it 
would cause issues for people currently in the realm.   Their posts 
would essentially disappear from the site and cause more confusion than 
it's worth.


No they won't. Say you have a base table and your current post table is 
a child of that. You can query on the base table and get rows from the 
child table. That way all the data would always be there.

While inserting posts, you would insert into the child table. While 
querying, you would query on the base table. That way things will be 
optimal.


Inheritance would work, but the database would essentially just grow and 
grow and grow, right?


Right. But there are two advantages.

1. It will always contain valid posts. No dead tuples.
2. You can work in chunks of data. Each child table can be dealt with 
separately without affecting other child tables, whereas in the case of 
a single large table, the entire site is affected.

Deleting 100K posts from a 101K-row table is vastly different from 
deleting 10K posts from a 2M-row table. The latter would unnecessarily 
starve the table with dead tuples and IO, whereas in the former case you 
can do a CREATE TABLE AS SELECT FROM and drop the original.

HTH

Bye
 Shridhar
--
"[In 'Doctor' mode], I spent a good ten minutes telling Emacs what Ithought of 
it.  (The response was, 'Perhaps you could try to be lessabusive.')"(By Matt 
Welsh)
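
A minimal sketch of the inheritance scheme Shridhar describes; the table 
and column names here are hypothetical:

	CREATE TABLE post_base (
	    PostIDNumber   INT4,
	    PostTimestamp  TIMESTAMP,
	    Body           TEXT
	);
	CREATE TABLE post_20030710 () INHERITS (post_base);
	-- Inserts target the current child table:
	INSERT INTO post_20030710 VALUES (1, LOCALTIMESTAMP, 'hello');
	-- A query on the base table sees rows from all children...
	SELECT * FROM post_base;
	-- ...while ONLY restricts it to the parent alone:
	SELECT * FROM ONLY post_base;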

When I ran EXPLAIN on the views and queries making use of the inherited 
tables, I noticed that everything worked based on sequential scans and 
avoided all indexes, while making use of ONLY brought the indexes back 
into play.

This is even after having created a child table with the same indexes as 
the parent.   Is this a known issue, or just some sort of oddity in my 
setup?

Tables still cannot be removed easily, but I found a way to work around 
it on a day-to-day basis.  Essentially I just clean out the tables 
containing old rows and delete them later.  However, based on the above, 
I doubt performance would get any better.

Thanks for the advice however!

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




Re: [PERFORM] [NOVICE] Optimizer Parameters

2003-07-12 Thread Martin Foster
Tom Lane wrote:

force Postgres into using indexes when available.   So I changed the 
following two lines in the .conf file:

	enable_seqscan = false
	enable_nestloop = false

This was recommended in the documentation,

Where would you say that setting those off in the config file is
"recommended"?

Now how sane is it to keep those options turned off?


It isn't.  If you have to force them off for a particular query, do
so right before you issue that query, and turn them on again after.
Turning them off globally is sure to cause you pain later.
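
As a sketch of that per-query pattern (the query itself is a 
hypothetical stand-in):

	SET enable_seqscan = false;
	-- the one query that needs the forced index plan:
	SELECT * FROM Post WHERE PostTimestamp > '2003-07-12 16:27';
	SET enable_seqscan = true;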

And is there any way to have the planner quit guessing that tens of 
thousands of rows will be returned when there are at most a hundred?


   AND Po.PostTimestamp > (LOCALTIMESTAMP - INTERVAL '10 minutes')
   AND Po.PuppetName IS NOT NULL


  ->  Seq Scan on post po  (cost=0.00..14369.84 rows=40513 
width=41) (actual time=2820.88..2826.30 rows=392 loops=1)
Filter: ((posttimestamp > (('now'::text)::timestamp(6) 
without time zone - '00:10'::interval)) AND (puppetname IS NOT NULL))


Not with that coding technique; "LOCALTIMESTAMP - INTERVAL '10 minutes'"
isn't a constant and so the planner can't look at its statistics to
see that only a small part of the table will be selected.
There are two standard workarounds for this:

1. Do the timestamp arithmetic on the client side, so that the query
you send the backend has a simple constant:
... AND Po.PostTimestamp > '2003-07-12 16:27'

2. Create a function that is falsely marked immutable, viz:

create function ago(interval) returns timestamp without time zone as
'select localtimestamp - $1' language sql immutable strict;
... AND Po.PostTimestamp > ago('10 minutes')

Because the function is marked immutable, the planner will reduce
"ago('10 minutes')" to a constant on sight, and then use that value
for planning purposes.  This technique can cause problems, since
in some contexts the reduction will occur prematurely, but as long
as you only use ago() in interactively-issued queries it works okay.
			regards, tom lane
http://www.postgresql.org/docs/7.3/static/indexes-examine.html

The conf file does not make a mention of it, other than perhaps its use 
in debugging.  The above link points to disabling them, but tells you 
nothing about potential consequences or what to do if things work better 
than they did before.

However, when I tried out your functions, things started to work much 
better than previously.   This, to say the least, is a great sign, as it 
will increase overall performance.

So thanks for that!   As a side note, would you recommend disabling 
fsync for added performance?   This would be joined with a healthy dose 
of a kernel file system buffer.

I am simply curious, as I have been increasing certain WAL options so 
that it writes less often (transactions are numerous, so that's not an 
issue) to the hard drives.
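
For reference, a hedged sketch of the 7.3-era WAL knobs being alluded 
to; the values are placeholders rather than recommendations, and fsync = 
false trades crash safety for speed:

	wal_buffers = 256          # raised earlier in this thread
	checkpoint_segments = 16   # fewer, larger checkpoints
	checkpoint_timeout = 600   # seconds between forced checkpoints
	#fsync = false             # risky: possible data loss on crash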

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]




[PERFORM] Clearing rows periodically

2003-07-17 Thread Martin Foster
I have two tables in the database which are used almost every time
someone makes use of the webpages themselves.   The first is some sort
of database-side parameter list which stores parameters from session to
session.   The other is a table that handles the posting activity of
all the rooms and chatters.
The first is required in all authentication with the system, and when
entries are missing you are challenged by the system to prove your
identity.   This table is based on a randomized order, as in the unique
number changes pseudo-randomly, and this table sees a reduction in
entries every hour on the hour so as to keep its information fresh and
manageable.
The other table follows a sequential order and carries more columns of
information.   However, this table clears its entries nightly, and with
current settings will delete roughly a day's traffic, sitting at 50K
rows of information.
The difference is as follows: without the use of vacuum every hour, the
parameter table performs very well, showing no loss in service or
degradation.   Since people authenticate more than they post, it is safe
to assume that it removes more rows daily than the posting table does.
The posting table often drags the system down in performance when a day
has been skipped, even with the use of VACUUM ANALYZE.   This seems to
be an indication that the process of a daily delete is actually a very
wise step to take, even if the information itself is not needed for
very long.
A VACUUM FULL will correct the issue, but put the site out of commission
for roughly 20 minutes as the drive crunches the information.
My question is: should the purging of rows be done more often than once
a day for both tables?   Is this why performance seems to take a hit,
specifically that there were too many rows purged for vacuum to
accurately keep track of?
    Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]

