Re: [PERFORM] can't handle large number of INSERT/UPDATEs

2004-10-30 Thread Markus Bertheau
Turns out the man page of vmstat in procps was changed on Oct 8 2002:

http://cvs.sourceforge.net/viewcvs.py/procps/procps/vmstat.8?r1=1.1&r2=1.2

in reaction to a Debian bug report:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=157935

-- 
Markus Bertheau <[EMAIL PROTECTED]>




Re: [PERFORM] futex results with dbt-3

2004-10-30 Thread Manfred Spraul
Tom Lane wrote:
It could be that I'm all wet and there is no relationship between the
cache line thrashing and the seemingly excessive BufMgrLock contention.
 

Is it important? The fix is the same in either case: per-bucket locks 
for the hash table, and a buffer-aging strategy that doesn't need a 
single global lock acquired on every lookup.

--
   Manfred


[PERFORM] psql large RSS (1.6GB)

2004-10-30 Thread TTK Ciar
Hello!

  My name is TTK, and I'm a software engineer at the Internet Archive's 
Data Repository department.  We have recently started using postgresql 
for a couple of projects (we have historically been a MySQL outfit), 
and today my co-worker noticed psql eating memory like mad when invoked 
with a simple select statement incorporating a join of two tables.

  The platform is a heavily modified RedHat 7.3 Linux.  We are using 
version 7.4.5 of postgresql.

  The invocation was via a shell script:

#!/bin/bash

outfile=$1
if [ -z "$outfile" ]; then
    outfile=/0/brad/all_arcs.txt
fi

/usr/lib/postgresql/bin/psql \
    -c 'select ServerDisks.servername, ServerDisks.diskserial,
               ServerDisks.diskmountpoint, DiskFiles.name, DiskFiles.md5
          from DiskFiles, ServerDisks
         where DiskFiles.diskserial = ServerDisks.diskserial;' \
    -F ' ' -A -t -o "$outfile"

... and the tables in question are somewhat large (hundreds of GB 
of data), though we didn't expect that to be an issue as far as the 
psql process was concerned.

  We monitored server load via 'top -i -d 0.5' and watched the output 
file for data.  Over the course of about 200 seconds, psql's RSS 
climbed to about 1.6 GB, and stayed there, while no data was written 
to the output file.  Eventually 10133194 lines were written to the 
output file, all at once, about 1.2GB's worth of data.

  I re-ran the select query using psql in interactive mode, and saw 
the same results.

  I re-ran it again, using "explain analyse", and this time psql's 
RSS did *not* increase significantly.  The result is here, if it 
helps:

brad=# explain analyse select ServerDisks.servername, ServerDisks.diskserial, ServerDisks.diskmountpoint, DiskFiles.name, DiskFiles.md5 from DiskFiles, ServerDisks where DiskFiles.diskserial=ServerDisks.diskserial;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=22.50..65.00 rows=1000 width=274) (actual time=118.584..124653.729 rows=10133349 loops=1)
   Hash Cond: (("outer".diskserial)::text = ("inner".diskserial)::text)
   ->  Seq Scan on diskfiles  (cost=0.00..20.00 rows=1000 width=198) (actual time=7.201..31336.063 rows=10133349 loops=1)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=158) (actual time=90.821..90.821 rows=0 loops=1)
         ->  Seq Scan on serverdisks  (cost=0.00..20.00 rows=1000 width=158) (actual time=9.985..87.364 rows=2280 loops=1)
 Total runtime: 130944.586 ms

  At a guess, it looks like the data set is being buffered in its 
entirety by psql, before any data is written to the output file, 
which is surprising.  I would have expected it to grab data as it 
appeared on the socket from postmaster and write it to disk.  Is 
there something we can do to stop psql from buffering results? 
Does anyone know what's going on here?

  If the solution is to just write a little client that uses perl 
DBI to fetch rows one at a time and write them out, that's doable, 
but it would be nice if psql could be made to "just work" without 
the monster RSS.

  I'd appreciate any feedback.  If you need any additional info, 
please let me know and I will provide it.

  -- TTK
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]




Re: [PERFORM] psql large RSS (1.6GB)

2004-10-30 Thread Dustin Sallings
On Oct 27, 2004, at 0:57, TTK Ciar wrote:
  At a guess, it looks like the data set is being buffered in its
entirety by psql, before any data is written to the output file,
which is surprising.  I would have expected it to grab data as it
appeared on the socket from postmaster and write it to disk.  Is
there something we can do to stop psql from buffering results?
Does anyone know what's going on here?
	Yes, the entire result set is sent back to the client before any 
of it can be used.  An easy workaround when dealing with this much data 
is to use a cursor.  Something like this:

db# start transaction;
START TRANSACTION
db# declare logcur cursor for select * from some_table;
DECLARE CURSOR
db# fetch 5 in logcur;
[...]
(5 rows)
	This will do approximately what you expected the select to do in the 
first place, but the fetch size decides how many rows are buffered in 
the client at a time.
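
	To get the original batch-to-a-file behaviour with bounded memory, 
the same idea can be scripted.  The sketch below is untested and only 
illustrative: the cursor name, the one-million-row fetch size, the 
number of FETCH statements, and the added -q flag (to keep the 
BEGIN/DECLARE command tags out of the output file) are all my own 
assumptions, not something from the original post.

#!/bin/bash
# Sketch: stream the big join through a cursor so psql only ever
# holds one FETCH batch in memory instead of the whole result set.

outfile=${1:-/0/brad/all_arcs.txt}

{
    echo 'begin;'
    echo "declare logcur cursor for
        select ServerDisks.servername, ServerDisks.diskserial,
               ServerDisks.diskmountpoint, DiskFiles.name, DiskFiles.md5
          from DiskFiles, ServerDisks
         where DiskFiles.diskserial = ServerDisks.diskserial;"
    # Roughly 10.1M rows total, so a dozen 1M-row fetches covers it;
    # any extra fetches past the end simply return zero rows.
    for i in $(seq 1 12); do
        echo 'fetch 1000000 in logcur;'
    done
    echo 'close logcur;'
    echo 'commit;'
} | /usr/lib/postgresql/bin/psql -q -F ' ' -A -t -o "$outfile"

	Each FETCH batch is still buffered by the client before being 
written, so the fetch size, not the total result size, is what bounds 
psql's memory use.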

  If the solution is to just write a little client that uses perl
DBI to fetch rows one at a time and write them out, that's doable,
but it would be nice if psql could be made to "just work" without
the monster RSS.
	It wouldn't make a difference unless that driver implements the 
underlying protocol on its own.

--
SPY  My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <[EMAIL PROTECTED]>
|Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L___ I hope the answer won't upset her. 