Re: [HACKERS] prefix btree implementation

2005-10-05 Thread Tom Lane
Qingqing Zhou [EMAIL PROTECTED] writes:
 1/ What types of prefix compression shall we support?

Given the requirement of datatype independence, this idea seems a
complete nonstarter to me...

regards, tom lane



Re: [HACKERS] [pgadmin-hackers] pgAdmin guru hints

2005-10-05 Thread Bastiaan Wakkie




I noticed a typo in hints/conn-hba.html

The second internal IP address is missing a '.'



You're invited to make your comments on the hints: are these correct, is 
there something missing or misleading?



Cheers,




-- 
^(B(astia{2}n)?)(\s)?(W(ak{2}ie)?)$

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Michael Stone

On Mon, Oct 03, 2005 at 01:34:01PM -0700, Josh Berkus wrote:

Realistically, you can't do better than about 25MB/s on a
 single-threaded I/O on current Linux machines,

What on earth gives you that idea? Did you drop a zero?


Nope, LOTS of testing, at OSDL, GreenPlum and Sun.   For comparison, A 
Big-Name Proprietary Database doesn't get much more than that either.


You seem to be talking about database IO, which isn't what you said.




Re: [HACKERS] Vacuum and Transactions

2005-10-05 Thread Hannu Krosing
On T, 2005-10-04 at 11:10 -0400, Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  The catch is that there are some other very active structures (like
  pg_listener for Slony) which after a couple of hours without vacuuming
  will quickly have the DB at an unreasonably high load (low tens) which
  seems to all but halt the vacuum on the large structure.
 
 Yeah.  We desperately need to reimplement listen/notify :-( ... that
 code was never designed to handle high event rates.

Sure. But it handles event rates of up to a few hundred events per
second amazingly well - given that pg_listener is cleaned up often
enough. Above a few hundred eps it starts getting stuck on locks.
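(By "cleaned up often enough" I mean nothing fancier than running, at a
high frequency, something like:

VACUUM pg_listener;

with the needed interval depending on the event rate.)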

It also seems that Slony can be modified to not use LISTEN/NOTIFY in
high load situations (akin to high-performance network cards, which
switch from interrupt-driven mode to polling mode if the number of
packets per second reaches certain thresholds).

Unfortunately, Slony and LISTEN/NOTIFY are not the only place where
high-update-rate tables start to suffer from vacuum's inability to clean
out dead tuples while working in parallel with other, slower vacuums. In
real life there are other database tasks which also need some tables to
stay small, while others must be huge in order to work effectively.
Putting small and big tables in different databases and using dblink-like
functionality when accessing them is one solution for such cases, but it
is rather ugly :(

-- 
Hannu Krosing [EMAIL PROTECTED]




Re: [HACKERS] Vacuum and Transactions

2005-10-05 Thread Hannu Krosing
On T, 2005-10-04 at 00:26 -0400, Rod Taylor wrote:
 As I understand it vacuum operates outside of the regular transaction
 and if you stop it (SIGTERM, or pg_cancel_backend()) some of the work it
 accomplished will be kept when it rolls back.
 
 For large structures with a ton of dead entries (of which I seem to have
 a case), running vacuum takes long enough that high-churn structures begin
 to experience difficulties.
 
 Is it reasonable to cancel and restart the vacuum process periodically
 (say every 12 hours) until it manages to complete the work? It takes
 about 2 hours to do the table scan, and should get in about 10 hours of
 index work each round.

It seems that the actual work done by LAZY VACUUM is not rolled back
when you kill the backend doing the vacuum (though VACUUM is quite hard
to kill, and may require kill -9 to accomplish, with all the downsides of
kill -9).

So, yes, as a last resort you can kill VACUUM (or rather limit its
lifetime by SET statement_timeout = XXX) and get some work done in
each run. It only makes sense if the timeout is big enough for vacuum to
complete the first scan (collecting dead tuples) over the heap and then do
some actual work. For a table with 3 indexes the timeout must be at least
(1st heap scan + 3 index scans with no work + some portion of the 2nd
(cleanout) heap scan) to ever get the table completely cleaned up.
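A sketch of the workaround (the table name is made up; statement_timeout
takes milliseconds):

SET statement_timeout = 43200000;  -- 12 hours
VACUUM bigtable;                   -- cancelled by the timeout if it overruns
RESET statement_timeout;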

 The "vacuum ignores vacuum transaction" concept looks handy right now.

There is a patch for 8.1 in PATCHES list (postponed to 8.2 :( ). This
can be backported to 8.0 quite easily.

-- 
Hannu Krosing [EMAIL PROTECTED]




Re: [HACKERS] Vacuum and Transactions

2005-10-05 Thread Zeugswetter Andreas DAZ SD

  Is it reasonable to cancel and restart the vacuum process periodically
  (say every 12 hours) until it manages to complete the work? It takes
  about 2 hours to do the table scan, and should get in about 10 hours
  of index work each round.

If we started the vacuum with the indexes, remembered a lowest xid per
index, we could then vacuum the heap up to the lowest of those xids, no?
We could then also vacuum each index separately.

Andreas



[HACKERS] Slony RPM issue

2005-10-05 Thread Philip Yarra
Hi Devrim, I ran into another RPM issue, this time with Slony.

I grabbed the RPM from 
http://developer.postgresql.org/~devrim/slony/1.1.0/rpms/PG8.0.3/

Trying to run slon_start, I got errors such as:
$ slon_start --config /etc/slon_tools.conf 2
Invoke slon for node 2 - @@@/slon -s 1000 -d2 -g 80 replication 'host=dev2 
dbname=booktown user=postgres port=5432' 2>&1 >
 /var/log/slony/slony1/node2/booktown-2005-10-05_16:23:53.log
sh: line 1: @@@/slon: No such file or directory
Slon failed to start for cluster replication, node node2

This turned out to be because /usr/lib/pgsql/slon-tools.pm contained several 
un-expanded @@PGBINDIR@@ prefixes - once I replaced them with /usr/bin it all 
ran just fine.

Please let me know if I can assist in fixing these issues.

Also, thanks to David and Gavin for their help on IRC - it made a big 
difference to my day!

Regards, Philip.

-- 

Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it. - Brian W. Kernighan



Re: [HACKERS] [PERFORM] Query in SQL statement

2005-10-05 Thread R, Rajesh (STSD)

Thanks. 
I've already understood that 
I need to post it in another list.

Sorry for wasting your precious time. 

--
Rajesh R

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 05, 2005 2:24 PM
To: R, Rajesh (STSD)
Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query in SQL statement

R, Rajesh (STSD) wrote:
  
 Am trying to port a mysql statement to postgres.
 
 Please help me in finding the error in this,

Can I recommend the reference section of the manuals for this sort of
thing? There is an excellent section detailing the valid SQL for the
CREATE TABLE command.

Also - the pgsql-hackers list is for discussion of database development,
and the performance list is for performance problems. This would be
better posted on pgsql-general or -sql or -novice.

 CREATE SEQUENCE ai_id;

This line is causing the first error:
  ERROR:  relation "ai_id" already exists

That's because you've already successfully created the sequence, so it
already exists. Either drop it and recreate it, or stop trying to
recreate it.

 CREATE TABLE badusers (
   id int DEFAULT nextval('ai_id') NOT NULL,
   UserName varchar(30),
   Date  datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,

Well, Date is a type name, datetime isn't, and even if it were,
'0000-00-00' isn't a valid date, is it?

   Reason varchar(200),
   Admin varchar(30) DEFAULT '-',
   PRIMARY KEY (id),
   KEY UserName (UserName),
   KEY Date (Date)

The word KEY isn't valid here either - are you trying to define an
index? If so, see the CREATE INDEX section of the SQL reference.

http://www.postgresql.org/docs/8.0/static/sql-commands.html

If you reply to this message, please remove the pgsql-hackers CC:
--
   Richard Huxton
   Archonet Ltd



Re: [HACKERS] [PERFORM] Query in SQL statement

2005-10-05 Thread Richard Huxton

R, Rajesh (STSD) wrote:
 
Am trying to port a mysql statement to postgres.


Please help me in finding the error in this,


Can I recommend the reference section of the manuals for this sort of 
thing? There is an excellent section detailing the valid SQL for the 
CREATE TABLE command.


Also - the pgsql-hackers list is for discussion of database development, 
and the performance list is for performance problems. This would be 
better posted on pgsql-general or -sql or -novice.



CREATE SEQUENCE ai_id;


This line is causing the first error:
 ERROR:  relation "ai_id" already exists

That's because you've already successfully created the sequence, so it 
already exists. Either drop it and recreate it, or stop trying to 
recreate it.



CREATE TABLE badusers (
  id int DEFAULT nextval('ai_id') NOT NULL,
  UserName varchar(30),
  Date  datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,


Well, Date is a type name, datetime isn't, and even if it were,
'0000-00-00' isn't a valid date, is it?



  Reason varchar(200),
  Admin varchar(30) DEFAULT '-',
  PRIMARY KEY (id),
  KEY UserName (UserName),
  KEY Date (Date)


The word KEY isn't valid here either - are you trying to define an 
index? If so, see the CREATE INDEX section of the SQL reference.


http://www.postgresql.org/docs/8.0/static/sql-commands.html
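For what it's worth, a rough PostgreSQL equivalent might look like this (a
sketch only; there is no zero date in PostgreSQL, so I've substituted now(),
and note that unquoted identifiers fold to lower case):

CREATE SEQUENCE ai_id;

CREATE TABLE badusers (
  id       int DEFAULT nextval('ai_id') NOT NULL,
  username varchar(30),
  date     timestamp DEFAULT now() NOT NULL,
  reason   varchar(200),
  admin    varchar(30) DEFAULT '-',
  PRIMARY KEY (id)
);

CREATE INDEX badusers_username ON badusers (username);
CREATE INDEX badusers_date ON badusers (date);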

If you reply to this message, please remove the pgsql-hackers CC:
--
  Richard Huxton
  Archonet Ltd



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Martijn van Oosterhout
On Wed, Oct 05, 2005 at 05:41:25AM -0400, Michael Stone wrote:
 On Sat, Oct 01, 2005 at 06:19:41PM +0200, Martijn van Oosterhout wrote:
 COPY TO /dev/null WITH binary
 13MB/s    55% user 45% system  (ergo, CPU bound)
 [snip]
 the most expensive. But it does point out that the whole process is
 probably CPU bound more than anything else.
 
 Note that 45% of that cpu usage is system--which is where IO overhead
 would end up being counted. Until you profile where you system time is
 going it's premature to say it isn't an IO problem.

It's a dual CPU system, so 50% is the limit for a single process. Since
system usage < user, PostgreSQL is the limiter. Sure, the system is
taking a lot of time, but PostgreSQL is still the limiting factor.

Anyway, the later measurements using gprof exclude system time
altogether and it still shows CPU being the limiting factor. Fact is,
extracting tuples from pages is expensive.
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] Vacuum and Transactions

2005-10-05 Thread Gaetano Mendola
Rod Taylor wrote:
 I have maintenace_work_mem set to about 1GB in size.

Isn't it a bit too much?


Regards
Gaetano Mendola





Re: [HACKERS] prefix btree implementation

2005-10-05 Thread Alvaro Herrera
On Wed, Oct 05, 2005 at 12:50:41AM -0400, Tom Lane wrote:
 Qingqing Zhou [EMAIL PROTECTED] writes:
  1/ What types of prefix compression shall we support?
 
 Given the requirement of datatype independence, this idea seems a
 complete nonstarter to me...

How about having each type optionally provide the required routines?
Thus we could provide them at least for the most common datatypes, and
the system would continue working as currently for the rest (including
user-defined types).  Cross-column prefixes would be hard to handle I
guess, as well as TOASTed data.

One problem I do see is what happens if I need to insert a new tuple in
the page that doesn't share the prefix.  It obviously would have to be
the leftmost or rightmost item on the page, but it's possible.

-- 
Alvaro Herrerahttp://www.PlanetPostgreSQL.org
A male gynecologist is like an auto mechanic who never owned a car.
(Carrie Snow)



Re: [HACKERS] Vacuum and Transactions

2005-10-05 Thread Rod Taylor
  The "vacuum ignores vacuum transaction" concept looks handy right now.
 
 There is a patch for 8.1 in PATCHES list (postponed to 8.2 :( ). This
 can be backported to 8.0 quite easily.

Understood. I've seen them, but until they're well tested in the newest
version I won't be using them in a production environment.

I do appreciate the goal, and look forward to this concept (or some other
method of splitting up the work vacuum needs to do) being applied in the
future.

-- 




Re: [HACKERS] Vacuum and Transactions

2005-10-05 Thread Rod Taylor
On Wed, 2005-10-05 at 09:53 +0300, Hannu Krosing wrote:
 On T, 2005-10-04 at 11:10 -0400, Tom Lane wrote:
  Rod Taylor [EMAIL PROTECTED] writes:
   The catch is that there are some other very active structures (like
   pg_listener for Slony) which after a couple of hours without vacuuming
   will quickly have the DB at an unreasonably high load (low tens) which
   seems to all but halt the vacuum on the large structure.
  
  Yeah.  We desperately need to reimplement listen/notify :-( ... that
  code was never designed to handle high event rates.
 
 Sure. But it handles amazingly well event rates up to a few hundred
 events per second - given that pg_listener is cleaned up often enough.

Accomplishing the pg_listener cleanup often enough can be difficult in
some circumstances.

 It also seems that Slony can be modified to not use LISTEN/NOTIFY in
 high load situations (akin to high performance network cards, which
 switch from interrupt driven mode to polling mode if number of packets
 per second reaches certain thresolds).

I have other items in this database with high churn as well. Slony was
just an example.

-- 




[HACKERS] current_user versus current_role

2005-10-05 Thread Pavel Stehule

hello

I used info from current_user for logging of some operations (who, when,
...). From what I can see, current_user is equal to the current_role
function. I had a problem with it, because a user (if he is a member of any
group role) can change his identity. Example: peter is a member of role
users, and peter can do SET ROLE users. From that moment I lose the
possibility of getting the user's identity, because current_user returns
users and not peter. I can check whether a user changed identity ~ for
example, test whether the role has the LOGIN privilege, and if not I can
stop any operations.


But is there some possibility to get the identity of the user, unaffected
by a change of role? (a default role?)


thank you
Pavel Stehule





Re: [HACKERS] current_user versus current_role

2005-10-05 Thread Alvaro Herrera
On Wed, Oct 05, 2005 at 03:17:25PM +0200, Pavel Stehule wrote:

Hi,

 I used info from current_user for logging of some operations (who, when,
 ...). From what I can see, current_user is equal to the current_role
 function. I had a problem with it, because a user (if he is a member of any
 group role) can change his identity. Example: peter is a member of role
 users, and peter can do SET ROLE users. From that moment I lose the
 possibility of getting the user's identity, because current_user returns
 users and not peter. I can check whether a user changed identity ~ for
 example, test whether the role has the LOGIN privilege, and if not I can
 stop any operations.

I believe you can use session_user for this.
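For illustration (a hypothetical psql session, using the names from your
example):

SELECT session_user, current_user;  -- peter | peter
SET ROLE users;
SELECT session_user, current_user;  -- peter | users

session_user keeps returning the name the session authenticated as,
regardless of any SET ROLE.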

-- 
Alvaro Herrera   Developer, http://www.PostgreSQL.org
Dawn breaks.   (Ignacio Reyes)
 Cerro San Cristóbal watches me, cynically, with a virgin's eyes



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Michael Stone

On Sat, Oct 01, 2005 at 06:19:41PM +0200, Martijn van Oosterhout wrote:

COPY TO /dev/null WITH binary
 13MB/s    55% user 45% system  (ergo, CPU bound)

[snip]

the most expensive. But it does point out that the whole process is
probably CPU bound more than anything else.


Note that 45% of that cpu usage is system--which is where IO overhead
would end up being counted. Until you profile where you system time is
going it's premature to say it isn't an IO problem.

Mike Stone




Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Michael Stone

On Tue, Oct 04, 2005 at 12:43:10AM +0300, Hannu Krosing wrote:

 Just FYI, I ran a count(*) on a 15.6GB table on a lightly loaded db and
 it ran in 163 sec. (Dual Opteron 2.6GHz, 6GB RAM, 6 x 74GB 15k disks in
 RAID10, reiserfs). A little less than 100MB/sec.


And none of that 15G table is in the 6G RAM?

Mike Stone



[HACKERS] Interesting optimizer's supposition in 8.1

2005-10-05 Thread Teodor Sigaev

Tsearch2 has function to_tsquery defined as:
CREATE FUNCTION to_tsquery(oid, text)
RETURNS tsquery
AS '$libdir/tsearch2'
LANGUAGE 'c' with (isstrict,iscachable);

And let us take 2 essentially equivalent queries:

# explain select book.id from to_tsquery('foo') as t, book where book.fts @@ t;
 QUERY PLAN

 Nested Loop  (cost=13.19..6550.69 rows=290 width=4)
   Join Filter: ("inner".fts @@ "outer".t)
   ->  Function Scan on t  (cost=0.00..12.50 rows=1000 width=32)
   ->  Materialize  (cost=13.19..16.09 rows=290 width=36)
         ->  Seq Scan on book  (cost=0.00..12.90 rows=290 width=36)
# explain select book.id from book where book.fts @@ to_tsquery('foo');
 QUERY PLAN
-
 Seq Scan on book  (cost=0.00..13.62 rows=1 width=4)
   Filter: (fts @@ '''foo'''::tsquery)

Why does the planner suppose that the t 'table' will return 1000 rows? Obviously
the function returns only one value, because it isn't marked as 'returns setof'.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Slony RPM issue

2005-10-05 Thread Devrim GUNDUZ


Hi,

On Wed, 5 Oct 2005, Philip Yarra wrote:


Hi Devrim, I ran into another RPM issue, this time with Slony.


:-)


I grabbed the RPM from
http://developer.postgresql.org/~devrim/slony/1.1.0/rpms/PG8.0.3/

Trying to run slon_start, I got errors such as:
$ slon_start --config /etc/slon_tools.conf 2
Invoke slon for node 2 - @@@/slon -s 1000 -d2 -g 80 replication 'host=dev2
 dbname=booktown user=postgres port=5432' 2>&1 >
 /var/log/slony/slony1/node2/booktown-2005-10-05_16:23:53.log

sh: line 1: @@@/slon: No such file or directory
Slon failed to start for cluster replication, node node2

This turned out to be because /usr/lib/pgsql/slon-tools.pm contained several
un-expanded @@PGBINDIR@@ prefixes - once I replaced them with /usr/bin it all
ran just fine.


Hrrm.

It seems that I've installed the wrong file in the spec :( However, the
slon-tools file is not updated by the Makefile, so I'll try to fix that, too.


Thanks for the report. It will be fixed in CVS and all the RPM sets later 
today. Always feel free to send me a patch if you want; I can apply your 
patch, too.


BTW, could you please report the possible Slony problems to 
slony1-general@gborg.postgresql.org later?


Regards,
--
Devrim GUNDUZ
Kivi Bilişim Teknolojileri - http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
  http://www.gunduz.org


Re: [HACKERS] prefix btree implementation

2005-10-05 Thread Bricklen Anderson
Qingqing Zhou wrote:
 I am not sure if this idea was mentioned before.
 
 The basic prefix btree idea is quite straightforward, i.e., try to
 compress the key items within a data page by sharing the common prefix.
  Thus the fanout of the page is increased and the benefit is obvious
  theoretically.
 
snip
 
 So together, there are basically four types of possible sharing:
 column-wise (case 1), character-wise (case 2), column-character-wise (case
 3), and byte-wise (case 4).
 
Oracle implements something similar called index compression, but I believe it
is only for common column values. I haven't checked in versions > 9r1, so maybe
there are other options implemented by now.

Jonathan Lewis describes some pros and cons here:
http://www.jlcomp.demon.co.uk/faq/compress_ind.html



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Luke Lonergan
Nope - it would be disk wait.

COPY is CPU bound on I/O subsystems faster than 50 MB/s on COPY (in) and about 
15 MB/s (out).

- Luke

 -Original Message-
From:   Michael Stone [mailto:[EMAIL PROTECTED]
Sent:   Wed Oct 05 09:58:41 2005
To: Martijn van Oosterhout
Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org
Subject:Re: [HACKERS] [PERFORM] A Better External Sort?

On Sat, Oct 01, 2005 at 06:19:41PM +0200, Martijn van Oosterhout wrote:
COPY TO /dev/null WITH binary
13MB/s    55% user 45% system  (ergo, CPU bound)
[snip]
the most expensive. But it does point out that the whole process is
probably CPU bound more than anything else.

Note that 45% of that cpu usage is system--which is where IO overhead
would end up being counted. Until you profile where you system time is
going it's premature to say it isn't an IO problem.

Mike Stone




Re: [HACKERS] Announcing Veil

2005-10-05 Thread Marc Munro
Tom,
Thanks for your response.  Unless I am missing your point, to add more
locks we require a minor code change to the postgres server.  I am happy
to submit a patch but this will not help Veil work with existing
versions of Postgres.  I am aiming for compatibility with 7.4 onward.
Your views on this would be appreciated.

Assuming that simply allocating a few extra LWLocks for user-defined
functions is acceptable, here are some patches:

--cut---
*** ./src/backend/storage/lmgr/lwlock.c Sat Aug 20 16:26:24 2005
--- lwlock.cWed Oct  5 08:20:31 2005
***
*** 120,126 
 */
numLocks += 2 * NUM_SLRU_BUFFERS;

!   /* Perhaps create a few more for use by user-defined modules? */

return numLocks;
  }
--- 120,127 
 */
numLocks += 2 * NUM_SLRU_BUFFERS;

!   /* Create a few more for use by user-defined modules. */
!   numLocks += NUM_USER_DEFINED_LWLOCKS;

return numLocks;
  }
--cut---
*** ./src/include/storage/lwlock.h  Sat Aug 20 16:26:34 2005
--- lwlock.hWed Oct  5 08:22:26 2005
***
*** 53,58 
--- 53,62 
MaxDynamicLWLock = 10
  } LWLockId;
 
+ /*
+  * Allocate a few LWLocks for user-defined functions.
+  */
+ #define NUM_USER_DEFINED_LWLOCKS 4

  typedef enum LWLockMode
  {
--cut---


__
Marc Munro

On Tue, 2005-10-04 at 22:51 -0400, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  Since I was unable to dynamically assign a LWLock using
  LWLockAssign (none available), I have fairly arbitrarily overloaded the
  use of existing LWLocks.  When the flames die down perhaps we can
  discuss making a small number (one would be enough for me) of LWLocks
  available.
 
 Perhaps you missed the comment in NumLWLocks()?
 
   regards, tom lane




[HACKERS] wrong optimization ( postgres 8.0.3 )

2005-10-05 Thread Gaetano Mendola
Hi all,
take a look at this simple function and view:

CREATE OR REPLACE FUNCTION sp_connected_test ( INTEGER )
RETURNS BOOLEAN AS'
DECLARE
   a_id_user ALIAS FOR $1;
BEGIN
   PERFORM *
   FROM v_current_connection
   WHERE id_user = a_id_user;

   IF NOT FOUND THEN
  RETURN FALSE;
   END IF;

   RETURN TRUE;

END;
' LANGUAGE 'plpgsql';

CREATE VIEW v_current_connection_test
AS SELECT ul.id_user, cc.connected
   FROM current_connection cc,
user_login ul
   WHERE cc.id_user = ul.id_user AND
 connected = TRUE;


SELECT * FROM v_current_connection_test WHERE sp_connected_test(id_user) = 
FALSE;


This line should produce no rows, but randomly does.

If I put a RAISE NOTICE before RETURN NULL with the id_user, I notice that
the function is called on records present in user_login but discarded because
the join with current_connection has connected = FALSE!

I can work around the problem by rewriting the view:

CREATE VIEW v_current_connection_test
AS SELECT cc.id_user, cc.connected
   FROM current_connection cc,
user_login ul
   WHERE cc.id_user = ul.id_user AND
 connected = TRUE;


Regards
Gaetano Mendola


Re: [HACKERS] Interesting optimizer's supposition in 8.1

2005-10-05 Thread Tom Lane
Teodor Sigaev [EMAIL PROTECTED] writes:
 Why planner suppose that t 'table' will return 1000 rows?

Because set_function_size_estimates() is only a stub :-(

regards, tom lane



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Ron Peacetree
I've now gotten verification from multiple working DBA's that DB2, Oracle, and
SQL Server can achieve ~250MBps ASTR (with as much as ~500MBps ASTR in
setups akin to Oracle RAC) when attached to a decent (not outrageous, but
decent) HD subsystem...

I've not yet had any RW DBA verify Jeff Baker's supposition that ~1GBps ASTR is
attainable.  Cache based bursts that high, yes.  ASTR, no.

The DBAs in question run RW installations that include Solaris, M$, and
Linux OS's for companies that just about everyone on these lists is likely
to recognize.

Also, the implication of these pg IO limits is that money spent on even 
moderately
priced 300MBps SATA II based RAID HW is wasted $'s.

In total, this situation is a recipe for driving potential pg users to other 
DBMS. 
  
25MBps in and 15MBps out is =BAD=.

Have we instrumented the code in enough detail that we can tell _exactly_ where
the performance drainage is?

We have to fix this.
Ron  


-Original Message-
From: Luke Lonergan [EMAIL PROTECTED]
Sent: Oct 5, 2005 11:24 AM
To: Michael Stone [EMAIL PROTECTED], Martijn van Oosterhout 
kleptog@svana.org
Cc: pgsql-hackers@postgresql.org, pgsql-performance@postgresql.org
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

Nope - it would be disk wait.

COPY is CPU bound on I/O subsystems faster than 50 MB/s on COPY (in) and about 
15 MB/s (out).

- Luke

 -Original Message-
From:   Michael Stone [mailto:[EMAIL PROTECTED]
Sent:   Wed Oct 05 09:58:41 2005
To: Martijn van Oosterhout
Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org
Subject:Re: [HACKERS] [PERFORM] A Better External Sort?

On Sat, Oct 01, 2005 at 06:19:41PM +0200, Martijn van Oosterhout wrote:
COPY TO /dev/null WITH binary
13MB/s    55% user 45% system  (ergo, CPU bound)
[snip]
the most expensive. But it does point out that the whole process is
probably CPU bound more than anything else.

Note that 45% of that cpu usage is system--which is where IO overhead
would end up being counted. Until you profile where you system time is
going it's premature to say it isn't an IO problem.

Mike Stone




Re: [HACKERS] prefix btree implementation

2005-10-05 Thread Qingqing Zhou

Bricklen Anderson [EMAIL PROTECTED] wrote

 Oracle implements something similar called index compression, but I 
 believe it
 is only for common column values. I haven't checked in versions9r1 so 
 maybe
 there are other options implemented by now.

 Jonathan Lewis describes some pros and cons here:
 http://www.jlcomp.demon.co.uk/faq/compress_ind.html


Oracle 9 uses grammar like this:

CREATE INDEX ... [ COMPRESS number_of_first_columns ]

So it gives the user the flexibility of choosing the optimal number of
columns. The script mentioned in the article guesses the optimal number by
estimating the size of each choice. But I am thinking we can do better:
(1) we don't require that the compressed number of columns on each page is
the same; (2) when we build up the index bottom-up, we can determine this
number for each page automatically by maximizing the number of items within
a page.
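For illustration, usage on the Oracle side is something like (table and
column names made up):

CREATE INDEX t_ab_idx ON t (a, b) COMPRESS 1;
-- repeated values of the first column (a) are stored once per leaf block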

Regards,
Qingqing 





Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Joshua D. Drake

 We have to fix this.
 Ron  
 


The source is freely available for your perusal. Please feel free to
point us in specific directions in the code where you may see some
benefit. I am positive all of us that can, would put resources into
fixing the issue had we a specific direction to attack.

Sincerely,

Joshua D. Drake


-- 
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/





Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Ron Peacetree
First I wanted to verify that pg's IO rates were inferior to The Competition.
Now there's at least an indication that someone else has solved similar
problems.  Existence proofs make some things easier ;-)

Is there any detailed programmer-level architectural doc set for pg?  I know
the best doc is the code, but the code in isolation is often the Slow Path to
understanding with systems as complex as a DBMS IO layer.

Ron
 

-Original Message-
From: Joshua D. Drake [EMAIL PROTECTED]
Sent: Oct 5, 2005 1:18 PM
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?


The source is freely available for your perusal. Please feel free to
point us in specific directions in the code where you may see some
benefit. I am positive all of us that can, would put resources into
fixing the issue had we a specific direction to attack.

Sincerely,

Joshua D. Drake



Re: [HACKERS] wrong optimization ( postgres 8.0.3 )

2005-10-05 Thread Tom Lane
Gaetano Mendola [EMAIL PROTECTED] writes:
 What I'm experiencing is a problem ( I upgraded today from
 7.4.x to 8.0.3 ) that I explain here:

 The following function just return how many records there
 are inside the view v_current_connection

 CREATE OR REPLACE FUNCTION sp_count ( )
 RETURNS INTEGER AS'
 DECLARE
c INTEGER;
 BEGIN
SELECT count(*) INTO c FROM v_current_connection;
RETURN c;
 END;
 ' LANGUAGE 'plpgsql';

 I have the following select

 # select count(*), sp_count() from v_current_connection;
  count | sp_count
 -------+----------
    977 |      978

 as you can see, the two counts are returning different record
 numbers (in the meantime other transactions are updating tables
 behind the view v_current_connection).

This isn't surprising at all, if other transactions are actively
changing the table.  See the release notes for 8.0:

: Observe the following incompatibilities: 
: 
:  In READ COMMITTED serialization mode, volatile functions now see
:  the results of concurrent transactions committed up to the
:  beginning of each statement within the function, rather than up to
:  the beginning of the interactive command that called the function.
: 
:  Functions declared STABLE or IMMUTABLE always use the snapshot of
:  the calling query, and therefore do not see the effects of actions
:  taken after the calling query starts, whether in their own
:  transaction or other transactions.  Such a function must be
:  read-only, too, meaning that it cannot use any SQL commands other
:  than SELECT.

If you want this function to see the same snapshot as the calling query
sees, declare it STABLE.
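That is, roughly:

CREATE OR REPLACE FUNCTION sp_count ( )
RETURNS INTEGER AS'
DECLARE
   c INTEGER;
BEGIN
   SELECT count(*) INTO c FROM v_current_connection;
   RETURN c;
END;
' LANGUAGE 'plpgsql' STABLE;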

regards, tom lane



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Michael Stone

On Wed, Oct 05, 2005 at 11:24:07AM -0400, Luke Lonergan wrote:

Nope - it would be disk wait.


I said I/O overhead; i.e., it could be the overhead of calling the
kernel for I/O's. E.g., the following process is having I/O problems:

time dd if=/dev/sdc of=/dev/null bs=1 count=10000000
10000000+0 records in
10000000+0 records out
10000000 bytes transferred in 8.887845 seconds (1125132 bytes/sec)

real    0m8.889s
user    0m0.877s
sys     0m8.010s


it's not in disk wait state (in fact the whole read was cached) but it's
only getting 1MB/s. 


Mike Stone



Re: [HACKERS] Fwd: 8.1beta2 vacuum analyze hanging on idle database

2005-10-05 Thread Kevin Grittner
I see that my initial post never made it through to the list.  I assume
this was some technical failure, so I'm adding it back for this reply.
 
It doesn't appear that we did stop postmaster between incidents.
We have now done so.
 
The software we are running is a build from the beta2 release, with
no special options specified at ./configure time.  Would you expect
such a build to include the debug info you wanted?  We will include
the --enable-debug in our next build, but I wondered because I was
showing our DBA manager the diagnostic steps, and ran gdb bt
against an idle connection, and got:
 
(gdb) bt
#0  0x40197b46 in recv () from /lib/i686/libc.so.6
#1  0x0813485f in secure_read ()
#2  0x08138f7b in pq_recvbuf ()
#3  0x081393a9 in pq_getbyte ()
#4  0x08195565 in PostgresMain ()
#5  0x081716c5 in ServerLoop ()
#6  0x0817232e in PostmasterMain ()
#7  0x0813aad8 in main ()
 
Which seemed to show reasonable information, to my untrained eye.
That got me wondering whether the (corrupt stack?) note on the
previous backtrace might be something real.  Both were run against
processes running the same copy of the backend software.
 
-Kevin
 
 
 Tom Lane [EMAIL PROTECTED] 10/04/05 4:22 PM 
Kevin Grittner [EMAIL PROTECTED] writes:
 I can't hold the database in the problem state much longer -- if there
 are any other diagnostic steps you'd like me to take before we clear
 the problem, please let me know very soon.

Not at the moment ...

 INFO:  vacuuming pg_catalog.pg_constraint
 INFO:  index pg_constraint_conname_nsp_index now contains 35 row versions
 in 2 pages
 DETAIL:  0 index pages have been deleted, 0 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.
 INFO:  index pg_constraint_conrelid_index now contains 35 row versions
 in 2 pages
 DETAIL:  0 index pages have been deleted, 0 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.
 [Hanging here for about 2 hours so far.]

Interesting that it seems to consistently be having a problem with a
pg_constraint index.  Have you restarted the postmaster at any point
since this trouble began?  If it were something like an unreleased
buffer pin, then it could persist indefinitely until postmaster restart.


 (gdb) bt
 #0  0x40198488 in semop () from /lib/i686/libc.so.6
 #1  0x4a2c8cf8 in ?? ()
 #2  0xbfffb2e0 in ?? ()
 #3  0xbfffb308 in ?? ()
 #4  0x0816a3d4 in PGSemaphoreLock ()
 Previous frame inner to this frame (corrupt stack?)

This is fairly unhelpful :-(.  The next stack frame down would have told
us something useful, but really we need to see the whole call stack.

It may be that you need to rebuild Postgres with --enable-debug in order
to get something gdb can work with.

regards, tom lane




Re: [HACKERS] Fwd: 8.1beta2 vacuum analyze hanging on idle database

2005-10-05 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 The software we are running is a build from the beta2 release, with
 no special options specified at ./configure time.  Would you expect
 such a build to include the debug info you wanted?

No, you need configure --enable-debug, which is not the default.
For working with a beta release, --enable-cassert isn't a bad idea
either, though it is probably not relevant to your problem.
 
 (gdb) bt
 #0  0x40197b46 in recv () from /lib/i686/libc.so.6
 #1  0x0813485f in secure_read ()
 #2  0x08138f7b in pq_recvbuf ()
 #3  0x081393a9 in pq_getbyte ()
 #4  0x08195565 in PostgresMain ()
 #5  0x081716c5 in ServerLoop ()
 #6  0x0817232e in PostmasterMain ()
 #7  0x0813aad8 in main ()
 
 Which seemed to show reasonable information, to my untrained eye.

Yeah, that looks expected for a non-debug build.  (Debug build would
show call parameters too, which is why it would be more helpful
even apart from the (corrupt stack?) problem.)

 That got me wondering whether the (corrupt stack?) note on the
 previous backtrace might be something real.

More likely, it's specific to particular places in the code that got
optimized in a way that gdb couldn't figure out.

regards, tom lane



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Jeffrey W. Baker
On Wed, 2005-10-05 at 12:14 -0400, Ron Peacetree wrote:
 I've now gotten verification from multiple working DBA's that DB2, Oracle, and
 SQL Server can achieve ~250MBps ASTR (with as much as ~500MBps ASTR in
 setups akin to Oracle RAC) when attached to a decent (not outrageous, but
 decent) HD subsystem...
 
 I've not yet had any RW DBA verify Jeff Baker's supposition that ~1GBps ASTR 
 is
 attainable.  Cache based bursts that high, yes.  ASTR, no.

I find your tone annoying.  That you do not have access to this level of
hardware proves nothing, other than pointing out that your repeated
emails on this list are based on supposition.

If you want 1GB/sec STR you need:

1) 1 or more Itanium CPUs
2) 24 or more disks
3) 2 or more SATA controllers
4) Linux

Have fun.

-jwb



Re: [HACKERS] Fwd: 8.1beta2 vacuum analyze hanging on idle database

2005-10-05 Thread Alvaro Herrera
On Wed, Oct 05, 2005 at 02:27:32PM -0400, Tom Lane wrote:
 Kevin Grittner [EMAIL PROTECTED] writes:
  The software we are running is a build from the beta2 release, with
  no special options specified at ./configure time.  Would you expect
  such a build to include the debug info you wanted?
 
 No, you need configure --enable-debug, which is not the default.
 For working with a beta release, --enable-cassert isn't a bad idea
 either, though it is probably not relevant to your problem.

Also, note that --enable-cassert will reduce performance somewhat, and
may make the bug go away.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
In the beginning of time there was disenchantment.  And there was desolation.
And great was the scandal, and the flicker of monitors and the clatter of keys.
(Sean los Pájaros Pulentios, Daniel Correa)



Re: [HACKERS] Fwd: 8.1beta2 vacuum analyze hanging on idle database

2005-10-05 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 On Wed, Oct 05, 2005 at 02:27:32PM -0400, Tom Lane wrote:
 For working with a beta release, --enable-cassert isn't a bad idea
 either, though it is probably not relevant to your problem.

 Also, note that --enable-cassert will reduce performance somewhat, and
 may make the bug go away.

True --- but there's also a chance it could expose the bug immediately.
It'd be worth trying both ways.

regards, tom lane



Re: [Slony1-general] Re: [HACKERS] Slony RPM issue

2005-10-05 Thread elein


On Wed, Oct 05, 2005 at 05:32:40PM +0300, Devrim GUNDUZ wrote:
 
 Hi,
 
 On Wed, 5 Oct 2005, Philip Yarra wrote:
 
 Hi Devrim, I ran into another RPM issue, this time with Slony.
 
 :-)
 
 I grabbed the RPM from
 http://developer.postgresql.org/~devrim/slony/1.1.0/rpms/PG8.0.3/
 
 Trying to run slon_start, I got errors such as:
 $ slon_start --config /etc/slon_tools.conf 2
 Invoke slon for node 2 - @@@/slon -s 1000 -d2 -g 80 replication 'host=dev2
  dbname=booktown user=postgres port=5432' 2>&1 >
  /var/log/slony/slony1/node2/booktown-2005-10-05_16:23:53.log
 sh: line 1: @@@/slon: No such file or directory
 Slon failed to start for cluster replication, node node2
 
 This turned out to be because /usr/lib/pgsql/slon-tools.pm contained 
 several
 un-expanded @@PGBINDIR@@ prefixes - once I replaced them with /usr/bin it 
 all
 ran just fine.

This is also a problem for 64bit machines which like to install
things in lib64.  The @@PGBINDIR@@ is also unset in these cases.

Generally a short sed (or perl if you like) script will fix
these up.  But it is a really pretty obscure trail for people
to find the exact problem.

--elein

 
 Hrrm.
 
 It seems that I've installed wrong file in the spec :( However slon-tools 
 file is not updated by the Makefile, so I'll try to fix that, too.
 
 Thanks for the report. It will fixed in CVS and all the RPM sets later 
 today. Always feel free to send me a patch if you want, I can apply your 
 patch, too.
 
 BTW, could you please report the possible Slony problems to 
 slony1-general@gborg.postgresql.org later?
 
 Regards,
 --
 Devrim GUNDUZ
 Kivi Bilişim Teknolojileri - http://www.kivi.com.tr
 devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   
   http://www.gunduz.org





Re: [HACKERS] [DOCS] Updated documentation for new sequence binding

2005-10-05 Thread Jim C. Nasby
On Sun, Oct 02, 2005 at 10:54:10PM -0400, Bruce Momjian wrote:
 pgman wrote:
  I have merged Tom's description of the new sequence binding with text I
  was working on.  I modified it to follow the existing we used to do X,
  now we do Y pattern in the surrounding entries:
  
  http://candle.pha.pa.us/tmp/pgsql/release.html#RELEASE-8-1
 
 Sorry, this is a better URL:
 
   
 http://candle.pha.pa.us/main/writings/pgsql/sgml/release.html#RELEASE-8-1

Out of curiosity, how is this file maintained as development is done?
The reason I'm asking is because it would be nice to have links to more
information, especially for the 'one-liner' items in 1.3.1 for example,
and it seems like that would be easier to do along-the-way rather than
waiting for the end. Even a link to a mailing list discussion would be
better than nothing...

That said, what's

 Improve the optimizer, including auto-resizing of hash joins (Tom)

mean?

On full_page_writes, are we certain that all battery-backed disk caches
ensure that partial-page writes can't happen?

Maybe pg_column_size should just be included in the item for all the
other size functions brought into the backend? If not, ISTM they should
at least be one after the other in the list...

Finally, weren't more changes made in contrib than what's listed?
Nothing's said about pg_autovacuum, for example.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] prefix btree implementation

2005-10-05 Thread Qingqing Zhou

Alvaro Herrera [EMAIL PROTECTED] wrote
 On Wed, Oct 05, 2005 at 12:50:41AM -0400, Tom Lane wrote:
 Qingqing Zhou [EMAIL PROTECTED] writes:
  1/ What types of prefix compression shall we support?

 Given the requirement of datatype independence, this idea seems a
 complete nonstarter to me...

 How about having each type optionally provide the required routines?
 Thus we could provide them at least for the most common datatypes, and
 the system would continue working as currently for the rest (including
 user-defined types).  Cross-column prefixes would be hard to handle I
 guess, as well as TOASTed data.


Yes, column-wise should not be difficult, since it requires no knowledge 
of the data types.

We can treat cross-column or incomplete-column sharing in two ways. One way 
(binary-comparison) is that we just compare the items binarily, i.e., 
without knowing what in fact is stored. The other way (datatype-comparison) 
is that we compare the items with some knowledge of the data types. For 
example, suppose the index is defined on a varchar column and the exemplar 
data look like this:

{3|'aaa'}{4|'aaab'}{5|'aaabc'}

The binary-comparison way can't share the prefix 'aaa' at all, because it will 
see that 3, 4 and 5 are totally different. The datatype-comparison way can 
share the prefix 'aaa', but it has to know that each varchar column is 
associated with a length header. When it compares, it ignores the header. The 
first way is easy to implement, and works for some data types like integers, 
but is not acceptable I guess, since it does not even support varchar column 
prefix sharing.

We can find a way to handle the above case, but it is better to find a 
general way to handle any data types (including UDTs). "Each type optionally 
provides the required routines" could be a way; more details?

 One problem I do see is what happens if I need to insert a new tuple in
 the page that doesn't share the prefix.  It obviously would have to be
 the leftmost or rightmost item on the page, but it's possible.


We do the prefix sharing when we build up index only, never on the fly.

Regards,
Qingqing 





[HACKERS] [EMAIL PROTECTED]: [BUGS] Race condition in dropdb;createdb]

2005-10-05 Thread Jim C. Nasby
Never got a reply on -bugs... do people think this is an issue?

- Forwarded message from Jim C. Nasby [EMAIL PROTECTED] -
[EMAIL PROTECTED]:32]~:1%createdb test;
CREATE DATABASE
[EMAIL PROTECTED]:32]~:2%dropdb test; createdb test;
DROP DATABASE
createdb: database creation failed: ERROR:  source database template1
is being accessed by other users
[EMAIL PROTECTED]:32]~:3%

I realize that the changes made in 8.1 WRT using the postgres database
fix this particular case, but is there a bigger issue with there being a
race condition here? Presumably dropdb returns before the connection is
actually closed from the database's standpoint...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Andrej Ricnik-Bay
On 10/6/05, Michael Stone [EMAIL PROTECTED] wrote:
 On Wed, Oct 05, 2005 at 11:24:07AM -0400, Luke Lonergan wrote:
 Nope - it would be disk wait.

 I said I/O overhead; i.e., it could be the overhead of calling the
 kernel for I/O's. E.g., the following process is having I/O problems:

 time dd if=/dev/sdc of=/dev/null bs=1 count=10000000
 10000000+0 records in
 10000000+0 records out
 10000000 bytes transferred in 8.887845 seconds (1125132 bytes/sec)

 real0m8.889s
 user0m0.877s
 sys 0m8.010s

 it's not in disk wait state (in fact the whole read was cached) but it's
 only getting 1MB/s.

 Mike Stone


I think you only proved that dd isn't the smartest tool out there... or
that using it with a blocksize of 1 byte doesn't make too much sense.


[EMAIL PROTECTED]:~]$ time dd if=/dev/sr0 of=/dev/null bs=2048 count=4883
4883+0 records in
4883+0 records out

real0m6.824s
user0m0.010s
sys 0m0.060s
[EMAIL PROTECTED]:~]$ time dd if=/dev/sr0 of=/dev/null bs=1 count=10000000
10000000+0 records in
10000000+0 records out

real0m18.523s
user0m7.410s
sys 0m10.310s
[EMAIL PROTECTED]:~]$ time dd if=/dev/sr0 of=/dev/null bs=8192 count=1220
1220+0 records in
1220+0 records out

real0m6.796s
user0m0.000s
sys 0m0.070s

That's with caching, and all.  Or did I miss the point of your post
completely?  Interestingly, the CPU usage with bs=1 goes up
to 97%; it stays at a mellow 3% with 8192 and 2048.


Cheers,
Andrej



Re: [HACKERS] [EMAIL PROTECTED]: [BUGS] Race condition in dropdb;createdb]

2005-10-05 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 Never got a reply on -bugs... do people think this is an issue?

Not really.

The only way to fix it would be to make PQfinish do a synchronous
close, which seems like more of a performance loss than it's worth.

regards, tom lane



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Jonah H. Harris
Ron,

This thread is getting on my nerves.  Your tone in some of the other
posts (as well as this one) is getting very annoying.  Yes,
PostgreSQL's storage manager (like those of all other open source databases)
lacks many of the characteristics and enhancements of the commercial
databases.  Unlike Oracle, Microsoft, etc., the PostgreSQL Global
Development Group doesn't have the tens of millions of dollars
required to pay hundreds of developers around the world for
round-the-clock development and R&D.  Making sure that every little
tweak, on every system, is taken advantage of is expensive (in terms
of time) for an open source project where little ROI is gained. 
Before you make a statement like "I wanted to verify that pg's IO
rates were inferior to The Competition", think about how you'd write
your own RDBMS from scratch (in reality, not in theory).

As for your question regarding developer docs for the storage manager
and related components, read the READMEs and the code... just like
everyone else.

Rather than posting more assumptions and theory, please read through
the code and come back with actual suggestions.

-Jonah

2005/10/5, Ron Peacetree [EMAIL PROTECTED]:
 First I wanted to verify that pg's IO rates were inferior to The Competition.
 Now there's at least an indication that someone else has solved similar
 problems.  Existence proofs make some things easier ;-)

 Is there any detailed programmer level architectual doc set for pg?  I know
 the best doc is the code, but the code in isolation is often the Slow Path 
 to
 understanding with systems as complex as a DBMS IO layer.

 Ron


 -Original Message-
 From: Joshua D. Drake [EMAIL PROTECTED]
 Sent: Oct 5, 2005 1:18 PM
 Subject: Re: [HACKERS] [PERFORM] A Better External Sort?


 The source is freely available for your perusal. Please feel free to
 point us in specific directions in the code where you may see some
 benefit. I am positive all of us that can, would put resources into
 fixing the issue had we a specific direction to attack.

 Sincerely,

 Joshua D. Drake




--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/



Re: [HACKERS] [DOCS] Updated documentation for new sequence binding

2005-10-05 Thread Bruce Momjian
Jim C. Nasby wrote:
 On Sun, Oct 02, 2005 at 10:54:10PM -0400, Bruce Momjian wrote:
  pgman wrote:
   I have merged Tom's description of the new sequence binding with text I
   was working on.  I modified it to follow the existing we used to do X,
   now we do Y pattern in the surrounding entries:
   
 http://candle.pha.pa.us/tmp/pgsql/release.html#RELEASE-8-1
  
  Sorry, this is a better URL:
  
  
  http://candle.pha.pa.us/main/writings/pgsql/sgml/release.html#RELEASE-8-1
 
 Out of curiosity, how is this file maintained as development is done?
 The reason I'm asking is because it would be nice to have links to more
 information, especially for the 'one-liner' items in 1.3.1 for example,
 and it seems like that would be easier to do along-the-way rather than
 waiting for the end. Even a link to a mailing list discussion would be
 better than nothing...

I go through the CVS commits and make the release notes just before
beta.  Doing it incrementally is much more work.

 That said, what's
 
  Improve the optimizer, including auto-resizing of hash joins (Tom)
 
 mean?

We would sometimes fail in a query where the allocated memory was larger
than our hash could hold.  This fixed that.

 
 On full_page_writes, are we certain that all battery-backed disk caches
 ensure that partial-page writes can't happen?

Well, I would think so, but I have no documentation to state that.  The
battery-backed memory is supposed to allow for power failure and keep
writes in the cache until they hit the disk.

 Maybe pg_column_size should just be included in the item for all the
 other size functions brought into the backend? If not, ISTM they should
 at least be one after the other in the list...

Not really.  It is different in that it does not aggregate values, but
just reports the storage of a column.
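For example (names made up):

SELECT pg_column_size(some_column) FROM some_table;

returns, for each row, the number of bytes used to store that single value
(after any compression), while the other size functions report whole
relations or databases.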

 Finally, weren't more changes made it contrib than what's listed?
 Nothing's said about pg_autovacuum for example.

Moved to the main server.  That is mentioned.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] wrong optimization ( postgres 8.0.3 )

2005-10-05 Thread Gaetano Mendola

Tom Lane wrote:
 Gaetano Mendola [EMAIL PROTECTED] writes:
 What I'm experiencing is a problem ( I upgraded today from
 7.4.x to 8.0.3 ) that I explain here:
 
 The following function just return how many records there
 are inside the view v_current_connection
 
 CREATE OR REPLACE FUNCTION sp_count ( )
 RETURNS INTEGER AS'
 DECLARE
c INTEGER;
 BEGIN
SELECT count(*) INTO c FROM v_current_connection;
RETURN c;
 END;
 ' LANGUAGE 'plpgsql';
 
 I have the following select
 
 # select count(*), sp_count() from v_current_connection;
  count | sp_count
 -------+----------
    977 |      978
 
 as you can see the two counts are returning different record
 numbers (in the meantime other transactions are updating tables
 behind the view v_current_connection).
 
 This isn't surprising at all, if other transactions are actively
 changing the table.  See the release notes for 8.0:
 
 : Observe the following incompatibilities: 
 : 
 :  In READ COMMITTED serialization mode, volatile functions now see
 :  the results of concurrent transactions committed up to the
 :  beginning of each statement within the function, rather than up to
 :  the beginning of the interactive command that called the function.
 : 
 :  Functions declared STABLE or IMMUTABLE always use the snapshot of
 :  the calling query, and therefore do not see the effects of actions
 :  taken after the calling query starts, whether in their own
 :  transaction or other transactions.  Such a function must be
 :  read-only, too, meaning that it cannot use any SQL commands other
 :  than SELECT.
 
 If you want this function to see the same snapshot as the calling query
 sees, declare it STABLE.


I think I understood :-(

Just to be clear:

select work_on_connected_user(id_user) from v_connected_user;

if that function is not STABLE then it can be invoked with an id_user that is
no longer in the view v_connected_user. Is that right?
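
(To make Tom's suggestion concrete, here is the earlier function redeclared
STABLE, so that it uses the snapshot of the calling query:)

CREATE OR REPLACE FUNCTION sp_count ( )
RETURNS INTEGER AS '
DECLARE
   c INTEGER;
BEGIN
   SELECT count(*) INTO c FROM v_current_connection;
   RETURN c;
END;
' LANGUAGE 'plpgsql' STABLE;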


Regards
Gaetano Mendola


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Resultset duplicates (was Re: [HACKERS] prefix btree implementation)

2005-10-05 Thread Richard Huxton

Qingqing Zhou wrote:

Oracle 9 uses the grammar like this:

CREATE INDEX ... [ COMPRESS number_of_first_columns ]

So it gives the flexibility of choosing the optimal number of columns to the
user. The script mentioned in the article guesses the optimal number by
estimating the size of each choice. But I am thinking we can do it better:
(1) we don't require that the compressed number of columns on each page is
the same; (2) when we build the index bottom-up, we can determine this number
for each page automatically by maximizing the number of items within a page.
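
(For concreteness, a hypothetical index under that grammar, with the first
two of the three key columns stored compressed:)

CREATE INDEX i_t_abc ON t (a, b, c) COMPRESS 2;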


Are there any gains in eliminating duplicate values in result-sets? I'd
guess that many/most large result-sets are sorted, which should make it
possible to get away with a "same as last row" marker when the whole set
is returned to a client.


Of course, this is where someone turns around and tells me we do this 
already :-)


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Ron Peacetree
I'm putting in as much time as I can afford thinking about pg related
performance issues.  I'm doing it because of a sincere desire to help
understand and solve them, not to annoy people.

If I didn't believe in pg, I wouldn't be posting thoughts about how to
make it better.

It's probably worth some review (suggestions marked with a '+'):

+I came to the table with a possibly better way to deal with external
sorts (that now has branched into 2 efforts: short term improvements
to the existing code, and the original from-the-ground-up idea).  That
suggestion was based on a great deal of prior thought and research,
despite what some others might think.

Then we were told that our IO limit was lower than I thought.

+I suggested that as a Quick Fix we try making sure we do IO
transfers in large enough chunks, based on the average access time
of the physical device in question, so as to achieve the device's
ASTR (ie at least 600KB per access for a 50MBps ASTR device with a
12ms average access time; 50MBps x 12ms = 600KB, the point at which
transfer time equals access time) whenever circumstances allow us to.
As far as I know, this experiment hasn't been tried yet.

I asked some questions about physical layout and format-translation
overhead possibly being suboptimal; there seemed to be agreement, but
specifics as to where we are taking the hit don't seem to have been
made explicit yet.

+I made the from left field suggestion that perhaps a pg native fs
format would be worth consideration.  This is a major project, so
the suggestion was to at least some extent tongue-in-cheek.

+I then made some suggestions about better code instrumentation
so that we can more accurately characterize where the bottlenecks are. 

We were also told that evidently we are CPU bound long before one
would naively expect to be, based on the performance specifications
of the components involved.

Double checking among the pg developer community led to some
differing opinions as to what the actual figures were and under what
circumstances they were achieved.  Further discussion seems to have
converged on both accurate values and a better understanding as to
the HW and SW needed; _and_ we've gotten some real-world confirmation
as to what current reasonable expectations are within this problem
domain from outside the pg community.

+Others have made some good suggestions in this thread as well.
Since I seem to need to defend my tone, I'm not detailing them
here.  That should not be construed as a lack of appreciation of them.

Now I've asked for the quickest path to a detailed understanding of the
pg IO subsystem, the goal being to get more up to speed on its
coding details.  Certainly not to annoy you or anyone else.

At least from my perspective, this for the most part seems to have
been a useful and reasonable engineering discussion that has
exposed a number of important things.
  
Regards,
Ron

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-05 Thread Luke Lonergan
Michael,

On 10/5/05 8:33 AM, Michael Stone [EMAIL PROTECTED] wrote:

 real    0m8.889s
 user    0m0.877s
 sys     0m8.010s
 
 it's not in disk wait state (in fact the whole read was cached) but it's
 only getting 1MB/s.

You've proven my point completely.  This process is bottlenecked in the CPU.
The only way to improve it would be to optimize the system (libc) functions
like fread where it is spending most of its time.

In COPY, we found lots of libc functions like strlen() being called
a ridiculous number of times; in one case it was called on every
timestamp/date attribute to get the length of TZ, which is constant.  That
one function call was in the system category, and was responsible for
several percent of the time.

By the way, system routines like fgetc/getc/strlen/atoi etc. don't appear in
gprof profiles of dynamically linked objects, nor by default in oprofile
results.

If the bottleneck is in I/O, you will see the time spent in disk wait, not
in system.

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Slony RPM issue

2005-10-05 Thread Philip Yarra
Hi Devrim,

On Thu, 6 Oct 2005 12:32 am, Devrim GUNDUZ wrote:
 Thanks for the report. It will be fixed in CVS and all the RPM sets later
 today. Always feel free to send me a patch if you want; I can apply your
 patch, too.

OK, did you get my previous email about why pgsql-libs was dependent on 
the compat-libs RPM?

I don't even know where to start with fixing RPM issues, but I guess I can 
learn if you would like help?

 BTW, could you please report the possible Slony problems to
 slony1-general@gborg.postgresql.org later?

Sure will. Ermmm, there isn't something you want to tell me, is there? :-) I'd 
rather hear about gotchas now than find them myself at the eleventh 
hour.

Regards, Philip.

-- 

Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it. - Brian W. Kernighan



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Vacuum and Transactions

2005-10-05 Thread Chris Browne
[EMAIL PROTECTED] (Hannu Krosing) writes:
 It also seems that Slony can be modified to not use LISTEN/NOTIFY in
 high load situations (akin to high performance network cards, which
 switch from interrupt driven mode to polling mode if number of packets
 per second reaches certain thresolds).

Yeah, I want to do some more testing of that; it should be easy to
reduce the abuse of pg_listener a whole lot.

 Unfortunately Slony and Listen/Notify is not the only place where
 high- update rate tables start to suffer from vacuums inability to
 clean out dead tuples when working in parallel with other slower
 vacuums. In real life there are other database tasks which also need
 some tables to stay small, while others must be huge in order to
 work effectively. Putting small and big tables in different
 databases and using dblink-like functionality when accessing them is
 one solution for such cases, but it is rather ugly :(

That eliminates the ability to utilize transactions on things that
ought to be updated in a single transaction...
-- 
output = (cbbrowne @ ntlug.org)
http://cbbrowne.com/info/lsf.html
MS-Windows: Proof that P.T. Barnum was correct. 

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] prefix btree implementation

2005-10-05 Thread Junji TERAMOTO
Hello all,

I have also just been examining a similar compression method.

Qingqing Zhou wrote:
 We can find a way to handle the above case, but it is better to find a 
 general way to handle any data types (including UDTs). Having each type 
 optionally provide the required routines could be a way; more details?

How about using difference information relative to the high key?
Since the high key also exists in the parent (routing) page, I think
we should be able to put it to good use (e.g. for new tuple insertion).
# There is a problem, though: the rightmost page has no high key...

I have only just started considering this, so whether it really works
out is not yet clear.

---
Junji Teramoto

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Bitmap scans vs. the statistics views

2005-10-05 Thread Tom Lane
Quite some time ago I complained about the fact that bitmap index scans
weren't being counted sanely by the statistics mechanism:
http://archives.postgresql.org/pgsql-hackers/2005-04/msg00675.php
That discussion trailed off without deciding how to fix it, but we
really can't let this go without fixing it in 8.1.

I studied the code some more and realized that we had been operating
under some fundamental misconceptions.  The distinction made in the
existing stats code between tuples fetched and tuples returned has
nothing whatever to do with live vs. dead tuples --- all these counts
are made only after determining that a tuple is visible.  The way it
really works in 8.0 is:

table tuples_returned: tuples returned by heap_getnext, ie,
live tuples found by seqscans
table tuples_fetched: tuples returned by heap_fetch under
conditions other than being invoked by an indexscan
(this covers various random cases like ANALYZE and
TID scans)
index tuples_fetched: tuples returned by heap_fetch when
invoked by an indexscan on this index
index tuples_returned: actually, exactly the same as
tuples_fetched.

This possibly explains why the original design of the pg_stat_all_tables
view exposed only two of the seemingly four interesting counts.

I have just committed changes that redefine the counts like this:

table tuples_returned: same as before, ie,
live tuples found by seqscans
table tuples_fetched: tuples returned by heap_fetch when
invoked by a bitmap scan (the random other cases
no longer get counted at all)
index tuples_fetched: same as before, ie, live tuples
fetched by simple indexscans using this index
index tuples_returned: number of index entries returned
from the index AM, counting both simple and bitmap
scans.

The pg_stat_all_tables view is modified to add the table's
tuples_fetched count to the sum of the per-index tuples_fetched counts,
so that idx_tup_fetch counts both simple and bitmap index scans.
It's possible to break these out by looking at the low-level statistics
functions, however.
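
For example (a minimal sketch, assuming a hypothetical table t with an
index t_i), the raw counters behind the view can be read like this:

    -- heap fetches attributed to bitmap scans on the table itself
    SELECT pg_stat_get_tuples_fetched('t'::regclass);
    -- live tuples fetched by simple indexscans using this index
    SELECT pg_stat_get_tuples_fetched('t_i'::regclass);
    -- index entries returned by the index AM (simple plus bitmap scans)
    SELECT pg_stat_get_tuples_returned('t_i'::regclass);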

With the new definitions you can get some weak information about the
numbers of dead tuples fetched by indexscans, which was not possible
at all before.  (It's weak because it's not easy to distinguish
differences due to dead tuples from differences due to bitmap scanning.)
In the earlier discussion, Josh commented that getting stats about dead
tuples probably belongs somewhere else anyway, and I'm inclined to agree
with that; so I don't feel too bad about not having provided more
complete information.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] fixing LISTEN/NOTIFY

2005-10-05 Thread Neil Conway
Applications that frequently use LISTEN/NOTIFY can suffer from
performance problems because of the MVCC bloat created by frequent
insertions into pg_listener. A solution to this has been suggested in
the past: rewrite LISTEN/NOTIFY to use shared memory rather than system
catalogs.
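
(For context, the usage pattern in question; in the current implementation
each LISTEN registers a row in pg_listener and each NOTIFY updates the
listeners' rows, so heavy traffic leaves a trail of dead tuples:)

    -- session A
    LISTEN my_event;
    -- session B, typically after committing some change of interest
    NOTIFY my_event;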

The problem is that there is a static amount of shared memory and a
potentially unbounded number of notifications, so we can run out of
memory. There are two ways to solve this: we can do as sinval does and
clear the shared memory queue, then effectively issue a NOTIFY ALL that
awakens all listeners. I don't like this behaviour: it seems ugly to
expose an implementation detail (static sizing of shared memory) to
applications. While a lot of applications are only using LISTEN/NOTIFY
for cache invalidation (and so spurious notifications are just a
performance hit), this behaviour still seems unfortunate to me. Using
NOTIFY ALL also makes NOTIFY 'msg' (message payloads, a feature several
users have asked for in the past) far less useful.

I think it would be better to either fail the NOTIFY when there is not
enough shared memory to add a new notification to the queue, or have the
NOTIFY block until shared memory does become available (applications
could of course implement the latter on top of the former by using
savepoints and a loop, either on the client-side or in PL/PgSQL). I
guess we could add an option to NOTIFY to specify how to handle
failures.
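
(A hedged sketch of that savepoint-and-loop idea in PL/pgSQL, assuming the
proposed behaviour where a full queue makes NOTIFY raise an ordinary error.
Both the failure mode and the pg_sleep back-off are illustrative, not an
existing API; pg_sleep in particular is only a stand-in here:)

CREATE OR REPLACE FUNCTION notify_with_retry() RETURNS void AS '
BEGIN
    LOOP
        BEGIN
            NOTIFY my_event;  -- hypothetical: may fail if the shared queue is full
            RETURN;
        EXCEPTION WHEN OTHERS THEN
            PERFORM pg_sleep(0.1);  -- brief back-off before retrying
        END;
    END LOOP;
END;
' LANGUAGE plpgsql;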

A related question is when to add the notification to the shared memory
queue. We don't want the notification to fire until the NOTIFY's
transaction commits, so one alternative would be to delay appending to
the queue until transaction-commit time. However, that would mean we
wouldn't notice NOTIFY failure until the end of the transaction, or else
that we would block waiting for free space during the transaction-commit
process. I think it would be better to add an entry to shared memory
during the NOTIFY itself, and stamp that entry with the NOTIFY's
toplevel XID. Other backends can read the notification immediately
(and once all the backends have seen it, the notification can be removed
from the queue). Each backend can use the XID to determine when to
fire the notification (and if the notifying backend rolls back, they
can just discard the notification). This scheme is more expensive when
the notifying transaction rolls back, but I don't think that is the
common case.

Comments? (I'm still thinking about how to organize the shared memory
queue, and whether any of the sinval stuff can be reused...)

-Neil



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] fixing LISTEN/NOTIFY

2005-10-05 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 [ various ideas about reimplementing LISTEN/NOTIFY ]

I really dislike the idea of pushing notifies out to the shared queue
before commit.  That essentially turns "forever do notify foo" into
a global DOS tool: you can drive everybody else's backend into swap
hell along with your own.

The idea of blocking during commit until shmem becomes available might
work.  There's some issues here about transaction atomicity, though:
how do you guarantee that all or none of your notifies get sent?
(Actually, supposing that the notifies ought to be sent post-commit,
"all" is the only acceptable answer.  So maybe you just never give up.)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] fixing LISTEN/NOTIFY

2005-10-05 Thread Neil Conway
On Thu, 2005-06-10 at 01:14 -0400, Tom Lane wrote:
 The idea of blocking during commit until shmem becomes available might
 work.  There's some issues here about transaction atomicity, though:
 how do you guarantee that all or none of your notifies get sent?
 (Actually, supposing that the notifies ought to be sent post-commit,
 "all" is the only acceptable answer.  So maybe you just never give up.)

Yeah, I think that would work. We could also write to shared memory
before the commit proper, and embed an XID in the message to allow other
backends to determine when/if to fire the notification.

However, I don't really like the idea of blocking the backend for a
potentially significant amount of time in a state half-way between
committed and ready for the next query.

-Neil



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq