[HACKERS] Bug with UTF-8 character

2006-05-26 Thread Hans-Jürgen Schönig

good morning,

I got a bug report for the following Unicode character in PostgreSQL 
8.1.4: 0xedaeb8


ERROR:  invalid byte sequence for encoding UTF8: 0xedaeb8

This one seemed to work properly in PostgreSQL 8.0.3.

I think the following code in PostgreSQL 8.1.4 has a bug in it.

File: postgresql-8.1.4/src/backend/utils/mb/wchar.c


The entry values to the function are:

source = ed ae b8 20 20 20 20 20 20 20 20 20 20 20 20

length = 3 (length is the length of the current UTF-8 character)

But the code does a check that the second byte must not be greater 
than 0x9F when the first byte is 0xED. This is not according to the 
UTF-8 standard in RFC 3629. I believe that is not a valid test.


This test fails on our string, when it shouldn’t.

I believe this is a bug; could you please confirm, or let me know what I 
am doing wrong?
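For reference, the three bytes can be decoded by hand to see which code point they claim to encode (a quick sketch, not part of the original report):

```python
# Decode the 3-byte UTF-8 sequence ED AE B8 manually (illustrative sketch).
b1, b2, b3 = 0xED, 0xAE, 0xB8

# A 3-byte UTF-8 sequence has the bit layout 1110xxxx 10xxxxxx 10xxxxxx.
cp = ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)
print(hex(cp))  # 0xdbb8 -- inside the UTF-16 surrogate range D800-DFFF
```

The resulting code point, U+DBB8, falls in the surrogate range, which is what the later replies in this thread turn on.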



Many thanks,

Hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Peter Eisentraut
On Wednesday, 24 May 2006 at 20:42, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  I have spent some time figuring out how to resolve the parsing conflicts
  in Bernd Helmle's updatable views patch.  The problem has now been
  reduced to specifically this situation:

 Could we see the proposed patches for gram.y?

Here it is.

$ make -W gram.y gram.c
bison -y -d  gram.y
conflicts: 4 shift/reduce

These are basically for instances of the same problem.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
--- /home/pei/devel/pg82/pgsql/src/backend/parser/gram.y	2006-05-22 09:05:13.0 +0200
+++ gram.y	2006-05-26 09:41:21.0 +0200
@@ -339,7 +339,8 @@
 %type <list>	constraints_set_list
 %type <boolean>	constraints_set_mode
 %type <str>		OptTableSpace OptConsTableSpace OptTableSpaceOwner
-
+%type <list>	opt_check_option
+%type <node>	opt_check_mode
 
 /*
  * If you make any token changes, update the keyword table in
@@ -356,7 +357,7 @@
 	BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
 	BOOLEAN_P BOTH BY
 
-	CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
+	CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
 	CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
 	CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
 	COMMITTED CONNECTION CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
@@ -4618,12 +4619,12 @@
 /*
  *
  *	QUERY:
- *		CREATE [ OR REPLACE ] [ TEMP ] VIEW viewname '('target-list ')' AS query
+ *		CREATE [ OR REPLACE ] [ TEMP ] VIEW viewname '('target-list ')' AS query [ WITH [ CASCADED | LOCAL ] CHECK OPTION ]
  *
  */
 
 ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
-AS SelectStmt
+AS SelectStmt opt_check_option
 {
 	ViewStmt *n = makeNode(ViewStmt);
 	n->replace = false;
@@ -4631,10 +4632,11 @@
 	n->view->istemp = $2;
 	n->aliases = $5;
 	n->query = (Query *) $7;
+	n->options = (List *) $8;
 	$$ = (Node *) n;
 }
 		| CREATE OR REPLACE OptTemp VIEW qualified_name opt_column_list
-AS SelectStmt
+AS SelectStmt opt_check_option
 {
 	ViewStmt *n = makeNode(ViewStmt);
 	n->replace = true;
@@ -4642,10 +4644,25 @@
 	n->view->istemp = $4;
 	n->aliases = $7;
 	n->query = (Query *) $9;
+	n->options = (List *) $10;
 	$$ = (Node *) n;
 }
 		;
 
+opt_check_option:
+		WITH opt_check_mode CHECK OPTION
+		{
+			$$ = list_make1($2);
+		}
+		| /* EMPTY */ { $$ = NIL; }
+		;
+
+opt_check_mode:
+		CASCADED { $$ = (Node *) makeString("cascaded"); }
+		| LOCAL { $$ = (Node *) makeString("local"); }
+		| /* EMPTY */ { $$ = (Node *) makeString("cascaded"); }
+		;
+
 /*
  *
  *		QUERY:
@@ -8500,7 +8517,6 @@
 			| VARYING
 			| VIEW
 			| VOLATILE
-			| WITH
 			| WITHOUT
 			| WORK
 			| WRITE
@@ -8551,8 +8567,6 @@
 			| SETOF
 			| SMALLINT
 			| SUBSTRING
-			| TIME
-			| TIMESTAMP
 			| TREAT
 			| TRIM
 			| VARCHAR
@@ -8608,6 +8622,7 @@
 			| ASC
 			| ASYMMETRIC
 			| BOTH
+			| CASCADED
 			| CASE
 			| CAST
 			| CHECK
@@ -8662,6 +8677,8 @@
 			| SYMMETRIC
 			| TABLE
 			| THEN
+			| TIME
+			| TIMESTAMP
 			| TO
 			| TRAILING
 			| TRUE_P
@@ -8671,6 +8688,7 @@
 			| USING
 			| WHEN
 			| WHERE
+			| WITH
 		;
 
 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] GIN stuck in loop during PITR

2006-05-26 Thread Teodor Sigaev
Thanks a lot, applied. Can you describe the test suite? It may be useful 
for further testing...


GIN is young code, and it needs to be tested independently.

Andreas Seltenreich wrote:

I'm just experimenting a bit with GIN, and it is occasionally getting
stuck looping in findParents() during WAL replay.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] max(*)

2006-05-26 Thread Dennis Bjorklund

Shouldn't

  SELECT max(*) FROM foo;

give an error? Instead it's executed like

  SELECT max(1) FROM foo;

Just like count(*) is executed as count(1).

Something for the TODO or is it a feature?

ps. I know it's not an important case since no one sane would try to 
calculate max(*), but still.


/Dennis

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] XLogArchivingActive

2006-05-26 Thread Simon Riggs
On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:
 Tom Lane wrote:
  Andreas Pflug [EMAIL PROTECTED] writes:
  
 I propose to introduce a GUC permanent_archiving or so, to select 
 whether wal archiving happens permanently or only when a backup is in 
 progress (i.e. between pg_start_backup and pg_stop_backup).
  
  
  This is silly.  Why not just turn archiving on and off?
 
 Not quite. I want online backup, but no archiving. 

I can see what you want and why you want it. It's good to have the
option of a physical online backup as opposed to the logical online
backup that pg_dump offers.

 Currently, I have to 
 edit postgresql.conf and SIGHUP to turn on archiving, configuring a 
 (hopefully) writable directory, do the backup, then edit postgresql.conf and 
 SIGHUP again. Not too convenient...

You're doing this for pgAdmin right?

My understanding was that we had the tools now to edit the
postgresql.conf programmatically? 

Seems like it's not too convenient to change the way the server operates
to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm
also not that happy about curtailing people's options on backup either:
if people decided they wanted to have a mixture of isolated on-line
backup (as you suggest), plus active archiving at other times they would
still have the problems you suggest.

Not sure what the edit commands are offhand, but we would need the
following program:

- edit postgresql.conf
- pg_reload_conf()
- wait 30
- pg_start_backup('blah')
- backup
- pg_stop_backup()
- unedit postgresql.conf
- pg_reload_conf()

Which could then be wrapped even more simply as

- pg_start_backup_online('blah')
- backup
- pg_stop_backup_online()

Which overall seems lots easier than changing the server and adding
another parameter.
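The edit/unedit sequence above could be scripted roughly as follows. This is a sketch against a stand-in postgresql.conf: the paths, the archive command, and the commented-out psql calls are placeholders, not a tested tool.

```shell
#!/bin/sh
# Sketch of the sequence above: toggle archive_command in postgresql.conf,
# reload, back up, then restore the old setting.  PGDATA, the archive
# command, and the commented psql calls are illustrative placeholders.
set -e
PGDATA=${PGDATA:-/tmp/pgdata-demo}
CONF="$PGDATA/postgresql.conf"
mkdir -p "$PGDATA"
[ -f "$CONF" ] || echo "#archive_command = ''" > "$CONF"

cp "$CONF" "$CONF.bak"    # keep a copy so we can "unedit" later
sed -i "s|^#*archive_command.*|archive_command = 'cp %p /mnt/backup/%f'|" "$CONF"

# psql -c "SELECT pg_reload_conf()"           # SIGHUP the postmaster
# sleep 30                                    # give backends time to notice
# psql -c "SELECT pg_start_backup('blah')"
# tar czf /mnt/backup/base.tar.gz "$PGDATA"   # the file-level backup itself
# psql -c "SELECT pg_stop_backup()"

mv "$CONF.bak" "$CONF"    # unedit postgresql.conf
# psql -c "SELECT pg_reload_conf()"
echo "archive_command toggled and restored in $CONF"
```

Wrapping this in pg_start_backup_online()/pg_stop_backup_online() would hide exactly these steps.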

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] max(*)

2006-05-26 Thread Simon Riggs
On Fri, 2006-05-26 at 10:22 +0200, Dennis Bjorklund wrote:
 Shouldn't
 
SELECT max(*) FROM foo;
 
 give an error? 

SQL:2003 would not allow this; SQL:2003 permits only COUNT(*) and no
other aggregate function. All other aggregates require a value
expression.

 Instead it's executed like
 
SELECT max(1) FROM foo;
 
 Just like count(*) is executed as count(1).
 
 Something for the TODO or is it a feature?

Doesn't seem an important or even useful extension of the standard, but
it would probably require special-case processing for every aggregate
function in order to implement that. It's not dangerous... so I'm not
sure we should take any action at all.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] XLogArchivingActive

2006-05-26 Thread Andreas Pflug

Tom Lane wrote:

Andreas Pflug [EMAIL PROTECTED] writes:

That's right, but my proposal would implicitly switch on archiving 
while backup is in progress, thus explicitly enabling/disabling 
archiving wouldn't be necessary.



I'm not sure you can expect that to work.  The system is not built to
guarantee instantaneous response to mode changes like that.


Um, as long as xlog writing stops immediate recycling when 
pg_start_backup is executed everything should be fine, since archived 
logs are not expected to be present until pg_stop_backup is done.




The conventional wisdom is that pg_dump files are substantially smaller
than the on-disk footprint ... and that's even without compressing them.
I think you are taking a corner case, ie bytea data, and presenting it
as something that ought to be the design center.


I certainly have an extreme cornercase, since data is highly 
compressible. I won't suggest to replace pg_dump by physical backup 
methods, but disaster recovery may take considerably longer from a dump 
than from filesystem level backup.




Something that might be worth considering is an option to allow pg_dump
to use binary COPY.  I don't think this'd work nicely for text dumps,
but seems like custom- or tar-format dumps could be made to use it.
This would probably be a win for many datatypes not only bytea, and it'd
still be far more portable than a filesystem dump.


I'd really love a copy format that works for binary and text data as 
well, optimally compressed. Initial replication to a new slony cluster 
node uses COPY, and network bandwidth may become the restricting factor. 
Line protocol compression would be desirable for that too, but that's 
another story.



Regards,
Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] XLogArchivingActive

2006-05-26 Thread Andreas Pflug

Simon Riggs wrote:

On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:





Currently, I have to 
edit postgresql.conf and SIGHUP to turn on archiving configuring a 
(hopefully) writable directory, do the backup, edit postgresql.conf and 
SIGHUP again. Not too convenient...



You're doing this for pgAdmin right?


Not yet, just trying to manage a server.



My understanding was that we had the tools now to edit the
postgresql.conf programmatically? 


Seems like its not too convenient to change the way the server operates
to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm
also not that happy about curtailing people's options on backup either:
if people decided they wanted to have a mixture of isolated on-line
backup (as you suggest), plus active archiving at other times they would
still have the problems you suggest.


Why?
My suggestion is to redefine XLogArchivingActive. Currently, it tests 
for a non-empty archive_command. I propose:

bool XLogArchivingActive()
{
	if (XLogArchiveCommand[0] == 0)
		return false;
	return (XLogPermanentArchive       // from GUC
			|| OnlineBackupRunning()); // from pg_start_backup
}

The people you mention simply have XLogPermanentArchive = true in 
postgresql.conf, delivering the current behaviour.




Not sure what the edit commands are offhand, but we would need the
following program:

- edit postgresql.conf
- pg_reload_conf()
- wait 30
- pg_start_backup('blah')
- backup
- pg_stop_backup()
- unedit postgresql.conf
- pg_reload_conf()

Which could then be wrapped even more simply as

- pg_start_backup_online('blah')
- backup
- pg_stop_backup_online()


Editing postgresql.conf for this is ugly. In addition, 
pg_start_backup_online would need an additional parameter, the (highly 
machine specific) archive_command string. I'd like to see that parameter 
untouched in postgresql.conf.


Regards,
Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] v814 + OSX10.4.6 builds OK, but fails launch ...

2006-05-26 Thread Martijn van Oosterhout
On Wed, May 24, 2006 at 05:45:57PM -0700, Richard wrote:
 /usr/local/pgsql/bin   ls -al postmaster
  lrwxrwx--- 1 root wheel 8 2006-05-24 07:48 postmaster -> postgres

How did that happen. I was always under the impression that permission
bits on symbolic links were ignored, and chmod on my machine (linux)
won't let me change them anyway, it changes the bits on the underlying
file. Is OSX different here?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] timezones to own config file

2006-05-26 Thread Martijn van Oosterhout
On Wed, May 24, 2006 at 09:13:42PM -0400, Tom Lane wrote:
 The zic database doesn't seem to have a problem with using the same
 abbreviations to mean many different things.  We could look to it for
 information, or maybe even use its classification of timezone groups,
 but I don't think it can solve the problem for us.
 
 I think you may be thinking of yet a separate TODO item, which is to
 be able to use the zic timezone names in timestamptz input, viz
   '2006-05-24 21:11 America/New_York'::timestamptz
 But names like 'IST' or 'CDT' are not zic timezone names, they just
 represent specific offsets from UTC.

Well, the zic database does contain information about the
abbreviations, so we would be able to build a list of them. I think the
right solution is probably to fix the above first (allow full zic
timezones in timestamps) and then set up the CST/CEST/etc. as a list of
aliases users can customise...
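The ambiguity such an alias list has to resolve can be sketched without any PostgreSQL code. The offset table below is illustrative (e.g. "CST" is used for US Central, China, and Cuba time), and the resolver simply refuses ambiguous abbreviations unless the user has customised an alias, as suggested above:

```python
# Sketch: resolving timezone abbreviations to UTC offsets (in minutes).
# The table is illustrative; a real implementation would derive it from zic.
ABBREVS = {
    "CST": [-6 * 60, 8 * 60, -5 * 60],      # US Central / China / Cuba
    "IST": [5 * 60 + 30, 2 * 60, 1 * 60],   # India / Israel / Irish summer
    "UTC": [0],
}
USER_ALIASES = {"CST": -6 * 60}  # hypothetical per-user customisation

def resolve(abbrev):
    """Return the offset in minutes, or raise if the abbreviation is ambiguous."""
    if abbrev in USER_ALIASES:
        return USER_ALIASES[abbrev]
    offsets = ABBREVS.get(abbrev, [])
    if len(offsets) != 1:
        raise ValueError(f"ambiguous or unknown timezone abbreviation: {abbrev}")
    return offsets[0]

print(resolve("CST"))  # -360, via the user alias
print(resolve("UTC"))  # 0
```

Here resolve("IST") would raise, since no alias picks one of the three meanings.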

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] XLogArchivingActive

2006-05-26 Thread Tom Lane
Andreas Pflug [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'm not sure you can expect that to work.  The system is not built to
 guarantee instantaneous response to mode changes like that.

 Um, as long as xlog writing stops immediate recycling when 
 pg_start_backup is executed everything should be fine, since archived 
 logs are not expected to be present until pg_stop_backup is done.

Wrong.  You forgot about all the *other* behaviors that change depending
on XLogArchivingActive, like whether CREATE INDEX gets archived or
just fsync'd.  I don't think it makes sense for CREATE INDEX to change
that behavior in midstream, even assuming that it noticed the flag
change instantly.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] max(*)

2006-05-26 Thread Gurjeet Singh
On 5/26/06, Simon Riggs [EMAIL PROTECTED] wrote:
 On Fri, 2006-05-26 at 10:22 +0200, Dennis Bjorklund wrote:
  Shouldn't
    SELECT max(*) FROM foo;
  give an error?

IMO, yes.

 SQL:2003 would not allow this; SQL:2003 permits only COUNT(*) and no
 other aggregate function. All other aggregates require a value
 expression.

This precisely being the reason.

  Instead it's executed like
    SELECT max(1) FROM foo;
  Just like count(*) is executed as count(1).

That's right; see the interaction pasted below.

  Something for the TODO or is it a feature?

We definitely cannot tout it as a feature, because it is not even a
'useful extension of the standard'.

 Doesn't seem an important or even useful extension of the standard, but
 would probably require special case processing for every aggregate
 function in order to implement that. It's not dangerous... so I'm not
 sure we should take any action at all.

A TODO wouldn't do any harm. If somebody comes up with some smart
solution, you can always incorporate it. Something not supported should
be stated as such through an ERROR. Except for count(), none of the
following make any sense.

The transcript:

test=# \d t1
    Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)

test=# select * from t1;
 a
---
 1
 2
 3
 4
 5
(5 rows)

test=# select count(*) from t1;
 count
-------
     5
(1 row)

test=# select count(1) from t1;
 count
-------
     5
(1 row)

test=# select max(*) from t1;
 max
-----
   1
(1 row)

test=# select max(1) from t1;
 max
-----
   1
(1 row)

test=# select min(*) from t1;
 min
-----
   1
(1 row)

test=# select avg(*) from t1;
          avg
------------------------
 1.00000000000000000000
(1 row)

test=# select sum(*) from t1;
 sum
-----
   5
(1 row)

test=# select sum(1) from t1;
 sum
-----
   5    --- this is correct
(1 row)
test=#


Re: [HACKERS] Bug with UTF-8 character

2006-05-26 Thread Marko Kreen

On 5/26/06, Martijn van Oosterhout kleptog@svana.org wrote:

On Fri, May 26, 2006 at 08:21:56AM +0200, Hans-Jürgen Schönig wrote:
 I got a bug request for the following unicode character in PostgreSQL
 8.1.4: 0xedaeb8

 ERROR:  invalid byte sequence for encoding UTF8: 0xedaeb8



Your character converts to char DBB8. According to the standard,
characters in the range D800-DFFF are not characters but surrogates.
They don't mean anything by themselves and are thus rejected by
postgres.

http://www.unicode.org/faq/utf_bom.html#30

This character should be followed by a low surrogate (DC00-DFFF). You
should combine the two into a single 4-byte UTF-8 character.


You are talking about UTF16, not UTF8.

--
marko

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] XLogArchivingActive

2006-05-26 Thread Andreas Pflug

Tom Lane wrote:

Andreas Pflug [EMAIL PROTECTED] writes:


Tom Lane wrote:


I'm not sure you can expect that to work.  The system is not built to
guarantee instantaneous response to mode changes like that.



Um, as long as xlog writing stops immediate recycling when 
pg_start_backup is executed everything should be fine, since archived 
logs are not expected to be present until pg_stop_backup is done.



Wrong.  You forgot about all the *other* behaviors that change depending
on XLogArchivingActive, like whether CREATE INDEX gets archived or
just fsync'd.  I don't think it makes sense for CREATE INDEX to change
that behavior in midstream, even assuming that it noticed the flag
change instantly.


Ok, but how can I recognize whether all running commands have safely 
switched to archiving mode after enabling it, to continue backing up?


I thought a little about your proposal to use a non-copying 
archive_command, since I only want a backup of the state the cluster 
had when the backup started. But this won't work, because write actions 
that are not appending (truncate, drop) would remove files needed for 
the pre-backup state while they are possibly not yet backed up; thus 
the WAL archive is needed.
Following your proposal, I could redirect archiving to /dev/null while 
not backing up, but how can I make sure that WAL files of transactions 
that were open when the backup procedure started are written to the WAL 
directory, not lost previously? When pg_start_backup() is executed, I'd 
need the archiver to write all hot xlog files again.


Regards,
Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Bug with UTF-8 character

2006-05-26 Thread Tom Lane
=?windows-1252?Q?Hans-J=FCrgen_Sch=F6nig?= [EMAIL PROTECTED] writes:
 But the code does a check where the second character should not be 
 greater than 0x9F, when first character is 0xED. This is not according 
 to UTF-8 standard in RFC 3629.

Better read the RFC again: it says

   UTF8-3  = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
 

The reason for the prohibition is explained as

  The definition of UTF-8 prohibits encoding character numbers between
  U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding
  form (as surrogate pairs) and do not directly represent characters.

I don't know anything about surrogate pairs, but I am not about to
decide that we know more about this than the RFC authors do.  If they
say it's invalid, it's invalid.
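The RFC rule Tom quotes can be checked mechanically. Below is a minimal sketch of the RFC 3629 second-byte restriction for three-byte sequences (not the actual wchar.c code):

```python
def valid_utf8_3(b1, b2, b3):
    """RFC 3629 UTF8-3 rule: the allowed second-byte range depends on b1."""
    if not (0x80 <= b3 <= 0xBF):           # UTF8-tail
        return False
    if b1 == 0xE0:
        return 0xA0 <= b2 <= 0xBF          # excludes overlong encodings
    if 0xE1 <= b1 <= 0xEC or 0xEE <= b1 <= 0xEF:
        return 0x80 <= b2 <= 0xBF
    if b1 == 0xED:
        return 0x80 <= b2 <= 0x9F          # excludes surrogates D800-DFFF
    return False

print(valid_utf8_3(0xED, 0xAE, 0xB8))  # False -- the reported sequence
print(valid_utf8_3(0xED, 0x9F, 0xBF))  # True  -- U+D7FF, last point before the surrogates
```

A strict decoder such as Python's own UTF-8 codec rejects the reported bytes for the same reason.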

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Bug with UTF-8 character

2006-05-26 Thread Martijn van Oosterhout
On Fri, May 26, 2006 at 05:16:59PM +0300, Marko Kreen wrote:
 On 5/26/06, Martijn van Oosterhout kleptog@svana.org wrote:
 On Fri, May 26, 2006 at 08:21:56AM +0200, Hans-Jürgen Schönig wrote:
  I got a bug request for the following unicode character in PostgreSQL
  8.1.4: 0xedaeb8
 
  ERROR:  invalid byte sequence for encoding UTF8: 0xedaeb8
 
 Your character converts to char DBB8. According to the standard,
 characters in the range D800-DFFF are not characters but surrogates.
 They don't mean anything by themselves and are thus rejected by
 postgres.
 
 http://www.unicode.org/faq/utf_bom.html#30
 
 This character should be followed by a low surrogate (DC00-DFFF). You
 should combine the two into a single 4-byte UTF-8 character.
 
 You are talking about UTF16, not UTF8.

UTF-8 and UTF-16 use the same character set as base; just the encoding
is different.

As that page says, to convert the surrogate pair in UTF-16 (D800 DC00)
to UTF-8, you have to combine them into a single 4-byte UTF-8
character. The direct encoding for D800 into UTF-8 is invalid because
no such character exists.

The OP apparently has some broken UTF-16 to UTF-8 conversion software
and thus produced invalid UTF-8, which postgres is rejecting. Given he
didn't post the other half of the surrogate, we don't actually know
what character he's trying to represent, so we can't help him with the
encoding. However, supplementary characters (which require surrogates
in UTF-16) are all in the range 0x10000 to 0x10FFFF.

If you don't beleive me, check the unicode database yourself (warning
large: 944KB). 
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

DBB8 is a private use surrogate, maybe he should be using something in
the range E000-F8FF which are normal private use characters.
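The combination step Martijn describes can be sketched as follows, using D800 DC00 (the pair from his example, which encodes U+10000):

```python
def surrogate_pair_to_utf8(high, low):
    """Combine a UTF-16 surrogate pair into the UTF-8 bytes of the real code point."""
    assert 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF
    cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(cp).encode("utf-8")

print(surrogate_pair_to_utf8(0xD800, 0xDC00).hex())  # f0908080 -> U+10000
```

Correct conversion software does this combination; encoding either surrogate on its own produces exactly the kind of invalid 3-byte sequence the OP hit.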

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Andrew Dunstan

Peter Eisentraut wrote:

On Wednesday, 24 May 2006 at 20:42, Tom Lane wrote:
  

Peter Eisentraut [EMAIL PROTECTED] writes:


I have spent some time figuring out how to resolve the parsing conflicts
in Bernd Helmle's updatable views patch.  The problem has now been
reduced to specifically this situation:
  

Could we see the proposed patches for gram.y?



Here it is.

$ make -W gram.y gram.c
bison -y -d  gram.y
conflicts: 4 shift/reduce

These are basically for instances of the same problem.
  


I had a quick look - I don't think there is an easy answer with the 
current proposed grammar. If we want to prevent shift/reduce conflicts I 
suspect we'd need to use a different keyword than WITH, although I can't 
think of one that couldn't be a trailing clause on a select statement, 
which is the cause of the trouble. Another possibility would be to move 
the optional WITH clause so that it would come before the AS clause. 
Then there should be no conflict, I believe. Something like:


ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
  opt_check_option AS SelectStmt

cheers

andrew




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 I had a quick look - I don't think there is an easy answer with the 
 current proposed grammar. If we want to prevent shift/reduce conflicts I 
 suspect we'd need to use a different keyword than WITH, although I can't 
 think of one that couldn't be a trailing clause on a select statement, 
 which is the cause of the trouble. Another possibility would be to move 
 the optional WITH clause so that it would come before the AS clause. 

Unfortunately the SQL99 spec is perfectly clear about what it wants:

 view definition ::=
  CREATE [ RECURSIVE ] VIEW table name
view specification
AS query expression
[ WITH [ levels clause ] CHECK OPTION ]

 levels clause ::=
CASCADED
  | LOCAL

I haven't had time to play with this yet, but I suspect the answer will
have to be that we reinstate the token-merging UNION JOIN kluge that I
just took out :-(.  Or we could look into recognizing the whole thing as
one token in scan.l, but I suspect that doesn't work unless we give up
the no-backtrack property of the lexer, which would be more of a speed
hit than the intermediate function was.  Anyway it should certainly be
soluble with token merging, if we can't find a pure grammar solution.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] max(*)

2006-05-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Fri, 2006-05-26 at 10:22 +0200, Dennis Bjorklund wrote:
 Shouldn't
 SELECT max(*) FROM foo;
 give an error? 

 Doesn't seem an important or even useful extension of the standard, but
 would probably require special case processing for every aggregate
 function in order to implement that. It's not dangerous... so I'm not
 sure we should take any action at all.

We shouldn't.  The spec's prohibition is based on the assumption that
the only aggregate functions in existence are those listed in the spec.
Since we allow user-defined aggregates, who are we to say that there are
no others for which * is sensible?

You could imagine adding a catalog attribute to aggregate functions to
say whether they allow *, but quite honestly that strikes me as a
waste of implementation effort.  The amount of work would be nontrivial
and the benefit negligible.

(Another possibility, if we get around to implementing N-argument
aggregates, is to equate agg(*) to an invocation of a zero-argument
aggregate as I suggested awhile ago.  Then count() would be the only
zero-argument aggregate mentioned in the standard catalogs.  That would
at least fall out of some work that's actually worth doing ...)

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Martijn van Oosterhout
On Wed, May 24, 2006 at 01:13:06PM +0200, Peter Eisentraut wrote:
 CREATE VIEW foo AS SELECT expr :: TIME . WITH
 
 (where expr is a_expr or b_expr and TIME could also be TIMESTAMP or TIME(x) 
 or 
 TIMESTAMP(x)).
 
 The continuation here could be WITH TIME ZONE (calling for a shift) or WITH 
 CHECK OPTION (calling for a reduce).
 
 All the usual ideas about unfolding the rules or making keywords more 
 reserved 
 don't work (why should they).  A one-token lookahead simply can't parse this.
 
 I have had some ideas about trying to play around with the precedence rules 
 -- 
 giving WITH TIME ZONE a higher precedence than WITH CHECK OPTION -- but I 
 have no experience with that and I am apparently not doing it right, if that 
 is supposed to work at all.

All precedence rules do is force the parser to either shift or reduce
without complaining about a conflict. i.e. it resolves the conflict by
forcing a particular option.

So all you would achieve with precedence rules is that you codify the
solution. For example: always shift, and if the user specifies WITH
CHECK they get a parse error. It would be nice to be able to detect
this and tell the user to parenthesise the (expr::TIME) thus solving
the problem. Given that :: is not SQL-standard anyway, perhaps this is
not a bad solution.

Incidentally, IIRC the default behaviour on conflict is a shift anyway,
so that's what the patch already does.

So we get:

CREATE VIEW foo AS SELECT expr :: TIME WITH TIME ZONE-- OK
CREATE VIEW foo AS SELECT expr :: TIME WITH CHECK OPTION -- parse error
CREATE VIEW foo AS SELECT (expr :: TIME) WITH CHECK OPTION   -- OK

Of course, any code that decompiles into SQL will have to be careful to
not produce unparseable SQL.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Inefficient bytea escaping?

2006-05-26 Thread Andreas Pflug

Tom Lane wrote:

Andreas Pflug [EMAIL PROTECTED] writes:


Tom Lane wrote:


Looking at CopySendData, I wonder whether any traction could be gained
by trying not to call fwrite() once per character.  I'm not sure how
much per-call overhead there is in that function.  We've done a lot of
work trying to optimize the COPY IN path since 8.0, but nothing much
on COPY OUT ...



Hm, I'll see whether I can manage to check CVS head too, and see what's 
happening, not a production alternative though.



OK, make sure you get the copy.c version I just committed ...


Here are the results, with the copy patch:

psql \copy 1.4 GB from table, binary:
8.0    8.1    8.2dev
36s    34s    36s

psql \copy 1.4 GB to table, binary:
8.0    8.1    8.2dev
106s   95s    98s

psql \copy 6.6 GB from table, std:
8.0    8.1    8.2dev
375s   362s   290s (second: 283s)

psql \copy 6.6 GB to table, std:
8.0    8.1    8.2dev
511s   230s   238s

INSERT INTO foo SELECT * FROM bar:
8.0    8.1    8.2dev
75s    75s    75s

So text COPY is obviously about 20% faster now, but it's still far from 
the expected throughput. The dump disk should be capable of 60MB/s, 
limiting text COPY to about 110 seconds, but the load process is CPU 
bound at the moment.
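The 110-second figure follows directly from the numbers above:

```python
# Time to move 6.6 GB at the dump disk's 60 MB/s ceiling.
size_mb = 6.6 * 1024
seconds = size_mb / 60
print(round(seconds))  # 113 -- i.e. "about 110 seconds"
```

The measured 290s is therefore still well over twice the disk-limited minimum.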


For comparison purposes, I included the in-server copy benchmarks as 
well (bytea STORAGE EXTENDED; EXTERNAL won't make a noticeable 
difference). This still seems slower than expected to me, since the 
table's on-disk footprint is relatively small (138MB).


Regards,
Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Compression and on-disk sorting

2006-05-26 Thread Jim C. Nasby
I've done some more testing with Tom's recently committed changes to
tuplesort.c, which remove the tupleheaders from the sort data. It does
about 10% better than compression alone does. What's interesting is that
the gains are about 10% regardless of compression, which means
compression isn't helping very much on all the redundant header data,
which I find very odd. And the header data is very redundant:

bench=# select xmin,xmax,cmin,cmax,aid from accounts order by aid limit 1;
  xmin  | xmax | cmin | cmax | aid
+--+--+--+-
 280779 |0 |0 |0 |   1
(1 row)

bench=# select xmin,xmax,cmin,cmax,aid from accounts order by aid desc limit 1;
  xmin  | xmax | cmin | cmax |aid
+--+--+--+---
 310778 |0 |0 |0 | 3
(1 row)

Makes sense, since pgbench loads the database via a string of COPY commands,
each of which loads 1 rows.

Something else worth mentioning is that sort performance is worse with
larger work_mem for all cases except the old HEAD, prior to the
tuplesort.c changes. It looks like whatever was done to fix that will
need to be adjusted/rethought pending the outcome of using compression.

In any case, compression certainly seems to be a clear win, at least in
this case. If there's interest, I can test this on some larger hardware,
or if someone wants to produce a patch for pgbench that will load some
kind of real data into accounts.filler, I can test that as well.
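Jim's observation about redundant header data is easy to reproduce outside the server. The sketch below uses an illustrative 16-byte header layout (not the real HeapTupleHeader) with the xmin/xmax/cmin/cmax pattern shown in his query output:

```python
import os
import struct
import zlib

# Fake per-tuple "headers": xmin increments, xmax/cmin/cmax all zero --
# the pattern visible in the query output above (illustrative layout only).
headers = b"".join(struct.pack("<IIII", 280779 + i, 0, 0, 0)
                   for i in range(10000))
payload = os.urandom(len(headers))  # incompressible stand-in for user data

for name, data in (("headers", headers), ("random", payload)):
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{name}: {ratio:.2f}")  # headers compress far below 1.0
```

Since such headers compress almost to nothing, removing them before compression should indeed yield only modest additional gains, which matches the ~10% Jim measured.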
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Compression and on-disk sorting

2006-05-26 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 Something else worth mentioning is that sort performance is worse with
 larger work_mem for all cases except the old HEAD, prior to the
 tuplesort.c changes. It looks like whatever was done to fix that will
 need to be adjusted/rethought pending the outcome of using compression.

Please clarify.  What are you comparing here exactly, and what cases did
you test?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] LIKE, leading percent, bind parameters and indexes

2006-05-26 Thread Martijn van Oosterhout
On Fri, May 26, 2006 at 11:38:41AM -0500, Jim C. Nasby wrote:
  select * from table where field like 'THE NAME%'; -- index scan
  select * from table where field like '%THE NAME%'; -- seq scan
  select * from table where field like :bind_param; -- seq scan (always)
 
 How difficult would it be to make LIKE check the value of the bound
 parameter for a starting % and use that information to decide on a query
 plan? IMHO this is worth making into a special case in the planner,
 because it's very easy to detect and makes a tremendous difference in
 the query plan/performance.

Planning doesn't work that way. LIKE is just a function invocation; the
planner doesn't ask or tell it where it is in the plan, and it's not
the function that determines how the query is planned.
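For a constant pattern the planner can extract the literal prefix before the first wildcard and turn it into an index range condition; with a bind parameter there is no pattern to inspect at plan time. A rough sketch of that classification (illustrative only, not PostgreSQL's actual code):

```python
def like_prefix(pattern: str) -> str:
    """Return the literal prefix of a LIKE pattern up to the first
    unescaped wildcard (% or _) -- roughly the information a planner
    needs to turn a constant LIKE into an index range scan."""
    prefix = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c in "%_":
            break                       # wildcard ends the fixed prefix
        if c == "\\" and i + 1 < len(pattern):
            i += 1                      # escaped character is literal
            c = pattern[i]
        prefix.append(c)
        i += 1
    return "".join(prefix)

for p in ["THE NAME%", "%THE NAME%", "50\\%%"]:
    pre = like_prefix(p)
    print(repr(p), "->", "index scan on prefix %r" % pre if pre else "seq scan")
```

An empty prefix (leading `%`) leaves nothing for an index scan to start from, hence the seq scan.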

 Also, might a bitmap scan be a win for the %string case? Presumably it's
 much faster to find matching rows via an index and then go back into the
 heap for them; unless you're matching a heck of a lot of rows.

This is an interesting thought. Currently, AFAICS, the bitmap-scan code
only considers operators that are indexable, just like for normal index
scans. However, in this case the query could scan the entire index,
apply the LIKE to each one and produce a bitmap of possible matches.
Then do a bitmap scan over the table to check the results.

Not just LIKE could use this, but any function marked STABLE. You'd
have to weigh up the cost of scanning the *entire* index (because we
don't have any actual restriction clauses) against avoiding a full
table scan.

Actually, if you're going to scan the whole index, maybe you can use
the recent changes that allow VACUUM to scan the index sequentially,
rather than by index order. Surely a sequential disk scan over the
index to create the bitmap would be a big win.
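Martijn's idea can be modeled in a few lines (a toy sketch with invented data, not the server's bitmap machinery): scan every index entry, apply the non-indexable predicate, collect the matches into a sorted "bitmap", then visit the heap in physical order.

```python
# Invented miniature index and heap for illustration.
index = [("ANOTHER NAME", 17), ("THE NAME A", 3), ("THE NAME B", 42)]  # (key, heap position)
heap = {3: "row3", 17: "row17", 42: "row42"}

def matches(key: str) -> bool:
    return "THE NAME" in key            # stand-in for LIKE '%THE NAME%'

# Full index scan -> bitmap of candidate heap positions, sorted so the
# heap is then read in physical order, as a bitmap heap scan would.
bitmap = sorted(pos for key, pos in index if matches(key))
result = [heap[pos] for pos in bitmap]
print(bitmap, result)
```

The trade-off in the text is visible here: the predicate runs over *every* index entry, but the heap is only touched for matches, and in order.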

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




[HACKERS] Creating a case insensitive data type

2006-05-26 Thread Dave
could CREATE TYPE be used to make a case insensitive version of varchar? 
So that doing something like LOWER(username) = 'joe' could just be done 
like username = 'joe' ?


I want to try to avoid using CREATE TYPE with C extensions and using a 
new operator all over the place is not an option... How could I do this? 
Can I accomplish this via another method?


-Dave

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Inefficient bytea escaping?

2006-05-26 Thread Tom Lane
Andreas Pflug [EMAIL PROTECTED] writes:
 Here are the results, with the copy patch:

 psql \copy 1.4 GB from table, binary:
 8.0    8.1    8.2dev
 36s    34s    36s

 psql \copy 6.6 GB from table, std:
 8.0    8.1    8.2dev
 375s   362s   290s (second: 283s)

Hmph.  There's something strange going on on your platform (what is it
anyway?)  Using CVS HEAD on Fedora Core 4 x86_64, I get

bytea=# copy t to '/home/tgl/t.out';
COPY 1024
Time: 273325.666 ms
bytea=# copy binary t to '/home/tgl/t.outb';
COPY 1024
Time: 62113.355 ms

Seems \timing doesn't work on \copy (annoying), so

$ time psql -c \\copy t to '/home/tgl/t.out2' bytea

real3m47.507s
user0m3.700s
sys 0m36.406s
$ ls -l t.*
-rw-r--r--  1 tgl tgl 5120001024 May 26 12:58 t.out
-rw-rw-r--  1 tgl tgl 5120001024 May 26 13:14 t.out2
-rw-r--r--  1 tgl tgl 1024006165 May 26 13:00 t.outb
$

This test case is 1024 rows each containing a 100-byte bytea, stored
EXTERNAL (no on-disk compression), all bytes chosen to need expansion to
\nnn form.  So the ratio in runtimes is in keeping with the amount of
data sent.  It's interesting (and surprising) that the runtime is
actually less for psql \copy than for server COPY.  This is a dual Xeon
machine, maybe the frontend copy provides more scope to use both CPUs?
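The roughly 5x ratio between t.out and t.outb is what the escaping predicts: bytea's text form emits a backslash-nnn octal escape (4 characters) per non-printable byte, and COPY's text format then doubles the backslash. A small sketch of that arithmetic (illustrative, not the server's code):

```python
def bytea_text_octal(data: bytes) -> str:
    """bytea text output (sketch): each byte as a backslash-nnn escape."""
    return "".join("\\%03o" % byte for byte in data)

def copy_escape(s: str) -> str:
    """COPY text format (sketch): every backslash is doubled."""
    return s.replace("\\", "\\\\")

raw = bytes([1]) * 100           # 100 bytes, all needing escaping
out = copy_escape(bytea_text_octal(raw))
print(len(raw), "->", len(out))  # 100 -> 500: the ~5x ratio seen above
```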

It would be interesting to see what's happening on your machine with
oprofile or equivalent.

I can't test psql binary \copy just yet, but will look at applying your
recent patch so that case can be checked.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Greg Stark

Martijn van Oosterhout kleptog@svana.org writes:

 So we get:
 
 CREATE VIEW foo AS SELECT expr :: TIME WITH TIME ZONE-- OK
 CREATE VIEW foo AS SELECT expr :: TIME WITH CHECK OPTION -- parse error
 CREATE VIEW foo AS SELECT (expr :: TIME) WITH CHECK OPTION   -- OK

I haven't really been following this conversation, but just on the off chance
this is a useful idea: Would it work to make WITH just a noise word? then
you would just need one token of look-ahead to recognize TIME ZONE or CHECK
OPTION instead of 2. I don't know what levels clauses look like so I'm not
sure if you would be able to recognize them without seeing the WITH. I'm not
even sure this works even if you can for that matter.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Inefficient bytea escaping?

2006-05-26 Thread Andreas Pflug

Tom Lane wrote:

Andreas Pflug [EMAIL PROTECTED] writes:


Here are the results, with the copy patch:




psql \copy 1.4 GB from table, binary:
8.0     8.1     8.2dev
36s     34s     36s




psql \copy 6.6 GB from table, std:
8.0     8.1     8.2dev
375s    362s    290s (second: 283s)



Hmph.  There's something strange going on on your platform (what is it
anyway?)


Debian 2.6.26.


 It's interesting (and surprising) that the runtime is
actually less for psql \copy than for server COPY.  This is a dual Xeon
machine, maybe the frontend copy provides more scope to use both CPUs?


The dual CPU explanation sounds reasonable, but I found the same 
tendency on a single 3GHz (HT disabled).

Strange observation using top:
user 90%, sys 10%, idle+wait 0%, but only postmaster consumes CPU, 
showing 35%; the rest is negligible.


It would be interesting to see what's happening on your machine with
oprofile or equivalent.


I'll investigate further, trying to find the missing CPU.

Regards,
Andreas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Creating a case insensitive data type

2006-05-26 Thread Martijn van Oosterhout
On Thu, May 25, 2006 at 08:23:53PM -0400, Dave wrote:
 could CREATE TYPE be used to make a case insensitive version of varchar? 
 So that doing something like LOWER(username) = 'joe' could just be done 
 like username = 'joe' ?
 
 I want to try to avoid using CREATE TYPE with C extensions and using a 
 new operator all over the place is not an option... How could I do this? 
 Can I accomplish this via another method?

Try this:

http://gborg.postgresql.org/project/citext/projdisplay.php

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] max(*)

2006-05-26 Thread Jim C. Nasby
On Fri, May 26, 2006 at 11:03:17AM -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Fri, 2006-05-26 at 10:22 +0200, Dennis Bjorklund wrote:
  Shouldn't
  SELECT max(*) FROM foo;
  give an error? 
 
  Doesn't seem an important or even useful extension of the standard, but
  would probably require special case processing for every aggregate
  function in order to implement that. Its not dangerous... so I'm not
  sure we should take any action at all.
 
 We shouldn't.  The spec's prohibition is based on the assumption that
 the only aggregate functions in existence are those listed in the spec.
 Since we allow user-defined aggregates, who are we to say that there are
 no others for which * is sensible?

But if aggregate(*) just gets turned into aggregate(1) by the backend,
why not just tell people to use aggregate(1) for their custom
aggregates? Or am I misunderstanding how aggregate(*) is actually
handled?

My concern is that it's not inconceivable to typo max(field) into
max(*), which could make for a rather frustrating error. Not to mention
this being something that could trip newbies up. If nothing else I'd say
it warrants a %TODO just so it doesn't end up on the PostgreSQL gotchas
page. :)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] max(*)

2006-05-26 Thread Bruno Wolff III
On Fri, May 26, 2006 at 14:06:29 -0500,
  Jim C. Nasby [EMAIL PROTECTED] wrote:
 
 But if aggregate(*) just gets turned into aggregate(1) by the backend,
 why not just tell people to use aggregate(1) for their custom
 aggregates? Or am I misunderstanding how aggregate(*) is actually
 handled?
 
  My concern is that it's not inconceivable to typo max(field) into
 max(*), which could make for a rather frustrating error. Not to mention
 this being something that could trip newbies up. If nothing else I'd say
  it warrants a %TODO just so it doesn't end up on the PostgreSQL gotchas
 page. :)

Tom's suggestion that (*) map to () which would refer to a zero argument
aggregate would cover this case, since there wouldn't be a zero argument
version of max.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] max(*)

2006-05-26 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 My concern is that it's not inconceivable to typo max(field) into
 max(*), which could make for a rather frustrating error. Not to mention
 this being something that could trip newbies up. If nothing else I'd say
 it warrants a %TODO just so it doesn't end up on the PostgreSQL gotchas
 page. :)

count(*) has been implemented that way since about 1999, and no one's
complained yet, so I think you are overstating the importance of the
problem.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Compression and on-disk sorting

2006-05-26 Thread Jim C. Nasby
On Fri, May 26, 2006 at 12:35:36PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  Something else worth mentioning is that sort performance is worse with
  larger work_mem for all cases except the old HEAD, prior to the
  tuplesort.c changes. It looks like whatever was done to fix that will
  need to be adjusted/rethought pending the outcome of using compression.
 
 Please clarify.  What are you comparing here exactly, and what cases did
 you test?

Sorry, forgot to put the url in:
http://jim.nasby.net/misc/pgsqlcompression/compress_sort.txt

But the meat is:
                        -- work_mem --
                        Scale    2000    20000
not compressed           150    805.7    797.7
not compressed          3000    17820    17436
compressed               150    371.4    400.1
compressed              3000     8152     8537
compressed, no headers  3000     7325     7876

Performance degrades with more work_mem any time compression is used. I
thought I had data on just your tuplesort.c change without compression,
but I guess I don't. :( I can run that tonight if desired.

As for the code, the 3 things I've tested are HEAD as of 5/17/06 with no
patches (labeld 'not compressed'); that code with the compression patch
(compressed), and that code with both the compression patch and your change to
tuplesort.c that removes tuple headers from the sorted data (compressed, no
headers).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] XLogArchivingActive

2006-05-26 Thread Bruce Momjian

Originally I wanted the command to be a string, and archiving to be a
boolean, but Tom wanted a single parameter, and others agreed.

---

Andreas Pflug wrote:
 Simon Riggs wrote:
  On Thu, 2006-05-25 at 17:25 +0200, Andreas Pflug wrote:
  
 
  
 Currently, I have to 
  edit postgresql.conf and SIGHUP to turn on archiving, configuring a 
 (hopefully) writable directory, do the backup, edit postgresql.conf and 
 SIGHUP again. Not too convenient...
  
  
  You're doing this for pgAdmin right?
 
 Not yet, just trying to manage a server.
 
  
  My understanding was that we had the tools now to edit the
  postgresql.conf programmatically? 
  
  Seems like its not too convenient to change the way the server operates
  to do this, as long as we solve the SIGHUP/postgresql.conf problem. I'm
  also not that happy about curtailing people's options on backup either:
  if people decided they wanted to have a mixture of isolated on-line
  backup (as you suggest), plus active archiving at other times they would
  still have the problems you suggest.
 
 Why?
 My suggestion is to redefine XLogArchivingActive. Currently, it tests 
 for non-null archive_command. I propose
 bool XlogArchivingActive()
 {
     if (XLogArchiveCommand[0] == 0)
         return false;
     return (XLogPermanentArchive       // from GUC
             || OnlineBackupRunning()); // from pg_start_backup
 }
 
 The people you mention simply have XLogPermanentActive=true in 
 postgresql.conf, delivering the current behaviour.
 
  
  Not sure what the edit commands are offhand, but we would need the
  following program:
  
  - edit postgresql.conf
  - pg_reload_conf()
  - wait 30
  - pg_start_backup('blah')
  - backup
  - pg_stop_backup()
  - unedit postgresql.conf
  - pg_reload_conf()
  
  Which could then be wrapped even more simply as
  
  - pg_start_backup_online('blah')
  - backup
  - pg_stop_backup_online()
 
 Editing postgresql.conf for this is ugly. In addition, 
 pg_start_backup_online would need an additional parameter, the (highly 
 machine specific) archive_command string. I'd like to see that parameter 
 untouched in postgresql.conf.
 
 Regards,
 Andreas
 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend
 

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] XLogArchivingActive

2006-05-26 Thread Jim C. Nasby
On Fri, May 26, 2006 at 12:15:34AM +0200, Andreas Pflug wrote:
 Jim Nasby wrote:
 Another consideration is that you can use rsync to update a 
 filesystem-level backup, but there's no pg_dump equivalent. On a large 
 database that can make a sizable difference in the amount of time 
 required for a backup.
 That's fine to cut the backup execution time, but to guarantee 
 consistency while the cluster is running pg_start_backup/pg_stop_backup 
 and WAL archiving will still be necessary.

Of course, but the point is that it would only be necessary while you're
running rsync. If you don't care about being able to actually roll
forward from that backup, you don't need any WAL files from after rsync
completes.

One possible way to accomplish this would be to allow specifying an
archiver command to pg_start_backup, which would then fire up an
archiver for the duration of your backup. Then you can:

SELECT pg_start_backup('label', 'cp -i %p /mnt/server/archivedir/%f </dev/null');
rsync
SELECT pg_stop_backup();

No messing with postgresql.conf, no need to HUP the postmaster.

Perhaps the OP would be interested in coding this up, or sponsoring
someone to do so, since I think it provide what they were looking for.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Inefficient bytea escaping?

2006-05-26 Thread Tom Lane
I wrote:
 I can't test psql binary \copy just yet, but will look at applying your
 recent patch so that case can be checked.

With patch applied:

$ time psql -c \\copy t to '/home/tgl/t.out2' bytea

real3m46.057s
user0m2.724s
sys 0m36.118s
$ time psql -c \\copy t to '/home/tgl/t.outb2' binary bytea

real1m5.222s
user0m0.640s
sys 0m6.908s
$ ls -l t.*
-rw-rw-r--  1 tgl tgl 5120001024 May 26 16:02 t.out2
-rw-rw-r--  1 tgl tgl 1024006165 May 26 16:03 t.outb2

The binary time is just slightly more than what I got before for a
server COPY:

bytea=# copy t to '/home/tgl/t.out';
COPY 1024
Time: 273325.666 ms
bytea=# copy binary t to '/home/tgl/t.outb';
COPY 1024
Time: 62113.355 ms

So those numbers seem to hang together, and it's just the text case
that is not making too much sense.  I'm off for a little visit with
oprofile...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] XLogArchivingActive

2006-05-26 Thread Jim C. Nasby
On Fri, May 26, 2006 at 10:59:37AM +0100, Simon Riggs wrote:
 Not sure what the edit commands are offhand, but we would need the
 following program:
 
 - edit postgresql.conf
 - pg_reload_conf()
 - wait 30
 - pg_start_backup('blah')

Rather than 'wait 30', ISTM it would be better to just leave archiving
enabled, but not actually archiving WAL files.

Or, set up some mechanism so that you can tell if any commands whose
behavior would change based on archiving are running, and if any of
those that are running think archiving is disabled,
pg_start_backup_online blocks on them.

Also, regarding needing to place an archiver command in
pg_start_backup_online, another option would be to depend on the
filesystem backup to copy the WAL files, and just let them pile up in
pg_xlog until pg_stop_backup_online. Of course, that would require a
two-step filesystem copy, since you'd need to first copy everything in
$PGDATA, and then copy $PGDATA/pg_xlog after you have that.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] GIN stuck in loop during PITR

2006-05-26 Thread Andreas Seltenreich
Andreas Seltenreich schrob:

 Teodor Sigaev schrob:

 Thanks a lot, applied. Can you describe test suite? It may be useful
 for test more...

 Here's a shell script that triggers the bug when I revert the patch.

Just tried the script on HEAD, and it was triggering an assertion. I
guess it is because we are still returning InvalidOffsetNumber in the
trivial case (looks like a typo to me). I've attached a patch.

regards,
andreas

Index: ginbtree.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginbtree.c,v
retrieving revision 1.2
diff -c -r1.2 ginbtree.c
*** ginbtree.c  26 May 2006 08:01:17 -  1.2
--- ginbtree.c  26 May 2006 20:09:45 -
***
*** 189,195 
Assert( !GinPageIsLeaf(page) );
  
/* check trivial case */
!	if ( (root->off != btree->findChildPtr(btree, page, stack->blkno, 
InvalidOffsetNumber)) != InvalidBuffer ) {
		stack->parent = root;
return;
}
--- 189,195 
Assert( !GinPageIsLeaf(page) );
  
/* check trivial case */
!	if ( (root->off = btree->findChildPtr(btree, page, stack->blkno, 
InvalidOffsetNumber)) != InvalidOffsetNumber ) {
		stack->parent = root;
return;
}

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Compression and on-disk sorting

2006-05-26 Thread Simon Riggs
On Fri, 2006-05-26 at 14:47 -0500, Jim C. Nasby wrote:

 But the meat is:
                         -- work_mem --
                         Scale    2000    20000
 not compressed           150    805.7    797.7
 not compressed          3000    17820    17436
 compressed               150    371.4    400.1
 compressed              3000     8152     8537
 compressed, no headers  3000     7325     7876

Since Tom has committed the header-removing patch, we need to test

not compressed, no headers v compressed, no headers

There is a noticeable rise in sort time with increasing work_mem, but
that needs to be offset from the benefit that in-general comes from
using a large Heap for the sort. With the data you're using that always
looks like a loss, but that isn't true with all input data orderings.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Compression and on-disk sorting

2006-05-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 There is a noticeable rise in sort time with increasing work_mem, but
 that needs to be offset from the benefit that in-general comes from
 using a large Heap for the sort. With the data you're using that always
 looks like a loss, but that isn't true with all input data orderings.

Yeah, these are all the exact same test data, right?  We need a bit more
variety in the test cases before drawing any sweeping conclusions.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Compression and on-disk sorting

2006-05-26 Thread Jim C. Nasby
On Fri, May 26, 2006 at 04:41:51PM -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  There is a noticeable rise in sort time with increasing work_mem, but
  that needs to be offset from the benefit that in-general comes from
  using a large Heap for the sort. With the data you're using that always
  looks like a loss, but that isn't true with all input data orderings.
 
 Yeah, these are all the exact same test data, right?  We need a bit more
 variety in the test cases before drawing any sweeping conclusions.

All testing is select count(*) from (select * from accounts order by
bid) a; hitting a pgbench database, since that's something anyone can
(presumably) reproduce. Suggestions for other datasets welcome.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Question about name datatype

2006-05-26 Thread Greg Stark

In c.h there is the following comment:

/*
 * We want NameData to have length NAMEDATALEN and int alignment,
 * because that's how the data type 'name' is defined in pg_type.
 * Use a union to make sure the compiler agrees.  Note that NAMEDATALEN
 * must be a multiple of sizeof(int), else sizeof(NameData) will probably
 * not come out equal to NAMEDATALEN.
 */


And indeed in pg_type the typalign column says i for this data type. My
question is just, why? What would be the problem with an alignment of 1 for
name?
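A small sketch of the union trick, using Python's ctypes to stand in for the C declaration (NAMEDATALEN = 64 assumed, the default): the int member forces int alignment without changing the size.

```python
import ctypes

NAMEDATALEN = 64   # assumed default; must be a multiple of sizeof(int)

# ctypes stand-in for the c.h declaration: a char array overlaid with an
# int so the compiler gives the whole union int ('i') alignment.
class NameData(ctypes.Union):
    _fields_ = [
        ("data", ctypes.c_char * NAMEDATALEN),
        ("alignment_dummy", ctypes.c_int),
    ]

print(ctypes.sizeof(NameData), ctypes.alignment(NameData))
```

With alignment 1 the size would still be 64, which is why dropping the dummy looks harmless; the union only pins down the alignment the catalogs declare.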

-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Peter Eisentraut
Andrew Dunstan wrote:
 The GLR parsers require a compiler for ISO C89 or later. In
 addition, they use the inline keyword, which is not C89, but is C99
 and is a common extension in pre-C99 compilers. It is up to the user
 of these parsers to handle portability issues.

We already use inline, or handle its nonexistence, respectively.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Inefficient bytea escaping?

2006-05-26 Thread Tom Lane
I wrote:
 I'm off for a little visit with oprofile...

It seems the answer is that fwrite() does have pretty significant
per-call overhead, at least on Fedora Core 4.  The patch I did yesterday
still ended up making an fwrite() call every few characters when dealing
with bytea text output, because it'd effectively do two fwrite()s per
occurrence of '\' in the data being output.  I've committed a further
hack that buffers a whole data row before calling fwrite().  Even though
this presumably is adding one extra level of data copying, it seems to
make things noticeably faster:

bytea=# copy t to '/home/tgl/t.out';
COPY 1024
Time: 209842.139 ms

as opposed to 268 seconds before.  We were already applying the
line-at-a-time buffering strategy for frontend copies, so that
path didn't change much (it's about 226 seconds for the same case).

At this point, a copy-to-file is just marginally faster than a
frontend copy happening on the local machine; which speaks well
for the level of optimization of the Linux send/recv calls.
More importantly, I see consistent results for the text and
binary cases.
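The row-buffering strategy can be modeled like this (a toy sketch, not the committed C code): accumulate each row's escaped fragments in a local buffer and issue one write per row instead of one per fragment, trading an extra copy for far fewer per-call overheads.

```python
class CountingSink:
    """Stand-in for a FILE*: counts how many write() calls it receives."""
    def __init__(self):
        self.write_calls = 0
        self.data = bytearray()
    def write(self, chunk: bytes):
        self.write_calls += 1
        self.data += chunk

def copy_row_unbuffered(sink, field: bytes):
    for byte in field:
        sink.write(b"\\%03o" % byte)   # one write per escaped byte

def copy_row_buffered(sink, field: bytes):
    buf = bytearray()
    for byte in field:
        buf += b"\\%03o" % byte        # extra copy into a row buffer...
    sink.write(bytes(buf))             # ...but a single write per row

field = bytes([1]) * 50
many, one = CountingSink(), CountingSink()
copy_row_unbuffered(many, field)
copy_row_buffered(one, field)
print(many.write_calls, "vs", one.write_calls)   # 50 vs 1
```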

Let me know what this does on your Debian machine ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] LIKE, leading percent, bind parameters and indexes

2006-05-26 Thread Mark Woodward
 On Thu, May 25, 2006 at 08:41:17PM -0300, Rodrigo Hjort wrote:
 
 I think more exactly, the planner can't possibly know how to plan an
 indexscan with a leading '%', because it has nowhere to start.
 

 The fact is that index scan is performed on LIKE expression on a string
 not
 preceded by '%', except when bound parameter is used.

 select * from table where field like 'THE NAME%'; -- index scan
 select * from table where field like '%THE NAME%'; -- seq scan
 select * from table where field like :bind_param; -- seq scan (always)

 Since I'm somewhat doubtful of coming up with a generic means for
 dealing with plan changes based on different bound parameter values any
 time soon...

 How difficult would it be to make LIKE check the value of the bound
 parameter for a starting % and use that information to decide on a query
 plan? IMHO this is worth making into a special case in the planner,
 because it's very easy to detect and makes a tremendous difference in
 the query plan/performance.


My solution is a function in one of my libraries called strrev() which
returns the reverse of a string. I make a functional index on
strrev(field). Then, just search where strrev(field) like
strrev('%the name');
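The trick works because reversing turns a suffix pattern into a prefix pattern, which an ordered index on the reversed value can serve. A toy model (Python with invented data; a sorted list plays the btree):

```python
import bisect

def strrev(s: str) -> str:
    return s[::-1]

rows = ["JOHN", "THE NAME", "ANOTHER THE NAME", "X THE NAME"]

# "Functional index" on strrev(field): sorted (reversed key, original row).
idx = sorted((strrev(r), r) for r in rows)
keys = [k for k, _ in idx]

# '%THE NAME' reversed is 'EMAN EHT%' -- a prefix pattern.
prefix = strrev("%THE NAME").rstrip("%")

# Prefix match becomes an index range scan.
lo = bisect.bisect_left(keys, prefix)
hi = bisect.bisect_right(keys, prefix + "\uffff")
matches = [orig for _, orig in idx[lo:hi]]
print(matches)
```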



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Question about name datatype

2006-05-26 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 And indeed in pg_type the typalign column says i for this data type. My
 question is just, why? What would be the problem with an alignment of 1 for
 name?

Probably none, but that's how it's been defined since day one, and there
doesn't seem any good reason to change.  (Looking at the system catalog
definitions, it doesn't appear that we'd save anything on alignment.)

It's possible that back when that decision was taken, there was
something in the bootstrap code that wouldn't work with non-int-aligned
columns?  Just a guess.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Andrew Dunstan wrote:
 The GLR parsers require a compiler for ISO C89 or later. In
 addition, they use the inline keyword, which is not C89, but is C99
 and is a common extension in pre-C99 compilers. It is up to the user
 of these parsers to handle portability issues.

 We already use inline, or handle its nonexistence, respectively.

Yeah, I don't see anything in that statement that we don't assume
already.  The interesting question to me is how much different is
GLR from garden-variety bison; in particular, what's the parser
performance like?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] max(*)

2006-05-26 Thread Robert Treat
On Friday 26 May 2006 09:45, Gurjeet Singh wrote:
 Something not supported should be stated as such through an ERROR. Except
 for count(), none of the following make any sense:

 The transcript:

 test=# \d t1
    Table "public.t1"
  Column |  Type   | Modifiers
 --------+---------+-----------
  a      | integer | not null
 Indexes:
     "t1_pkey" PRIMARY KEY, btree (a)

 test=# select * from t1;
  a
 ---
 1
 2
 3
 4
 5
 (5 rows)


given:

pagila=# select 1 from t1;
 ?column?
--
1
1
1
1
1
(5 rows)

 test=# select count(*) from t1;
 count
 ---
  5
 (1 row)


this makes sense

 test=# select count(1) from t1;
  count
 ---
  5
 (1 row)


and so does this

 test=# select max(*) from t1;
  max
 -
1
 (1 row)


not so much 

 test=# select max(1) from t1;
 max
 -
1
 (1 row)


but this does

 test=# select min(*) from t1;
  min
 -
1
 (1 row)


not here though

 test=# select avg(*) from t1;
   avg
 
  1.
 (1 row)


nor here 

 test=# select sum(*) from t1;
  sum
 -
5
 (1 row)


or here

 test=# select sum(1) from t1;
  sum
 -
5  --- this is correct
 (1 row)

 test=#

yep... but really most aggregates are ok with a 1 

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Andrew Dunstan
Tom Lane said:
 Peter Eisentraut [EMAIL PROTECTED] writes:
 Andrew Dunstan wrote:
 The GLR parsers require a compiler for ISO C89 or later. In
 addition, they use the inline keyword, which is not C89, but is C99
 and is a common extension in pre-C99 compilers. It is up to the user
 of these parsers to handle portability issues.

 We already use inline, or handle its nonexistence, respectively.

 Yeah, I don't see anything in that statement that we don't assume
 already.  The interesting question to me is how much different is
 GLR from garden-variety bison; in particular, what's the parser
 performance like?


As I understand it, it runs one parser pretty much like the standard LALR(1)
case, until it finds an ambiguity (shift/reduce or reduce/reduce) at which
stage it clones the parser to take parallel paths, working in lockstep, and
storing up semantic actions. When one of the clones encounters an error it
goes away, and the surviving clone takes its stored semantic actions. If
that's true, then probably the only significant performance hit is in cases
of ambiguity, and we would only have a handful of those, each lasting for
one token, so the performance hit should be very small. We'd have to test
it, of course ;-)

cheers

andrew



Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 As I understand it, it runs one parser pretty much like the standard LALR(1)
 case, until it finds an ambiguity (shift/reduce or reduce/reduce) at which
 stage it clones the parser to take parallel paths, working in lockstep, and
 storing up semantic actions. When one of the clones encounters an error it
 goes away, and the surviving clone takes its stored semantic actions. If
 that's true, then probably the only significant performance hit is in cases
 of ambiguity, and we would only have a handful of those, each lasting for
 one token, so the performance hit should be very small. We'd have to test
 it, of course ;-)

Yeah, I just read the same in the bison manual.  The thing that's
bothering me is that a GLR parser would hide that ambiguity from you,
and thus changes in the grammar might cause us to incur performance
hits without realizing it.  The manual indicates that the performance
is pretty awful whenever an ambiguity does occur.

regards, tom lane



Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Peter Eisentraut
Tom Lane wrote:
 Yeah, I just read the same in the bison manual.  The thing that's
 bothering me is that a GLR parser would hide that ambiguity from you,

It doesn't really hide it.  You still get the "N shift/reduce conflicts" 
warnings from bison.  You just know that they are being handled.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Yeah, I just read the same in the bison manual.  The thing that's
 bothering me is that a GLR parser would hide that ambiguity from you,

 It doesn't really hide it.  You still get the "N shift/reduce conflicts" 
 warnings from bison.  You just know that they are being handled.

Well, that has the same problem that we've raised every other time
someone has said "I don't want to fix the grammar to not have any
conflicts."  If bison only tells you there were N conflicts,
how do you know these are the same N conflicts you had yesterday?
In a grammar that we hack around as much as we do with Postgres,
I really don't think that's acceptable.

I think that by far the most reliable solution is to put back the
filter yylex function that I removed a couple months ago:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/parser/parser.c.diff?r1=1.64r2=1.65
We can use the same technique that we used for UNION JOIN, but instead
join, say, WITH and TIME into one token and make the time datatype
productions look for TIME WITH_TIME ZONE and so on.  (I propose this
rather than putting the ugliness into WITH CHECK OPTION, because this
way we should only need one merged token and thus only one case to
check in the filter function; AFAICS we'd need three cases if we
merge tokens on that end of it.)

I'm not sure we can just revert the above-mentioned patch, because it
had some interactions with a later patch to use %option prefix.
Shouldn't be too hard to fix though.  I'll put together a proposed
patch.

regards, tom lane



[HACKERS] pg_proc probin misuse

2006-05-26 Thread James William Pye
Hi,

In PL/Py, I had the bright idea of storing bytecode in the probin field of the
function's pg_proc row. However, this idea has lately become rather dim as I
have recently rediscovered (thanks Adrian) that this breaks dumps; pg_dump
outputs a PL/Py function as "CREATE FUNCTION x() RETURNS y LANGUAGE python AS
'bytecode', 'source'". Of course, when loading this, it fails:
'ERROR:  only one AS item needed for language python'.

So is this "fix your broken PL" or "pg_dump should only be doing that for C
language functions"? I imagine the former, so if that is the case perhaps
the 'probin' column description at [1] should be reworded to ensure others don't
get the same bright idea (the language-specific part in particular).
Ugh, even if it were the latter, I would still be breaking existing versions,
so I'm inclined to fix it regardless.

Have a good evening (afternoon, morning, etc :).

[1] http://www.postgresql.org/docs/8.1/static/catalog-pg-proc.html
[Yeah, I do see the clarification at the bottom of the page. :( ]



Re: [HACKERS] Updatable views/with check option parsing

2006-05-26 Thread Tom Lane
I wrote:
 We can use the same technique that we used for UNION JOIN, but instead
 join, say, WITH and TIME into one token and make the time datatype
 productions look for TIME WITH_TIME ZONE and so on.  (I propose this
 rather than putting the ugliness into WITH CHECK OPTION, because this
 way we should only need one merged token and thus only one case to
 check in the filter function; AFAICS we'd need three cases if we
 merge tokens on that end of it.)

On investigation that turns out to have been a bad idea: if we do it
that way, it becomes necessary to promote WITH to a fully reserved word.
The counterexample is

CREATE VIEW v AS SELECT * FROM foo WITH ...

Is WITH an alias for foo (with no AS), or is it the start of a WITH
CHECK OPTION?  No way to tell without lookahead.

While I don't think that making WITH a fully reserved word would cause
any great damage, I'm unwilling to do it just to save a couple of lines
of code.  Accordingly, I propose the attached patch.  This reinstates
the filter yylex function formerly used for UNION JOIN (in a slightly
cleaner fashion than it was previously done) and parses WITH CHECK
OPTION without any bison complaints, and with no new reserved words.

If no objections, I'll go ahead and apply this, and Peter can get on
with making the stub productions do something useful.

regards, tom lane

Index: src/backend/parser/Makefile
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/Makefile,v
retrieving revision 1.43
diff -c -r1.43 Makefile
*** src/backend/parser/Makefile 7 Mar 2006 01:00:17 -   1.43
--- src/backend/parser/Makefile 27 May 2006 02:39:51 -
***
*** 57,63 
  
  
  # Force these dependencies to be known even without dependency info built:
! gram.o keywords.o: $(srcdir)/parse.h
  
  
  # gram.c, parse.h, and scan.c are in the distribution tarball, so they
--- 57,63 
  
  
  # Force these dependencies to be known even without dependency info built:
! gram.o keywords.o parser.o: $(srcdir)/parse.h
  
  
  # gram.c, parse.h, and scan.c are in the distribution tarball, so they
Index: src/backend/parser/gram.y
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.544
diff -c -r2.544 gram.y
*** src/backend/parser/gram.y   30 Apr 2006 18:30:39 -  2.544
--- src/backend/parser/gram.y   27 May 2006 02:39:52 -
***
*** 70,75 
--- 70,81 
(Current) = (Rhs)[0]; \
} while (0)
  
+ /*
+  * The %name-prefix option below will make bison call base_yylex, but we
+  * really want it to call filtered_base_yylex (see parser.c).
+  */
+ #define base_yylex filtered_base_yylex
+ 
  extern List *parsetree;   /* final parse result is delivered here */
  
  static bool QueryIsRule = FALSE;
***
*** 339,344 
--- 345,351 
  %type <list>  constraints_set_list
  %type <boolean> constraints_set_mode
  %type <str>   OptTableSpace OptConsTableSpace OptTableSpaceOwner
+ %type <list>  opt_check_option
  
  
  /*
***
*** 356,362 
BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
BOOLEAN_P BOTH BY
  
!   CACHE CALLED CASCADE CASE CAST CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
	COMMITTED CONNECTION CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
--- 363,369 
BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
BOOLEAN_P BOTH BY
  
!   CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT
	COMMITTED CONNECTION CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB
***
*** 431,436 
--- 438,449 
  
ZONE
  
+ /* The grammar thinks these are keywords, but they are not in the keywords.c
+  * list and so can never be entered directly.  The filter in parser.c
+  * creates these tokens when required.
+  */
+ %token WITH_CASCADED WITH_LOCAL WITH_CHECK
+ 
  /* Special token types, not actually keywords - see the lex file */
  %token <str>  IDENT FCONST SCONST BCONST XCONST Op
  %token <ival> ICONST PARAM
***
*** 4618,4629 
  /*
   *
   *QUERY:
!  *CREATE [ OR REPLACE ] [ TEMP ] VIEW <viewname> '('target-list ')' AS <query>
   *
   
*/
  
  ViewStmt: CREATE OptTemp VIEW qualified_name opt_column_list
!   AS SelectStmt
{
ViewStmt *n = makeNode(ViewStmt);
 

Re: [HACKERS] pg_proc probin misuse

2006-05-26 Thread Tom Lane
James William Pye [EMAIL PROTECTED] writes:
 In PL/Py, I had the bright idea of storing bytecode in the probin field of the
 function's pg_proc row. However, this idea has lately become rather dim as I
 have recently rediscovered (thanks Adrian) that this breaks dumps; pg_dump
 outputs a PL/Py function as "CREATE FUNCTION x() RETURNS y LANGUAGE python AS
 'bytecode', 'source'". Of course, when loading this, it fails:
 'ERROR:  only one AS item needed for language python'.

 So is this "fix your broken PL" or "pg_dump should only be doing that for C
 language functions"?

Offhand it seems to me that pg_dump is behaving reasonably: it's storing
probin if it sees something there to be stored.  The asymmetry is in the
backend, specifically functioncmds.c's interpret_AS_clause(): it has a
hardwired assumption that probin is only relevant to C functions.

Feel free to propose a saner definition.  AFAICS the current coding
makes probin useless for all except C functions, so I think it could
be improved.

regards, tom lane
