Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Simon Riggs

On Tue, 2009-04-14 at 18:29 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > On Tue, 2009-04-14 at 16:37 -0400, Tom Lane wrote:
> >> Comments, objections, better ideas?
> 
> > Please, if you do this, make it optional.
> 
> I don't think making the plpgsql lexer pluggable is realistic.

Doesn't sound easy, no. (I didn't suggest pluggable, just optional).

> > Potentially changing the behaviour of thousands of functions just to fix
> > a rare bug will not endear us to our users. The bug may be something
> > that people are relying on in some subtle way, ugly as that sounds.
> 
> That's why I don't want to change it in a minor release.  In a major
> release, however, it's fair game.

If we want to make easy upgrades a reality, this is the type of issue we
must consider. Not much point having perfect binary upgrades if all your
functions start behaving differently after upgrade and then you discover
there isn't a binary downgrade path...

Rather than come up with specific solutions, let me just ask the
question: Is there a workaround for people caught by these changes?
Let's plan that alongside the change itself, so we have a reserve
'chute.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why isn't stats_temp_directory automatically created?

2009-04-14 Thread Fujii Masao
Hi,

On Tue, Apr 14, 2009 at 10:26 PM, Euler Taveira de Oliveira
 wrote:
> Fujii Masao escreveu:
>>
>> Is it worth making the patch which creates stats_temp_directory
>> if not present?
>>
> +1.

Here is the patch.

This patch should be added to CommitFest-2009-First?,
or committed before 8.4 release? The patch is very small,
so I don't think that it'll block 8.4 release.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Index: src/backend/postmaster/pgstat.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.187
diff -c -r1.187 pgstat.c
*** src/backend/postmaster/pgstat.c	1 Jan 2009 17:23:46 -	1.187
--- src/backend/postmaster/pgstat.c	15 Apr 2009 06:08:04 -
***
*** 111,116 
--- 111,117 
  bool		pgstat_track_counts = false;
  int			pgstat_track_functions = TRACK_FUNC_OFF;
  int			pgstat_track_activity_query_size = 1024;
+ char	   *pgstat_temp_directory;
  
  /* --
   * Built from GUC parameter
***
*** 589,594 
--- 590,600 
  		return 0;
  
  	/*
+ 	 * Create temporary statistics directory if not present; ignore errors
+ 	 */
+ 	mkdir(pgstat_temp_directory, 0700);
+ 
+ 	/*
  	 * Do nothing if too soon since last collector start.  This is a safety
  	 * valve to protect against continuous respawn attempts if the collector
  	 * is dying immediately at launch.	Note that since we will be re-called
Index: src/backend/utils/misc/guc.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.502
diff -c -r1.502 guc.c
*** src/backend/utils/misc/guc.c	7 Apr 2009 23:27:34 -	1.502
--- src/backend/utils/misc/guc.c	15 Apr 2009 06:08:18 -
***
*** 375,382 
  char	   *IdentFileName;
  char	   *external_pid_file;
  
- char	   *pgstat_temp_directory;
- 
  int			tcp_keepalives_idle;
  int			tcp_keepalives_interval;
  int			tcp_keepalives_count;
--- 375,380 
Index: src/include/pgstat.h
===
RCS file: /projects/cvsroot/pgsql/src/include/pgstat.h,v
retrieving revision 1.82
diff -c -r1.82 pgstat.h
*** src/include/pgstat.h	4 Jan 2009 22:19:59 -	1.82
--- src/include/pgstat.h	15 Apr 2009 06:08:22 -
***
*** 593,598 
--- 593,599 
  extern bool pgstat_track_counts;
  extern int	pgstat_track_functions;
  extern PGDLLIMPORT int	pgstat_track_activity_query_size;
+ extern PGDLLIMPORT char *pgstat_temp_directory;
  extern char *pgstat_stat_tmpname;
  extern char *pgstat_stat_filename;
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why isn't stats_temp_directory automatically created?

2009-04-14 Thread Itagaki Takahiro

Euler Taveira de Oliveira  wrote:

> Fujii Masao escreveu:
> > Is it worth making the patch which creates stats_temp_directory
> > if not present?
> > 
> +1.

+1, but AFAIK stats_temp_directory was designed to symlink to a RAM drive.
Even if stats_temp_directory exists as a symbolic link, the destination
directory might be lost in such a situation. If you try to make servers
more robust, you might also need to consider broken symlinks.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Warm Standby restore_command documentation (was: New trigger option of pg_standby)

2009-04-14 Thread Fujii Masao
Hi,

On Wed, Apr 15, 2009 at 3:30 AM, Andreas Pflug
 wrote:
> I've been following the thread with growing lack of understanding why
> this is so hardly discussed, and I went back to the documentation of
> what the restore_command should do (
> http://www.postgresql.org/docs/8.3/static/warm-standby.html )
>
> While the algorithm presented in the pseudocode isn't dealing too good
> with a situation where the trigger is set while the restore_command is
> sleeping (this should be handled better in a real implementation), the
> code says
>
> "Restore all wal files. If no more wal files are present, stop restoring
> if the trigger is set; otherwise wait for a new wal file".
>
> Since pg_standby is meant as implementation of restore_command, it has
> to follow the directive stated above; *anything else is a bug*.
> pg_standby currently does *not* obey this directive, and has that
> documented, but a documented bug still is a bug.
>
> Conclusion: There's no "new trigger option" needed, instead pg_standby
> has to be fixed so it does what the warm standby option of postgres
> needs. The trigger is only to be examined if no more files are
> restorable, and only once.

Yeah, as a result of the discussion on that thread, I'll change
the default behavior instead of adding new trigger option.
But, I'm not going to get rid of the current behavior; it's chosen
if the trigger file containing "fast" exists. On the other hand,
new behavior is chosen when the trigger file containing "smart"
or an empty one exists (default).

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Memory exhaustion during bulk insert

2009-04-14 Thread Xin Wang
Hi all,

I'm doing an experimental project using Postgres as the prototype.
I want to store attribute values of xml type in an internal XML table
which is created for every XML column. One XML node (element,
attribute or text) is stored as a tuple in the XML table. While
a 127MB XML document 'dblp.xml' (that has about 4 million XML nodes
thus 4 million tuples) is being stored, 2GB memory is exhausted rapidly
and then my computer hangs up. I guess the reason is that the memory
runs out before the transaction is committed because the number of
tuples being inserted is too large.

The flow of tuple insertion and functions called are as follows:

while (get the next XML node != NULL)
{
/* fill in values and isnull array */
...
tup = heap_form_tuple(tupleDesc, values, isnull);
simple_heap_insert(xmlTable, tup);
...
heap_freetuple(tup);
}

I searched the mailinglist archive and noticed that a patch to improve
bulk insert performance is committed in Nov 2008. The log message said

"(the patch) keeps the current target buffer pinned and make it work
in a small ring of buffers to avoid having bulk inserts trash the whole
buffer arena."

However, I do not know much about the code below the heapam layer. Can that
patch solve my problem (the version I use is 8.3.5)? Or could you give me
some suggestion about how to avoid memory exhaustion during bulk insert
(in the meanwhile it must clean up nicely after a transaction abort)?

Thanks in advance.
Regards,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch for server-side encoding issues

2009-04-14 Thread Itagaki Takahiro
Here is a WIP patch to solve server-side encoding issues.
It includes "Solution of the file name problem of copy on windows" patch.

http://archives.postgresql.org/message-id/20090413184335.39be.52131...@oss.ntt.co.jp

It could solve the following issues. They are not only in Windows nor
Japan-specific problems. They could also occur if you use databases
with mulitple encodings or database with non-platform-native encoding
even on POSIX platforms.

<1> Non-ascii file paths for database that encoding is different from
platform's encoding (that comes from $LANG or Windows codepage),
especially for COPY TO/FROM.

<2> Use appropriate encoding for non-text server log (console, syslog
and eventlog). The encoding is the same as <1>.

<3> Use appropriate encoding for text server log (stderr and csvlog),
especially database cluster has databases with a variety of encoding.
New GUC parameter 'log_encoding' specifies the encoding in server log.

<4> (incomplete) Avoid encoding conversion error in printing server log
and messages for client. Instead of error, print '?' if there is no
equivalent character in the target encoding.

For <4>, I use PG_TRY and PG_CATCH for now, but it must be a bad manner.
Instead, I'm thinking that convertion procedures will take an optional
argument whether it should raise error or not. However, we need to
modify all of conversion functions to do so.

More research is needed against following situations:
  - NLS messages
  - Module path for LOAD
  - Arguments for system(), including archive_command and restore_command
  - Query texts for other database in pg_stat_activity and pg_stat_statements

Comments welcome. Please notify me if I'm missing something.


Here is a sample code to test the patch.

(client_encoding = sjis / system encoding = sjis)

C:\home\>createdb utfdb --encoding=utf8 --locale=C
C:\home\>createdb eucdb --encoding=eucjp --locale=C

C:\home\>psql utfdb -c "COPY (SELECT 1) TO 'C:/home/日本語ファイル.txt'"
C:\home\>psql utfdb -c "SELECT '日本語' WITH ERROR"
ERROR:  syntax error at or near "WITH ERROR"
LINE 1: SELECT '日本語' WITH ERROR
^

C:\home\>psql eucdb -c "COPY (SELECT 1) TO 'C:/home/日本語ファイル.txt'"
C:\home\>psql eucdb -c "SELECT '日本語' WITH ERROR"
ERROR:  syntax error at or near "WITH ERROR"
LINE 1: SELECT '日本語' WITH ERROR
^


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


server-side_encoding_issues_20090415.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> Well, this bug has existed long before 8.4 so we could just leave it for
> 8.5, and it is not like we have had tons of complaints;  the only
> complaint I saw was one from March, 2008.

I think it's a good thing to do in general.  I'm also concerned about
if it will impact the plpgsql functions we have (which are pretty
numerous..) but in the end I'd rather have it fixed in 8.4 than possibly
delayed indefinitely (after all, if it's in 8.4, why fix it for 8.5?).

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Tom Lane
Bruce Momjian  writes:
> Well, this bug has existed long before 8.4 so we could just leave it for
> 8.5, and it is not like we have had tons of complaints;  the only
> complaint I saw was one from March, 2008.

We had one last week, which is what prompted me to start looking at the
plpgsql lexer situation in the first place.  Also, if the unicode
literal situation doesn't change, that's going to be problematic as
well.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Bruce Momjian
Tom Lane wrote:
> Simon Riggs  writes:
> > On Tue, 2009-04-14 at 16:37 -0400, Tom Lane wrote:
> >> Comments, objections, better ideas?
> 
> > Please, if you do this, make it optional.
> 
> I don't think making the plpgsql lexer pluggable is realistic.
> 
> > Potentially changing the behaviour of thousands of functions just to fix
> > a rare bug will not endear us to our users. The bug may be something
> > that people are relying on in some subtle way, ugly as that sounds.
> 
> That's why I don't want to change it in a minor release.  In a major
> release, however, it's fair game.

Well, this bug has existed long before 8.4 so we could just leave it for
8.5, and it is not like we have had tons of complaints;  the only
complaint I saw was one from March, 2008.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Tom Lane
Bruce Momjian  writes:
> How do you handle dump/restore?  Is it preserved?

I would think not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Andrew Gierth
> "Peter" == Peter Eisentraut  writes:

 > On Tuesday 14 April 2009 07:07:27 Andrew Gierth wrote:
 >> FWIW, the SQL spec puts the onus of normalization squarely on the
 >> application; the database is allowed to assume that Unicode
 >> strings are already normalized, is allowed to behave in
 >> implementation-defined ways when presented with strings that
 >> aren't normalized, and provision of normalization functions and
 >> predicates is just another optional feature.

 Peter> Can you name chapter and verse on that?

4.2.8 Universal character sets

  A UCS string is a character string whose character repertoire is UCS
  and whose character encoding form is one of UTF8, UTF16, or
  UTF32. Any two UCS strings are comparable.

  An SQL-implementation may assume that all UCS strings are normalized
  in one of Normalization Form C (NFC), Normalization Form D (NFD),
  Normalization Form KC (NFKC), or Normalization Form KD (NFKD), as
  specified by [Unicode15].  may be used to
  verify the normalization form to which a particular UCS string
  conforms. Applications may also use  to enforce
  a particular . With the exception of 
  and , the result of any operation on an
  unnormalized UCS string is implementation-defined.

  Conversion of UCS strings from one character set to another is
  automatic.

  Detection of a noncharacter in a UCS-string causes an exception
  condition to be raised. The detection of an unassigned code point
  does not.

[Obviously there are things here that we don't conform to anyway (we
don't raise exceptions for noncharacters, for example. We don't claim
conformance to T061.]

 ::=
   
 ::=
  IS [ NOT ] [  ] NORMALIZED

1) Without Feature T061, "UCS support", conforming SQL language shall
   not contain a .

2) Without Feature F394, "Optional normal form specification",
   conforming SQL language shall not contain .

 ::=
  NORMALIZE  
  [   [   ] ] 


 ::=
NFC
  | NFD
  | NFKC
  | NFKD

7) Without Feature T061, "UCS support", conforming SQL language shall
   not contain a .

9) Without Feature F394, "Optional normal form specification",
   conforming SQL language shall not contain .

 Peter> I see this, for example,

 Peter> 6.27 
 [...]
 Peter> So SQL redirects the question of character length the Unicode
 Peter> standard.  I have not been able to find anything there on a
 Peter> quick look, but I'm sure the Unicode standard has some very
 Peter> specific ideas on this.  Note that the matter of normalization
 Peter> is not mentioned here.

I've taken a not-so-quick look at the Unicode standard (though I don't
claim to be any sort of expert on it), and I certainly can't see any
definitive indication what the length is supposed to be; however, the
use of terminology such as "combining character sequence" (meaning a
series of codepoints that combine to make a single glyph) certainly
seems to strongly imply that our interpretation is correct and that
the OP's is not.

Other indications: the units used by length() must be the same as the
units used by position() and substring() (in the spec, when USING
CHARACTERS is specified), and it would not make sense to use a
definition of "character" that did not allow you to look inside a
combining sequence.

I've also failed so far to find any examples of other programming
languages in which a combining character sequence is taken to be a
single character for purposes of length or position specification.

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Bruce Momjian
Tom Lane wrote:
> Pavel Stehule  writes:
> > this my proposal is very simple. It help to people who have to manage
> > large or complex database system. Important data are date of creating
> > and date of altering tables and stored procedures. These data cannot
> > be modified by user, so implementation doesn't need any new
> > statements.
> 
> ISTM anyone who thinks they need this actually need a full DDL log;
> or at least, if we give them this, they will be back next week asking
> for a full log.  So it'd save a lot of work to tell them to just log
> their DDL to start with.
> 
> Some obvious objections to the simple approach:
> - what if I want to know *who* made the change
> - what if I need to know about the change before last
> - what if I need to know about a DROP
> - what if I need to know about operators, operator classes, schemas, etc
>   etc

How do you handle dump/restore?  Is it preserved?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Yet another regression issue with Fedora-10 + PG 8.4 beta1

2009-04-14 Thread Tom Lane
Devrim =?ISO-8859-1?Q?G=DCND=DCZ?=  writes:
> I built PG 8.4 beta1 on 2 different Fedora-10 boxes. One of them was a
> local box, which runs under QEMU. The other one is Fedora's build
> servers. (I did a scratch build on Fedora build server)

> On the first machine, I got a regression failure:

> http://www.gunduz.org/temp/regression.out
> http://www.gunduz.org/temp/regression.diffs

> I could not repeat this with 2nd one (Fedora build servers)

> Any ideas what might have caused it?

QEMU screwing up the handling of floating-point traps, perhaps?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Tom Lane
Simon Riggs  writes:
> On Tue, 2009-04-14 at 16:37 -0400, Tom Lane wrote:
>> Comments, objections, better ideas?

> Please, if you do this, make it optional.

I don't think making the plpgsql lexer pluggable is realistic.

> Potentially changing the behaviour of thousands of functions just to fix
> a rare bug will not endear us to our users. The bug may be something
> that people are relying on in some subtle way, ugly as that sounds.

That's why I don't want to change it in a minor release.  In a major
release, however, it's fair game.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with "Function Type" in \df

2009-04-14 Thread David Fetter
On Tue, Apr 14, 2009 at 02:52:32PM -0400, Alvaro Herrera wrote:
> Tom Lane wrote:
> 
> > I had a second thought about that: presumably we should make the
> > function type names translatable.  If we do that, it might be better
> > to make the aggregate case be "aggregate" and take the width hit.
> > Otherwise translators are going to be puzzled when they come across
> > "agg" as a translatable phrase.
> 
> I think it's good to have them translatable.  As for using "aggregate"
> instead of "agg" I don't think it's that great an idea.  If you need to
> notify translators that "agg" stands for "aggregate", add a
> /* translator: */ comment.

Here's the next revision :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 10d42ca..5224440 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1043,14 +1043,19 @@ testdb=>
 
 
 Lists available functions, together with their argument and
-return types. If pattern
-is specified, only functions whose names match the pattern are shown.
-If the form \df+ is used, additional information 
about
-each function, including volatility, language, source code and 
description, is shown.
-By default, only user-created objects are shown;  supply a
-pattern or the S modifier to include system
-objects.
+return types and their function type: 'normal', 'agg',
+'trigger', and 'window'.  If pattern is specified, only
+functions whose names match the pattern are shown.  If the
+form \df+ is used, additional information
+about each function, including volatility, language, source
+code and description, is shown.  By default, only user-created
+objects are shown;  supply a pattern or the
+S modifier to include system objects.  To
+include aggregates in the result set, use \dfa, normal
+functions, \dfn, trigger functions, \dft, windowing functions,
+\dfw.  You may freely mix and match the +, S, a, n, t and w
+options.
 
 
 
@@ -1064,7 +1069,6 @@ testdb=>
 
   
 
-
   
 \dF[+] [ pattern ]
 
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
index fc56c3d..09ba686 100644
--- a/doc/src/sgml/release.sgml
+++ b/doc/src/sgml/release.sgml
@@ -183,6 +183,15 @@ do it for earlier branch release files.
  
 
 
+
+ 
+  In psql, \df now shows which type of function (normal,
+  aggregate, trigger, or window) it is.  You can also specify
+  mix-and-match options.  To get aggregates and windowing
+  functions, including system ones, for example, invoke \dfwaS+
+ 
+
+

 

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b39466d..1dc3cc3 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -365,8 +365,22 @@ exec_command(const char *cmd,
case 'D':
success = listDomains(pattern, show_system);
break;
-   case 'f':
-   success = describeFunctions(pattern, 
show_verbose, show_system);
+   case 'f':   /* function subsystem */
+   switch (cmd[2])
+   {
+   case '\0':
+   case '+':
+   case 'S':
+   case 'a':
+   case 'n':
+   case 't':
+   case 'w':
+   success =  
describeFunctions(&cmd[2], pattern, show_verbose, show_system);
+   break;
+   default:
+   status = PSQL_CMD_UNKNOWN;
+   break;
+   }
break;
case 'g':
/* no longer distinct from \du */
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 731baf8..ad5dcbe 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -183,15 +183,43 @@ describeTablespaces(const char *pattern, bool verbose)
 
 
 /* \df
- * Takes an optional regexp to select particular functions
+ * Takes an optional regexp to select particular functions.
+ *
+ * As with \d, you can specify the kinds of functions you want:
+ *
+ * a for aggregat

[HACKERS] Yet another regression issue with Fedora-10 + PG 8.4 beta1

2009-04-14 Thread Devrim GÜNDÜZ
Hi,

I built PG 8.4 beta1 on 2 different Fedora-10 boxes. One of them was a
local box, which runs under QEMU. The other one is Fedora's build
servers. (I did a scratch build on Fedora build server)

On the first machine, I got a regression failure:

http://www.gunduz.org/temp/regression.out
http://www.gunduz.org/temp/regression.diffs

I could not repeat this with 2nd one (Fedora build servers)

Any ideas what might have caused it?

Regards,

-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Tom Lane
Andrew Dunstan  writes:
> Robert Haas wrote:
>> All this sounds good.  As for how to handle := and .., I think making
>> them lex the same way in PL/pgsql and core SQL would be a good thing.

> They don't have any significance in core SQL. What would we do with the 
> lexeme?

It would just fail --- the core grammar will have no production that can
accept it.  Right offhand I think the only difference is that instead of

regression=# select a .. 2;
ERROR:  syntax error at or near "."
LINE 1: select a .. 2;
  ^

you'd see 

regression=# select a .. 2;
ERROR:  syntax error at or near ".."
LINE 1: select a .. 2;
 ^

ie it acts like one token not two in the error message.

This solution would become problematic if the core grammar ever had a
meaning for := or .. that required treating them as two tokens (eg,
the grammar allowed this sequence with whitespace between).  I don't
think that's very likely though; and if it did happen we could fix it
with the aforementioned control switch.

> The only thing that makes me nervous about this is that we're very close 
> to Beta. OTOH, this is one area the regression suite should give a 
> fairly good workout to.

Yeah, I'd rather have done it before beta1, but too late.  The other
solution still entails massive changes to the plpgsql lexer, so it
doesn't really look like much lower risk.  AFAICS the practical
alternatives are a reimplementation in beta2, or no fix until 8.5.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Andrew Dunstan



Robert Haas wrote:

All this sounds good.  As for how to handle := and .., I think making
them lex the same way in PL/pgsql and core SQL would be a good thing.


  


They don't have any significance in core SQL. What would we do with the 
lexeme?


ISTR we've used some hacks in the past to split lexemes into pieces, and 
presumably we'd have to do something similar with these.


The only thing that makes me nervous about this is that we're very close 
to Beta. OTOH, this is one area the regression suite should give a 
fairly good workout to.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Simon Riggs

On Tue, 2009-04-14 at 16:37 -0400, Tom Lane wrote:
> Comments, objections, better ideas?

Please, if you do this, make it optional.

Potentially changing the behaviour of thousands of functions just to fix
a rare bug will not endear us to our users. The bug may be something
that people are relying on in some subtle way, ugly as that sounds.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clean shutdown and warm standby

2009-04-14 Thread Guillaume Smet
Hi,

On Wed, Apr 8, 2009 at 9:11 PM, I wrote:
> Following the discussion here
> http://archives.postgresql.org/message-id/49d9e986.8010...@pse-consulting.de
> , I wrote a small patch which rotates the last XLog file on shutdown
> [snip]

Any comment or advice on how I can fix it with a different method if
this one is considered wrong?

Original message and patch here:
http://archives.postgresql.org/message-id/1d4e0c10904081211p2c0f1cdepe620c11d1271c...@mail.gmail.com

Thanks.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 4:37 PM, Tom Lane  wrote:
> Whichever way the current discussion about Unicode literals turns out,
> it's clear that plpgsql is not up to speed on matching the core lexer's
> behavior --- it's wrong anyway with respect to
> standard_conforming_strings.
>
> I had earlier speculated semi-facetiously about ripping out the plpgsql
> lexer altogether, but the more I think about it the less silly the idea
> looks.  Suppose that we change the core lexer so that the keyword lookup
> table it's supposed to use is passed to scanner_init() rather than being
> hard-wired in.  Then make plpgsql call the core lexer using its own
> keyword table.  Everything else would match core lexical behavior
> automatically.  The special behavior that we do want, such as being
> able to construct a string representing a desired subrange of the input,
> could all be handled in plpgsql-specific wrapper code.
>
> I've just spent a few minutes looking for trouble spots in this theory,
> and so far the only real ugliness I can see is that plpgsql treats
> ":=" and ".." as single tokens whereas the core would parse them as two
> tokens.  We could hack the core lexer to have an additional switch that
> controls that.  Or maybe just make it always return them as single
> tokens --- AFAICS, neither combination is legal in core SQL anyway,
> so this would only result in a small change in the exact syntax error
> you get if you write such a thing in core SQL.
>
> Another trouble spot is the #option syntax, but that could be handled
> by a special-purpose prescan, or just dropped altogether; it's not like
> we've ever used that for anything but debugging.
>
> It looks like this might take about a day's worth of work (IOW two
> or three days real time) to get done.
>
> Normally I'd only consider doing such a thing during development phase,
> but since we're staring at at least one and maybe two bugs that are
> going to be hard to fix in any materially-less-intrusive way, I'm
> thinking about doing it now.  Theoretically this change shouldn't break
> any working code, so letting it hit the streets in 8.4beta2 doesn't seem
> totally unreasonable.
>
> Comments, objections, better ideas?

All this sounds good.  As for how to handle := and .., I think making
them lex the same way in PL/pgsql and core SQL would be a good thing.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 2:55 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Well, that's fine, but that's a long way from Peter's statement that
>> "I think the tendency should be to get rid of E'' usage".
>
> Bear in mind that that's Peter's opinion; it's not necessarily shared
> by anyone else.  I was just responding to your assertion of the
> diametrically opposed position that non-E strings are useless (which
> I don't share either).

Useless might be an overstatement, but I certainly have gotten no
benefit out of them.  It seems decidedly odd to me to propose that
users embed literal control characters in their code/SQL scripts in
lieu of using escape sequences.  If that were a normal and reasonable
thing to do then I expect C, C++, Perl, Python, Ruby, Javascript, and
countless other languages wouldn't provide this functionality.  In
reality, most of them do provide it, sometimes as the only option.
And as to your point about Windows pathnames, if you're trying to
avoid escaping lots of backslashes, dollar-quoting is a perfectly
adequate solution.

The real motivation for standard_conforming_strings is the one
embedded in the name.  And I'm fine with that.  But I don't see E''
disappearing from my code any time soon.  I periodically have a need
for things like foo || E'\n' || bar and it screws up the formatting of
the code to write foo || '
' || bar.  Writing it that way, it's also entirely non-obvious whether
there are any spaces or tabs after the opening quote and before the
newline, and if so whether or not they are intentional.  But I don't
mind if someone ELSE likes the new, standard strings - it just isn't
me.  :-)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-14 Thread Sergey Burladyan
Itagaki Takahiro  writes:

> There are some issues:
> * Is it possible to determine the platform encoding?

There is no platform encoding in linux. File name encoding depend on user
locale, so different users can have different encoding of file name.

-- 
Sergey Burladyan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Tom Lane
Whichever way the current discussion about Unicode literals turns out,
it's clear that plpgsql is not up to speed on matching the core lexer's
behavior --- it's wrong anyway with respect to
standard_conforming_strings.

I had earlier speculated semi-facetiously about ripping out the plpgsql
lexer altogether, but the more I think about it the less silly the idea
looks.  Suppose that we change the core lexer so that the keyword lookup
table it's supposed to use is passed to scanner_init() rather than being
hard-wired in.  Then make plpgsql call the core lexer using its own
keyword table.  Everything else would match core lexical behavior
automatically.  The special behavior that we do want, such as being
able to construct a string representing a desired subrange of the input,
could all be handled in plpgsql-specific wrapper code.

I've just spent a few minutes looking for trouble spots in this theory,
and so far the only real ugliness I can see is that plpgsql treats
":=" and ".." as single tokens whereas the core would parse them as two
tokens.  We could hack the core lexer to have an additional switch that
controls that.  Or maybe just make it always return them as single
tokens --- AFAICS, neither combination is legal in core SQL anyway,
so this would only result in a small change in the exact syntax error
you get if you write such a thing in core SQL.

Another trouble spot is the #option syntax, but that could be handled
by a special-purpose prescan, or just dropped altogether; it's not like
we've ever used that for anything but debugging.

It looks like this might take about a day's worth of work (IOW two
or three days real time) to get done.

Normally I'd only consider doing such a thing during development phase,
but since we're staring at at least one and maybe two bugs that are
going to be hard to fix in any materially-less-intrusive way, I'm
thinking about doing it now.  Theoretically this change shouldn't break
any working code, so letting it hit the streets in 8.4beta2 doesn't seem
totally unreasonable.

Comments, objections, better ideas?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Marko Kreen
On 4/14/09, Tom Lane  wrote:
> Peter Eisentraut  writes:
>  > On Tuesday 14 April 2009 18:54:33 Tom Lane wrote:
>  >> The other proposal that seemed
>  >> attractive to me was a decode-like function:
>  >>
>  >> uescape('foo\00e9bar')
>  >> uescape('foo\00e9bar', '\')
>
>  > This was discussed previously, but rejected with the following argument:
>
>  > There are some other disadvantages for making a function call.  You
>  > couldn't use that kind of literal in any other place where the parser
>  > calls for a string constant: role names, tablespace locations,
>  > passwords, copy delimiters, enum values, function body, file names.
>
>
> I'm less than convinced that those are really plausible use-cases for
>  characters that one is unable to type directly.  However, I'll grant the
>  point.  So that narrows us down to considering the \u extension to E''
>  strings as a saner and safer alternative to the spec's syntax.

My vote would go to \u.  The U& may be "sql standard" but it's different
from any established practical standard.


Alternative would be to make U& follow stdstr setting:

stdstr=on -> you get fully standard-conforming syntax:

  U&'\xxx' UESCAPE '\'

stdstr=off -> you need to follow old quoting rules:

  U&'\\xxx' UESCAPE '\\'

This would result in safe, and when stdstr=on, fully standard compliant
syntax.  Only downside would be that in practice - stdstr=off - it would
be unusable.


Third alternative would be to do both of them - \u as a usable method
and safe-U& to mark the checkbox for SQL-standard compliance.
If we do want U&, I would prefer that to U&-only syntax.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Meredith L. Patterson
Tom Lane wrote:
> This is *not* about code within Postgres.

One typically provides libraries for this sort of thing, but your point
is taken; suggestion withdrawn.

--mlp
_
Meredith L. Patterson
Founder and CTO
Osogato, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Meredith L. Patterson
Tom Lane wrote:
> I suspect that it's actually impossible to parse such a thing correctly
> without a full-fledged flex lexer or something of equivalent complexity.
> Certainly it's a couple of orders of magnitude harder than it is for
> either standard-conforming or E'' literals.

Is there a reason not to use a full-fledged flex lexer?

I'd be willing to take a crack at such a thing, but I'm working 80-hour
weeks through the end of June and likely wouldn't be able to put in any
time on it till then. So I definitely couldn't promise anything for 8.4,
but if putting it off till 8.5 works, sign me up.

Cheers,
--mlp
_
Meredith L. Patterson
Founder and CTO
Osogato, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
"Meredith L. Patterson"  writes:
> Tom Lane wrote:
>> I suspect that it's actually impossible to parse such a thing correctly
>> without a full-fledged flex lexer or something of equivalent complexity.

> Is there a reason not to use a full-fledged flex lexer?

The point is that that's a pretty large imposition on client code that
we don't control or maintain, in order to get a feature that could be
gotten in much less dangerous ways that don't impact any code outside
PG.

> I'd be willing to take a crack at such a thing, but I'm working 80-hour
> weeks through the end of June and likely wouldn't be able to put in any
> time on it till then. So I definitely couldn't promise anything for 8.4,
> but if putting it off till 8.5 works, sign me up.

Shall we pass your name on to every package using Postgres, then?  This
is *not* about code within Postgres.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 21:48:12 Tom Lane wrote:
> Peter Eisentraut  writes:
> > I think we can handle that and the cases Tom presents by erroring out
> > when the U& syntax is used with stdstr off.
>
> I think you're missing the point --- this is not about whether the
> syntax is unambiguous (it is already) but about whether a frontend that
> doesn't understand it 100% will be secure against subversion.  I have no
> confidence in the latter assumption.

I think I am getting the point quite well.  Do you have an example how this 
can be subverted?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with "Function Type" in \df

2009-04-14 Thread David Fetter
On Tue, Apr 14, 2009 at 03:04:55PM -0400, Alvaro Herrera wrote:
> David Fetter wrote:
> > On Tue, Apr 14, 2009 at 02:52:32PM -0400, Alvaro Herrera wrote:
> 
> > > I think it's good to have them translatable.  As for using
> > > "aggregate" instead of "agg" I don't think it's that great an
> > > idea.  If you need to notify translators that "agg" stands for
> > > "aggregate", add a /* translator: */ comment.
> > 
> > Where would I add that?  First mention, each time, or...?
> 
> Is there more than one mention of "agg"?

It's in 3 branches in describe.c.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with "Function Type" in \df

2009-04-14 Thread Alvaro Herrera
David Fetter wrote:
> On Tue, Apr 14, 2009 at 02:52:32PM -0400, Alvaro Herrera wrote:

> > I think it's good to have them translatable.  As for using
> > "aggregate" instead of "agg" I don't think it's that great an idea.
> > If you need to notify translators that "agg" stands for "aggregate",
> > add a /* translator: */ comment.
> 
> Where would I add that?  First mention, each time, or...?

Is there more than one mention of "agg"?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with "Function Type" in \df

2009-04-14 Thread David Fetter
On Tue, Apr 14, 2009 at 02:52:32PM -0400, Alvaro Herrera wrote:
> Tom Lane wrote:
> 
> > I had a second thought about that: presumably we should make the
> > function type names translatable.  If we do that, it might be
> > better to make the aggregate case be "aggregate" and take the
> > width hit.  Otherwise translators are going to be puzzled when
> > they come across "agg" as a translatable phrase.
> 
> I think it's good to have them translatable.  As for using
> "aggregate" instead of "agg" I don't think it's that great an idea.
> If you need to notify translators that "agg" stands for "aggregate",
> add a /* translator: */ comment.

Where would I add that?  First mention, each time, or...?

Cheers,
David (reworking patch per suggestions)
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Robert Haas  writes:
> Well, that's fine, but that's a long way from Peter's statement that
> "I think the tendency should be to get rid of E'' usage".

Bear in mind that that's Peter's opinion; it's not necessarily shared
by anyone else.  I was just responding to your assertion of the
diametrically opposed position that non-E strings are useless (which
I don't share either).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with "Function Type" in \df

2009-04-14 Thread Alvaro Herrera
Tom Lane wrote:

> I had a second thought about that: presumably we should make the
> function type names translatable.  If we do that, it might be better
> to make the aggregate case be "aggregate" and take the width hit.
> Otherwise translators are going to be puzzled when they come across
> "agg" as a translatable phrase.

I think it's good to have them translatable.  As for using "aggregate"
instead of "agg" I don't think it's that great an idea.  If you need to
notify translators that "agg" stands for "aggregate", add a
/* translator: */ comment.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Peter Eisentraut  writes:
> On Tuesday 14 April 2009 18:54:33 Tom Lane wrote:
>> The other proposal that seemed
>> attractive to me was a decode-like function:
>> 
>> uescape('foo\00e9bar')
>> uescape('foo\00e9bar', '\')

> This was discussed previously, but rejected with the following argument:

> There are some other disadvantages for making a function call.  You 
> couldn't use that kind of literal in any other place where the parser 
> calls for a string constant: role names, tablespace locations, 
> passwords, copy delimiters, enum values, function body, file names.

I'm less than convinced that those are really plausible use-cases for
characters that one is unable to type directly.  However, I'll grant the
point.  So that narrows us down to considering the \u extension to E''
strings as a saner and safer alternative to the spec's syntax.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 2:22 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Maybe I've just got my head deeply in the sand, but I don't understand
>> what the alternative to E'' supposedly is.  How am I supposed to write
>> the equivalent of E'\t\n\f' without using E''?  The
>> standard_conforming_strings syntax apparently supports no escapes of
>> any kind, which seems so hideously inconvenient that I can't even
>> imagine why someone wants that behavior.
>
> Well, quite aside from issues of compatibility with standards and other
> databases, I'm sure there are lots of Windows users who are more
> interested in being able to store a Windows pathname without doubling
> their backslashes than they are in being able to type readable names
> for ASCII control characters.  After all, in most cases you can get
> those characters into a string just by typing them (especially if you
> aren't using readline or something like it).

Well, that's fine, but that's a long way from Peter's statement that
"I think the tendency should be to get rid of E'' usage".  It's only
been a minor inconvenience for me to change my applications to use
E'', but I certainly don't see how I could get by without it, and it's
far more like other programming languages that I use (e.g. C) than the
standard syntax is.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Peter Eisentraut  writes:
> I think we can handle that and the cases Tom presents by erroring out
> when the U& syntax is used with stdstr off.

I think you're missing the point --- this is not about whether the
syntax is unambiguous (it is already) but about whether a frontend that
doesn't understand it 100% will be secure against subversion.  I have no
confidence in the latter assumption.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 17:32:00 Tom Lane wrote:
>  I admit that the SQL:2008 way also covers Unicode code
> points in identifiers, which we can't emulate without a lexical change;
> but frankly I think the use-case for that is so thin as to be almost
> nonexistent.  Who is going to choose identifiers that they cannot easily
> type on their keyboards?

For example, table names are automatically generated, or you write a test case 
for weird looking names, or you want to add special characters in an 
identifier that will later be displayed somewhere, or in general you are 
writing an application for a foreign language.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 18:54:33 Tom Lane wrote:
> The other proposal that seemed
> attractive to me was a decode-like function:
>
> uescape('foo\00e9bar')
> uescape('foo\00e9bar', '\')

This was discussed previously, but rejected with the following argument:

There are some other disadvantages for making a function call.  You 
couldn't use that kind of literal in any other place where the parser 
calls for a string constant: role names, tablespace locations, 
passwords, copy delimiters, enum values, function body, file names.

Some of those appear to be very plausible use cases.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered topg_proc and pg_class

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 2:13 PM, Kevin Grittner
 wrote:
> "Kevin Grittner"  wrote:
>> the timestamp column caused the copy to be about 11.3% larger
>
> Grabbed the wrong numbers.  It's really 2.5%, but still

Well, that's why Tom doesn't want to add it to pg_class.  But putting
it in a separate table will have no impact on the speed of anything
except DDL statements, and even then it won't require copying the
whole table, so the performance impact will be pretty minimal, so I
think it should be all right.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 21:22:29 Tom Lane wrote:
> BTW, does anyone know whether Unicode includes the ASCII control
> characters ... ie, is \u0009 a name for tab?  If so, maybe this
> syntax is in part an attempt to cover that use-case in the standard.

Yes on both.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 17:13:00 Marko Kreen wrote:
> If the parsing does not happen in 2 passes and it does not take account
> of stdstr setting then the  default breakage would be:
>
>stdstr=off, U&' \' UESCAPE '!'.

I think we can handle that and the cases Tom presents by erroring out when the 
U& syntax is used with stdstr off.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread David E. Wheeler

On Apr 14, 2009, at 11:10 AM, Tom Lane wrote:


Andrew Dunstan  writes:
I think there's a good case for some functions implementing the  
various

Unicode normalization functions, though.


I have no objection to that so long as the code footprint is in line
with the utility gain (i.e. not all that much).  If we have to bring  
in
ICU or something similar to make it happen, the cost/benefit ratio  
looks

pretty bad.


I've no idea what it would require, but the mapping table must be  
pretty substantial. Still, I'd love to have this functionality in the  
database.


Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread David E. Wheeler

On Apr 14, 2009, at 11:22 AM, Tom Lane wrote:


BTW, does anyone know whether Unicode includes the ASCII control
characters ... ie, is \u0009 a name for tab?  If so, maybe this
syntax is in part an attempt to cover that use-case in the standard.


Yes, you can use, e.g., 	 in HTML to represent a tab character.

Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 20:35:21 Robert Haas wrote:
> Maybe I've just got my head deeply in the sand, but I don't understand
> what the alternative to E'' supposedly is.  How am I supposed to write
> the equivalent of E'\t\n\f' without using E''?

Well, the first alternative is to type those characters in literally.  The 
second alternative is the U&'' syntax. ;-)  The third alternative is to design 
applications that don't need this, because the processing behavior of those 
characters is quite unportable.  But of course in some cases using the E'' 
syntax is the most convenient.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Warm Standby restore_command documentation (was: New trigger option of pg_standby)

2009-04-14 Thread Andreas Pflug
I've been following the thread with growing lack of understanding why
this is so hardly discussed, and I went back to the documentation of
what the restore_command should do (
http://www.postgresql.org/docs/8.3/static/warm-standby.html )

While the algorithm presented in the pseudocode isn't dealing too good
with a situation where the trigger is set while the restore_command is
sleeping (this should be handled better in a real implementation), the
code says

"Restore all wal files. If no more wal files are present, stop restoring
if the trigger is set; otherwise wait for a new wal file".

Since pg_standby is meant as implementation of restore_command, it has
to follow the directive stated above; *anything else is a bug*.
pg_standby currently does *not* obey this directive, and has that
documented, but a documented bug still is a bug.

Conclusion: There's no "new trigger option" needed, instead pg_standby
has to be fixed so it does what the warm standby option of postgres
needs. The trigger is only to be examined if no more files are
restorable, and only once.

Regards,
Andreas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread - -
>> I don't believe that the standard forbids the use of combining chars at all.
>> RFC 3629 says:
>>
>>  ... This issue is amenable to solutions based on Unicode Normalization
>>  Forms, see [UAX15].

> This is the relevant part. Tom was claiming that the UTF8 encoding required
> normalizing the string of unicode codepoints before encoding. I'm not sure
> that's true though, is it?

No. I think Tom has mistaken this for the fact that the UTF8 encoding
can have multiple byte representations for one and the same code
point. The standard requires the
shortest byte representation to be used. (Please see
http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/character-encoding.html
for more information). However, this has nothing to do with *code
point* normalization. The encoding does not require a code point
sequence to be normalized. Infact, UTF-8 could hold any of the 4
different normalized forms, 2 of which are completely decomposed
forms, that is, every accent takes up its own code point. Also, UTF-8
could hold non-normalized strings. Encodings just deal with how code
points are represented in memory or over wires.

> Another question is "what is the purpose of a database"?  To me it would
> be quite the wrong thing for the DB to not store what is presented, as
> long as it's considered legal.  Normalization of legal variant forms
> seems pretty questionable.  So I'm with the camp that says this is the
> application's responsibility.

What I did not mean is automatic normalization. I meant something like
PG providing a function to normalize strings which can be explicitly
called by the user in case it is needed. For example:

SELECT * FROM table1 WHERE normalize(a, 'NFC') = normalize($1, 'NFC');
-- NFC is one of the 4 mentioned normalization forms and the one that
should probably be used, since it combines code points rather than
decomposing them.

I completely agree that the database should never just normalize by
itself, because it might be the users intention to store
non-normalized strings. An exception might be an explicit
configuration setting which tells PG to normalize automatically. In
case of the above SELECT query, the problem of offloading the
normalization to the app means, that every single application that is
ever used with this database has to a) normalize the string, b) use
the same normalization form. If just one application at one point in
time fails to do so, string comparison is no longer safe (which is
could be a security problem as the quoted RFC text says). But with a
callable function like normalize() above, the user himself can choose
whether it is important or not. That is, does he want code points to
match (do not use normalize() then), or does he want characters to
match (use normalize() then). The user can normalize the string
exactly where it is needed (for comparison).

I've searched PG's source code and it appeared to me that the 'text'
type is just a typedef for 'varlena', the same type 'bytea' is based
on. Given that the client and database encoding is the same, does this
mean that text is internally stored in exactly the same binary
representation the client has sent it in? So that if the client has
sent it in any of the 4 normalized forms, PG guarantees to store and
retrieve it (in case of a later SELECT) exactly as it was sent ("store
what is presented")? In other words: does PG guarantuee the code point
sequence to remain the same? Because if it does not, you cannot
offload the normalization work to the app anyway, since PG would be
allowed "un-normalize" it internally.

Also, what happens if the client has a different encoding than the
database, and PG has to internally convert client strings to UTF-8.
Does it only generate code points in the same normalized form that it
expects the user input to be in?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Robert Haas  writes:
> Maybe I've just got my head deeply in the sand, but I don't understand
> what the alternative to E'' supposedly is.  How am I supposed to write
> the equivalent of E'\t\n\f' without using E''?  The
> standard_conforming_strings syntax apparently supports no escapes of
> any kind, which seems so hideously inconvenient that I can't even
> imagine why someone wants that behavior.

Well, quite aside from issues of compatibility with standards and other
databases, I'm sure there are lots of Windows users who are more
interested in being able to store a Windows pathname without doubling
their backslashes than they are in being able to type readable names
for ASCII control characters.  After all, in most cases you can get
those characters into a string just by typing them (especially if you
aren't using readline or something like it).

BTW, does anyone know whether Unicode includes the ASCII control
characters ... ie, is \u0009 a name for tab?  If so, maybe this
syntax is in part an attempt to cover that use-case in the standard.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 19:26:41 Tom Lane wrote:
> Another question is "what is the purpose of a database"?  To me it would
> be quite the wrong thing for the DB to not store what is presented, as
> long as it's considered legal.  Normalization of legal variant forms
> seems pretty questionable.  So I'm with the camp that says this is the
> application's responsibility.

I think automatically normalizing or otherwise fiddling with Unicode strings 
with combining characters is not acceptable.  But the point is that we should 
process equivalent forms in a consistent way.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Monday 13 April 2009 20:18:31 - - wrote:
> 2) PG has no support for the Unicode collation algorithm. Collation is
> offloaded to the OS, which makes this quite inflexible.

This argument is unclear.  Do you want the Unicode collation algorithm or do 
you want flexibility?  Some OS do implement the Unicode collation algorithm.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 18:49:45 Greg Stark wrote:
> What's really at issue is "what is a string?". That is, it a sequence
> of characters or a sequence of code points.

I think a sequence of codepoints would be about as silly a definition as the 
antiquated notion of a string as a sequence of bytes.

> If it's the former then we
> would also have to prohibit certain strings such as U&'\0301'
> entirely. And we have to make substr() pick out the right number of
> code points, etc.

Sure enough.  That all goes along with what the original poster was saying.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Kevin Grittner
Greg Stark  wrote: 
> Peter Eisentraut  wrote:
>> SELECT U&'\00E9', char_length(U&'\00E9');
>>  ?column? | char_length
>> --+-
>>  é|   1
>> (1 row)
>>
>> SELECT U&'\0065\0301', char_length(U&'\0065\0301');
>>  ?column? | char_length
>> --+-
>>  é|   2
>> (1 row)
> 
> What's really at issue is "what is a string?". That is, it a
> sequence of characters or a sequence of code points.
 
Doesn't the SQL standard refer to them as "character string literals"?
 
The function is called character_length or char_length.
 
I'm curious -- can every multi-code-point character be normalized to a
single-code-point character?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered topg_proc and pg_class

2009-04-14 Thread Kevin Grittner
"Kevin Grittner"  wrote: 
> the timestamp column caused the copy to be about 11.3% larger
 
Grabbed the wrong numbers.  It's really 2.5%, but still
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Tom Lane
Andrew Dunstan  writes:
> I think there's a good case for some functions implementing the various 
> Unicode normalization functions, though.

I have no objection to that so long as the code footprint is in line
with the utility gain (i.e. not all that much).  If we have to bring in
ICU or something similar to make it happen, the cost/benefit ratio looks
pretty bad.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Kevin Grittner
Robert Haas  wrote: 
> Making pg_class and pg_proc tables larger hurts run-time
performance,
> potentially.  Making a separate table only slows down DDL
operations,
> which are much less frequent.
 
Copying the pg_class table, with oids and indexes, with and without
the addition of one timestamp column, the timestamp column caused the
copy to be about 11.3% larger; so I see your point.
 
I guess I didn't realize just how tight the pg_class table was.
 
Given all that, I'm going to say that from my perspective I don't
think the convenience of saving the information is worth the cost,
with either approach.  I understand it might mean more to others.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regression failure on RHEL 4 w/ PostgreSQL 8.4 beta1

2009-04-14 Thread Devrim GÜNDÜZ
On Tue, 2009-04-14 at 11:16 -0400, Tom Lane wrote:
> > So what changed between 8.3 and 8.4? Same box can build 8.3 with
> > --with-system-tzdata . 
> 
> We didn't have 64-bit tzdata support before --- it's a new test
> covering new functionality.

Thanks Tom. 

Regards,
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Unicode support

2009-04-14 Thread Andrew Dunstan



Kevin Grittner wrote:
 
I'm curious -- can every multi-code-point character be normalized to a

single-code-point character?
 
  


I don't believe so. Those combinations used in the most common 
orthographic languages have their own code points, but I understand you 
can use the combining chars with essentially any other chars, although 
it might not always make much sense to do so. That's important when 
you're inventing symbols in things like Mathematical and Scientific papers.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Andrew Dunstan



David E. Wheeler wrote:

On Apr 14, 2009, at 9:26 AM, Tom Lane wrote:


Another question is "what is the purpose of a database"?  To me it would
be quite the wrong thing for the DB to not store what is presented, as
long as it's considered legal.  Normalization of legal variant forms
seems pretty questionable.  So I'm with the camp that says this is the
application's responsibility.


Can `convert()` normalize strings?




I think that's handling a quite different problem.

It certainly should not do so automatically, IMNSHO.

I think there's a good case for some functions implementing the various 
Unicode normalization functions, though.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 8:53 AM, Peter Eisentraut  wrote:
> This doesn't excite me.  I think the tendency should be to get rid of E''
> usage, because its definition of escape sequences is single-byte and ASCII
> centric and thus overall a legacy construct.  Certainly, we will want to keep
> around E'' for a long time or forever, but it is a legitimate goal for
> application writers to not use it, which is after all the reason behind this
> whole standards-conforming strings project.  I wouldn't want to have a
> forward-looking feature such as the Unicode escapes be burdened with that kind
> of legacy behavior.
>
> Also note that Unicode escapes are also available for identifiers, for which
> there is no existing E"" that you can add it to.

Maybe I've just got my head deeply in the sand, but I don't understand
what the alternative to E'' supposedly is.  How am I supposed to write
the equivalent of E'\t\n\f' without using E''?  The
standard_conforming_strings syntax apparently supports no escapes of
any kind, which seems so hideously inconvenient that I can't even
imagine why someone wants that behavior.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread David E. Wheeler

On Apr 14, 2009, at 9:26 AM, Tom Lane wrote:

Another question is "what is the purpose of a database"?  To me it  
would

be quite the wrong thing for the DB to not store what is presented, as
long as it's considered legal.  Normalization of legal variant forms
seems pretty questionable.  So I'm with the camp that says this is the
application's responsibility.


Can `convert()` normalize strings?

Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with "Function Type" in \df

2009-04-14 Thread David Fetter
On Tue, Apr 14, 2009 at 12:35:21PM -0400, Tom Lane wrote:
> David Fetter  writes:
> > On Mon, Apr 13, 2009 at 07:24:31PM -0400, Tom Lane wrote:
> >> I'd go for something like
> >> 
> >> Type
> >> 
> >> window
> >> agg
> >> trigger
> >> normal
> >> 
> >> Or we could spell out "aggregate", but that makes the column a
> >> couple of characters wider ...
> 
> > Done.
> 
> I had a second thought about that: presumably we should make the
> function type names translatable.  If we do that, it might be better
> to make the aggregate case be "aggregate" and take the width hit.
> Otherwise translators are going to be puzzled when they come across
> "agg" as a translatable phrase.

OK

> Or maybe I'm overthinking that problem.  Comments from anyone who
> actually does translations?
> 
> > I've also added \df[antw], which lets people narrow their search.
> 
> Why didn't you make it work like \d[tisv], ie allow more than one
> letter?  If you're going to be inventing new features at this late
> date, they should at least work like the adjacent precedent.

Because I didn't think of it.  Lemme see about that :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Tom Lane
Greg Stark  writes:
> What's really at issue is "what is a string?". That is, it a sequence
> of characters or a sequence of code points. If it's the former then we
> would also have to prohibit certain strings such as U&'\0301'
> entirely. And we have to make substr() pick out the right number of
> code points, etc.

Another question is "what is the purpose of a database"?  To me it would
be quite the wrong thing for the DB to not store what is presented, as
long as it's considered legal.  Normalization of legal variant forms
seems pretty questionable.  So I'm with the camp that says this is the
application's responsibility.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with "Function Type" in \df

2009-04-14 Thread Tom Lane
David Fetter  writes:
> On Mon, Apr 13, 2009 at 07:24:31PM -0400, Tom Lane wrote:
>> I'd go for something like
>> 
>> Type
>> 
>> window
>> agg
>> trigger
>> normal
>> 
>> Or we could spell out "aggregate", but that makes the column a
>> couple of characters wider ...

> Done.

I had a second thought about that: presumably we should make the
function type names translatable.  If we do that, it might be better
to make the aggregate case be "aggregate" and take the width hit.
Otherwise translators are going to be puzzled when they come across
"agg" as a translatable phrase.

Or maybe I'm overthinking that problem.  Comments from anyone who
actually does translations?

> I've also added \df[antw], which lets people narrow their search.

Why didn't you make it work like \d[tisv], ie allow more than one
letter?  If you're going to be inventing new features at this late
date, they should at least work like the adjacent precedent.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 14, 2009 at 10:27 AM, Kevin Grittner
>> Yeah, if it would be too heavy to add a timestamp column or two to
>> pg_class and maybe one or two others, why is it better to add a whole
>> new table to maintain in parallel -- with it's own primary key,
>> foreign keys (or similar integrity enforcement mechanism), etc.

> Making pg_class and pg_proc tables larger hurts run-time performance,
> potentially.  Making a separate table only slows down DDL operations,
> which are much less frequent.

And even more to the point, adding columns to the core system tables
means you pay the performance cost *even when not using the feature*.
We normally expect that inessential features should avoid making a
performance impact on those who have no use for them.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Saturday 11 April 2009 18:20:47 Sam Mason wrote:
> I can't see much support in the other database engines; searched for
> Oracle, MS-SQL, DB2 and Firebird.  MySQL has it planned for 7.1, so not
> for a while.

DB2 supports it, as far as I know.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 14:38:38 Marko Kreen wrote:
> I think the problem is that they should not act like E'' strings, but they
> should act like plain '' strings - they should follow stdstr setting.
>
> That way existing tools that may (or may not..) understand E'' and stdstr
> settings, but definitely have not heard about U&'' strings can still
> parse the SQL without new surprises.

Can you be more specific in what "surprises" you expect?  What algorithms do 
you suppose those "existing tools" use and what expectations do they have?

> I still stand on my proposal, how about extending E'' strings with
> unicode escapes (eg. \u)?  The E'' strings are already more
> clearly defined than '' and they are our "own", we don't need to
> consider random standards, but can consider our sanity.

This doesn't excite me.  I think the tendency should be to get rid of E'' 
usage, because its definition of escape sequences is single-byte and ASCII 
centric and thus overall a legacy construct.  Certainly, we will want to keep 
around E'' for a long time or forever, but it is a legitimate goal for 
application writers to not use it, which is after all the reason behind this 
whole standards-conforming strings project.  I wouldn't want to have a 
forward-looking feature such as the Unicode escapes be burdened with that kind 
of legacy behavior.

Also note that Unicode escapes are also available for identifiers, for which 
there is no existing E"" that you can add it to.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Marko Kreen  writes:
> I would prefer that such quoting extensions would wait until
> stdstr=on setting is the only mode Postgres will operate.
> Fitting new quoting ways to environment with flippable stdstr setting
> will be rather painful for everyone.

It would certainly be a lot safer to wait until non-standard-conforming
strings don't exist anymore.  The problem is that that may never happen,
and is certainly not on the roadmap to happen in the foreseeable future.

> I still stand on my proposal, how about extending E'' strings with
> unicode escapes (eg. \u)?  The E'' strings are already more
> clearly defined than '' and they are our "own", we don't need to
> consider random standards, but can consider our sanity.

That's one way we could proceed.  The other proposal that seemed
attractive to me was a decode-like function:

uescape('foo\00e9bar')
uescape('foo\00e9bar', '\')

(double all the backslashes if you assume not
standard_conforming_strings).  The arguments in favor of this one
are (1) you can apply it to the result of an expression, it's not
strictly tied to literals; and (2) it's a lot lower-footprint solution
since it doesn't affect basic literal handling.  If you wish to suppose
that this is only a stopgap until someday when we can implement the SQL
standard syntax more safely, then low footprint is good.  One could
even imagine back-porting this into existing releases as a user-defined
function.

The solution with \u in extended literals is probably workable too.
I'm slightly worried about the possibility of issues with code that
thinks it knows what an E-literal means but doesn't really.  In
particular something might think it knows that "\u" just means "u",
and proceed to strip the backslash.  I don't see a path for that to
become a security hole though, only a garden-variety bug.  So I could
live with that one on the grounds of being easier to use (which it
would be, because of less typing compared to uescape()).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Greg Stark
On Tue, Apr 14, 2009 at 1:32 PM, Peter Eisentraut  wrote:
> On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
>> Umm, but isn't that because your encoding is using one code point?
>>
>> See the OP's explanation w.r.t. canonical equivalence.
>>
>> This isn't about the number of bytes, but about whether or not we should
>> count characters encoded as two or more combined code points as a single
>> char or not.
>
> Here is a test case that shows the problem (if your terminal can display
> combining characters (xterm appears to work)):
>
> SELECT U&'\00E9', char_length(U&'\00E9');
>  ?column? | char_length
> --+-
>  é        |           1
> (1 row)
>
> SELECT U&'\0065\0301', char_length(U&'\0065\0301');
>  ?column? | char_length
> --+-
>  é        |           2
> (1 row)

What's really at issue is "what is a string?". That is, it a sequence
of characters or a sequence of code points. If it's the former then we
would also have to prohibit certain strings such as U&'\0301'
entirely. And we have to make substr() pick out the right number of
code points, etc.



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 07:07:27 Andrew Gierth wrote:
> FWIW, the SQL spec puts the onus of normalization squarely on the
> application; the database is allowed to assume that Unicode strings
> are already normalized, is allowed to behave in implementation-defined
> ways when presented with strings that aren't normalized, and provision
> of normalization functions and predicates is just another optional
> feature.

Can you name chapter and verse on that?

I see this, for example,

6.27 

5) If a  is specified, then
Case:
a) If the character encoding form of  is not UTF8, 
UTF16, or UTF32, then let S be the .
Case:
i)
If the most specific type of S is character string, then the result is the 
number of characters in the value of S.
NOTE 134 — The number of characters in a character string is determined 
according to the semantics of the character set of that character string.
ii)
Otherwise, the result is OCTET_LENGTH(S).
b) Otherwise, the result is the number of explicit or implicit  in , counted in accordance with the definition 
of those units in the relevant normatively referenced document.

So SQL redirects the question of character length the Unicode standard.  I 
have not been able to find anything there on a quick look, but I'm sure the 
Unicode standard has some very specific ideas on this.  Note that the matter 
of normalization is not mentioned here.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
> Umm, but isn't that because your encoding is using one code point?
>
> See the OP's explanation w.r.t. canonical equivalence.
>
> This isn't about the number of bytes, but about whether or not we should
> count characters encoded as two or more combined code points as a single
> char or not.

Here is a test case that shows the problem (if your terminal can display 
combining characters (xterm appears to work)):

SELECT U&'\00E9', char_length(U&'\00E9');
 ?column? | char_length
--+-
 é|   1
(1 row)

SELECT U&'\0065\0301', char_length(U&'\0065\0301');
 ?column? | char_length 
--+-
 é|   2
(1 row)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Andrew Dunstan



Marko Kreen wrote:

I still stand on my proposal, how about extending E'' strings with
unicode escapes (eg. \u)?  The E'' strings are already more
clearly defined than '' and they are our "own", we don't need to
consider random standards, but can consider our sanity.

  


I suspect there would be lots more support in the user community, where 
\u is well understood in a number of contexts (Java and ECMAScript, 
for example). It's also tolerably sane.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regression failure on RHEL 4 w/ PostgreSQL 8.4 beta1

2009-04-14 Thread Tom Lane
Devrim =?ISO-8859-1?Q?G=DCND=DCZ?=  writes:
> On Mon, 2009-04-13 at 15:31 -0400, Tom Lane wrote:
>> This test is checking whether you have working 64-bit-tzdata support.
>> It seems you don't.

> So what changed between 8.3 and 8.4? Same box can build 8.3 with
> --with-system-tzdata . 

We didn't have 64-bit tzdata support before --- it's a new test covering
new functionality.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regression failure on RHEL 4 w/ PostgreSQL 8.4 beta1

2009-04-14 Thread Devrim GÜNDÜZ
On Mon, 2009-04-13 at 15:31 -0400, Tom Lane wrote:
> > I'm getting the following failure on RHEL 4:
> 
> > http://www.gunduz.org/temp/regression.out
> > http://www.gunduz.org/temp/regression.diffs
> 
> This test is checking whether you have working 64-bit-tzdata support.
> It seems you don't.
> 
> If you built with --with-system-tzdata, and RHEL4 doesn't include
> 64-bit tzdata files, then this failure would be expected.  (I'm not
> totally sure about the second premise, but some quick digging in
> the specfile's changelog suggests that Red Hat only started to
> support 64-bit tzdata in RHEL5.)

So what changed between 8.3 and 8.4? Same box can build 8.3 with
--with-system-tzdata . 

Regards,
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 10:27 AM, Kevin Grittner
 wrote:
> Pavel Stehule  wrote:
>> I though about it too. But I am not sure, if this isn't too
>> complicated solution for simple task. If I thing little bit more -
>> main important is timestamp of last change.
>
> Yeah, if it would be too heavy to add a timestamp column or two to
> pg_class and maybe one or two others, why is it better to add a whole
> new table to maintain in parallel -- with it's own primary key,
> foreign keys (or similar integrity enforcement mechanism), etc.

Making pg_class and pg_proc tables larger hurts run-time performance,
potentially.  Making a separate table only slows down DDL operations,
which are much less frequent.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Peter Eisentraut  writes:
> On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
>> If we let this go into 8.4, our previous rounds with security holes
>> caused by careless string parsing will look like a day at the beach.

> Note that the escape character marks the Unicode escapes; it doesn't
> affect the quote characters that delimit the string.  So offhand I
> can't see any potential for quote confusion/SQL injection type
> problems.  Please elaborate if you see a problem.

The problem is the interaction with non-standard-conforming strings.

Case 1:

select u&'foo\' uescape ',' ...

The backend will see the backslash as just a data character, and
will think that "..." is live SQL text.  A non-Unicode-literal-aware
frontend will think that the backslash escapes the second quote, the
comma is live SQL text, and the ... is quoted material.  Construction
of an actual SQL injection attack is left as an exercise for the reader,
but certainly the raw material is here.

Case 2:

select u&'foo' uescape '\' ...

Again, any existing frontend code will think that the backslash quotes
the final quote and the ... is quoted material.  This one is
particularly nasty because we allow arbitrary amounts of whitespace and
commenting on either side of "uescape":


select u&'foo' 
/* hello joe, do you /* understand nested comments today?
*/
-- yes, this one too */
uescape
-- but not this one /*
'\' ...

I suspect that it's actually impossible to parse such a thing correctly
without a full-fledged flex lexer or something of equivalent complexity.
Certainly it's a couple of orders of magnitude harder than it is for
either standard-conforming or E'' literals.

Case 3:

select u&'foo\' uescape ',' ...

select u & 'foo\' uescape ',' ...

In the first form the ... is live SQL, in the second form it is quoted
material.  This means that you might correctly validate a query and then
have your results invalidated by later processing that innocently adds
or removes whitespace.  (This is particularly nasty in a standard that
demands we parse "x/-1" and "x / -1" the same ...)

So what we've got here is a whole new set of potential SQL injection
attacks by confusing frontend literal-syntax checking, plus a truly
staggering increase in the minimum *required* complexity of such
checking.  I understand the usefulness of being able to write Unicode
code points, but they're not useful enough to justify this syntax.

This thread has already mentioned a couple of ways we could add the
same facility without making any lexical-level changes, at least for
data values.  I admit that the SQL:2008 way also covers Unicode code
points in identifiers, which we can't emulate without a lexical change;
but frankly I think the use-case for that is so thin as to be almost
nonexistent.  Who is going to choose identifiers that they cannot easily
type on their keyboards?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Kevin Grittner
Pavel Stehule  wrote: 
> I though about it too. But I am not sure, if this isn't too
> complicated solution for simple task. If I thing little bit more -
> main important is timestamp of last change.
 
Yeah, if it would be too heavy to add a timestamp column or two to
pg_class and maybe one or two others, why is it better to add a whole
new table to maintain in parallel -- with it's own primary key,
foreign keys (or similar integrity enforcement mechanism), etc. 
Others apparently see a bigger advantage to this than I, but if it's
not something I can just eyeball while I'm looking at the object
definition, it isn't likely to save me much over going to other
sources.
 
Let's not over-engineer this.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Marko Kreen
On 4/14/09, Peter Eisentraut  wrote:
> On Tuesday 14 April 2009 14:38:38 Marko Kreen wrote:
>  > I think the problem is that they should not act like E'' strings, but they
>  > should act like plain '' strings - they should follow stdstr setting.
>  >
>  > That way existing tools that may (or may not..) understand E'' and stdstr
>  > settings, but definitely have not heard about U&'' strings can still
>  > parse the SQL without new surprises.
>
>
> Can you be more specific in what "surprises" you expect?  What algorithms do
>  you suppose those "existing tools" use and what expectations do they have?

If the parsing does not happen in 2 passes and it does not take account
of stdstr setting then the  default breakage would be:

   stdstr=off, U&' \' UESCAPE '!'.

And anything, whose security or functionality depends on parsing SQL
can be broken that way.

Broken functionality would be eg. Slony (or other replication solution)
distributing developer-written SQL code to bunch of nodes.  It needs to
parse text file to SQL statements and execute them separately.

There are probably other solutions who expect to understand SQL
at least token level to function correctly.  (pgpool, java has
probably something depending on it, etc.)

>  > I still stand on my proposal, how about extending E'' strings with
>  > unicode escapes (eg. \u)?  The E'' strings are already more
>  > clearly defined than '' and they are our "own", we don't need to
>  > consider random standards, but can consider our sanity.
>
>
> This doesn't excite me.  I think the tendency should be to get rid of E''
>  usage, because its definition of escape sequences is single-byte and ASCII
>  centric and thus overall a legacy construct.

Why are you concentrating only on \0xx escapes?  The \\, \n, etc
seem standard and forward looking enough.  Yes, unicode escapes are
missing but we can add them without breaking anything.

>  Certainly, we will want to keep
>  around E'' for a long time or forever, but it is a legitimate goal for
>  application writers to not use it, which is after all the reason behind this
>  whole standards-conforming strings project.  I wouldn't want to have a
>  forward-looking feature such as the Unicode escapes be burdened with that 
> kind
>  of legacy behavior.
>
>  Also note that Unicode escapes are also available for identifiers, for which
>  there is no existing E"" that you can add it to.

Well, I was not rejecting the standard quoting, but suggesting
postponing until the stdstr mess is sorted out.  We can use \uXX
in meantime and I think most Postgres users would prefer to keep
using it...

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windowing functions vs aggregates

2009-04-14 Thread Tom Lane
Greg Stark  writes:
> However, I'm kind of confused by that result. Why does the range
> "between unbounded preceding and current row" seem to be doing the
> average of the whole result set?

That's what it's supposed to do. "Current row" really includes all peers
of the current row in the window frame ordering, and since you didn't
specify any ORDER BY clause, all the rows are peers.  If you put in
"order by s" you'll get the result you were expecting:

regression=# select s,(avg(s) OVER (range between unbounded preceding
and current row)) from foo;
 s |avg 
---+
 1 | 2.5000
 2 | 2.5000
 3 | 2.5000
 4 | 2.5000
(4 rows)

regression=# select s,(avg(s) OVER (order by s range between unbounded preceding
   
and current row)) from foo;
 s |  avg   
---+
 1 | 1.
 2 | 1.5000
 3 | 2.
 4 | 2.5000
(4 rows)


I suppose the SQL committee defined it like that to try to reduce the
implementation dependency of the results.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windowing functions vs aggregates

2009-04-14 Thread Tom Lane
Teodor Sigaev  writes:
> Cast of aggregate's type works:
> # select avg(s)::int4 from foo;

> but that doesn't work for with new windowing functions interface:
> # select avg(s)::int4 OVER () from foo;
> ERROR:  syntax error at or near "OVER"
> LINE 1: select avg(s)::int4 OVER () from foo;

> Is that intentional?

You would need to put the cast around the whole "foo() over ()"
construct.  That's not a divisible unit.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why isn't stats_temp_directory automatically created?

2009-04-14 Thread Euler Taveira de Oliveira

Fujii Masao escreveu:

Is it worth making the patch which creates stats_temp_directory
if not present?


+1.


--
  Euler Taveira de Oliveira
  http://www.timbira.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Windowing functions vs aggregates

2009-04-14 Thread Greg Stark
2009/4/14 Teodor Sigaev :
> select avg(s)::int4 OVER () from foo;

You can put the cast outside the window expression such as:

postgres=# select s,(avg(s) OVER (range between unbounded preceding
and current row))::int4 from foo;
 s | avg
---+-
 1 |   2
 2 |   2
 3 |   2
(3 rows)


However, I'm kind of confused by that result. Why does the range
"between unbounded preceding and current row" seem to be doing the
average of the whole result set? This is not related to the cast:

postgres=# select s,avg(s) OVER (range between unbounded preceding and
current row) from foo;
 s |avg
---+
 1 | 2.
 2 | 2.
 3 | 2.
(3 rows)


I haven't recompiled recently and I do recall some bug fixes a while
back. Was this that? I'm recompiling now.






-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Windowing functions vs aggregates

2009-04-14 Thread Teodor Sigaev

Cast of aggregate's type works:
# select avg(s)::int4 from foo;

but that doesn't work for with new windowing functions interface:
# select avg(s)::int4 OVER () from foo;
ERROR:  syntax error at or near "OVER"
LINE 1: select avg(s)::int4 OVER () from foo;

Is that intentional?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] libpq is not thread safe

2009-04-14 Thread Zdenek Kotala
When postgreSQL is compiled with --thread-safe that libpq should be
thread safe. But it is not true when somebody call fork(). The problem
is that fork() forks only active threads and some mutex can stay locked
by another thread. We use ssl_config mutex which is global.

We need implement atfork handlers to fix this. See 
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

We should add pthread_atfork into _ini libpq section.

Another problem with fork is that new process inherit connections and so
on. Which is not also good, but it is happened also on single threaded
application and developer can fix it in own code. Maybe some notice in
documentation should help what application should do after fork.

Comments?

Zdenek


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Marko Kreen
On 4/14/09, Peter Eisentraut  wrote:
> On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
>  > It gets worse though: I have seldom seen such a badly designed piece of
>  > syntax as the Unicode string syntax --- see
>  > http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL
>  >-SYNTAX-STRINGS-UESCAPE
>  >
>  > You scan the string, and then after that they tell you what the escape
>  > character is!?  Not to mention the obvious ambiguity with & as an
>  > operator.
>  >
>  > If we let this go into 8.4, our previous rounds with security holes
>  > caused by careless string parsing will look like a day at the beach.
>  > No frontend that isn't fully cognizant of the Unicode string syntax is
>  > going to parse such things correctly --- it's going to be trivial for
>  > a bad guy to confuse a quoting mechanism as to what's an escape and what
>  > isn't.
>
>
> Note that the escape character marks the Unicode escapes; it doesn't affect 
> the
>  quote characters that delimit the string.  So offhand I can't see any 
> potential
>  for quote confusion/SQL injection type problems.  Please elaborate if you see
>  a problem.
>
>  If there are problems, we could consider getting rid of the UESCAPE clause.
>  Without it, the U&'' strings would behave much like the E'' strings.  But I'd
>  like to understand the problem first.

I think the problem is that they should not act like E'' strings, but they
should act like plain '' strings - they should follow stdstr setting.

That way existing tools that may (or may not..) understand E'' and stdstr
settings, but definitely have not heard about U&'' strings can still
parse the SQL without new surprises.

If they already act that way then keeping U& should be fine.

And if UESCAPE does not affect main string parsing, but is handled in
second pass going over parsed string - like bytea \ - then that should
also be fine and should not cause any new surprises.

But if not, it must go.

I would prefer that such quoting extensions would wait until
stdstr=on setting is the only mode Postgres will operate.
Fitting new quoting ways to environment with flippable stdstr setting
will be rather painful for everyone.

I still stand on my proposal, how about extending E'' strings with
unicode escapes (eg. \u)?  The E'' strings are already more
clearly defined than '' and they are our "own", we don't need to
consider random standards, but can consider our sanity.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Saturday 11 April 2009 21:50:29 Josh Berkus wrote:
> On 4/11/09 11:47 AM, Marko Kreen wrote:
> > On 4/11/09, Tom Lane  wrote:
> >>   It gets worse though: I have seldom seen such a badly designed piece
> >> of syntax as the Unicode string syntax --- see
> >>  
> >> http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#
> >>SQL-SYNTAX-STRINGS-UESCAPE
>
> WTF?  Whose feature is this?  What's the use case?

The use case is approximately the same as &#; in HTML: entering Unicode 
characters that your screen or keyboard cannot easily produce.  It's a 
desperately needed feature for me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
> It gets worse though: I have seldom seen such a badly designed piece of
> syntax as the Unicode string syntax --- see
> http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL
>-SYNTAX-STRINGS-UESCAPE
>
> You scan the string, and then after that they tell you what the escape
> character is!?  Not to mention the obvious ambiguity with & as an
> operator.
>
> If we let this go into 8.4, our previous rounds with security holes
> caused by careless string parsing will look like a day at the beach.
> No frontend that isn't fully cognizant of the Unicode string syntax is
> going to parse such things correctly --- it's going to be trivial for
> a bad guy to confuse a quoting mechanism as to what's an escape and what
> isn't.

Note that the escape character marks the Unicode escapes; it doesn't affect the 
quote characters that delimit the string.  So offhand I can't see any potential 
for quote confusion/SQL injection type problems.  Please elaborate if you see 
a problem.

If there are problems, we could consider getting rid of the UESCAPE clause.  
Without it, the U&'' strings would behave much like the E'' strings.  But I'd 
like to understand the problem first.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New trigger option of pg_standby

2009-04-14 Thread Fujii Masao
Hi,

On Tue, Apr 14, 2009 at 6:35 PM, Simon Riggs  wrote:
> On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote:
>
>> A lookahead (the +1) may have pg_standby get stuck as follows.
>> Am I missing something?
>>
>> 1. the trigger file containing "smart" is created.
>> 2. pg_standby is executed.
>>     2-1. nextWALfile is restored.
>>     2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
>> 3. the restored nextWALfile is applied.
>> 4. pg_standby is executed again to restore nextWALfile+1.
>
> This can't happen. (4) will never occur when (2-2) has occurred. A
> non-zero error code means file not available which will cause recovery
> to end and hence no requests for further WAL files are made.

When pg_standby exits with non-zero code, (3) and (4) will never
occur, and the transactions in nextWALfile will be lost. So, in (2-2),
pg_standby has to call exit(0), I think.

On the other hand, if exit(0) is called in (2-2), the above scenario
happens.

> It does *seem* as if there is a race condition there in that another WAL
> file may arrive after we have taken the decision there are no more WAL
> files, but it's not a problem. That could happen if we issue the trigger
> while the master is still up, which is a mistake - why would we do that?
> If we only issue the trigger once we are happy the master is down then
> we don't get a problem.

Yeah, I agree that such race condition is not a problem. The
trigger file has to be created after all the WAL files arrive at
the standby server.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New trigger option of pg_standby

2009-04-14 Thread Simon Riggs
On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote:

> A lookahead (the +1) may have pg_standby get stuck as follows.
> Am I missing something?
> 
> 1. the trigger file containing "smart" is created.
> 2. pg_standby is executed.
> 2-1. nextWALfile is restored.
> 2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
> 3. the restored nextWALfile is applied.
> 4. pg_standby is executed again to restore nextWALfile+1.

This can't happen. (4) will never occur when (2-2) has occurred. A
non-zero error code means file not available which will cause recovery
to end and hence no requests for further WAL files are made.

It does *seem* as if there is a race condition there in that another WAL
file may arrive after we have taken the decision there are no more WAL
files, but it's not a problem. That could happen if we issue the trigger
while the master is still up, which is a mistake - why would we do that?
If we only issue the trigger once we are happy the master is down then
we don't get a problem.

So lets do it the next+1 way, when triggered.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New trigger option of pg_standby

2009-04-14 Thread Fujii Masao
Hi,

On Mon, Apr 13, 2009 at 2:52 PM, Fujii Masao  wrote:
> But, a lookahead nextWALfile seems to work fine.
>
> if (triggered)
> {
>    if (smartMode && nextWALfile exists)
>        exit(0)
>    else
>    {
>        delete trigger file
>        exit(1)
>    }
> }

Umm... in this algorithm, the trigger file remains after failover
if the nextWALfile has the invalid record which means the end
of WAL files.

I'd like to propose another simple idea; pg_standby deletes the
trigger file *whenever* the nextWALfile is a timeline history file.
A timeline history file is restored at the end of recovery, so it's
guaranteed that the trigger file is deleted whether nextWALfile
exists or not.

A timeline history file is restored also at the beginning of
recovery, so the accidentally remaining trigger file is deleted
in early warm-standby as a side-effect of this idea.

How does that sound?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why isn't stats_temp_directory automatically created?

2009-04-14 Thread Fujii Masao
Hi,

log_directory is automatically created if not present when starting
the database server. But, stats_temp_directory is not created. Why?
ISTM that current behavior is undesirable.

Is it worth making the patch which creates stats_temp_directory
if not present?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers