Re: [HACKERS] bytea vs. pg_dump

2009-07-11 Thread Peter Eisentraut
On Wednesday 08 July 2009 01:07:08 Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  Here is a first cut at a new hex bytea input and output format.  Example:
  ...
  SET bytea_output_hex = true;
 
  Should the configuration parameter be a boolean or an enum, opening
  possibilities for other formats?

 Enum.  If we do this then it seems entirely fair that someone might
 want other settings someday.  Also, it seems silly to pick a format
 partly on the grounds that it's expansible, and then not make the
 control GUC expansible.  Perhaps

   SET bytea_output = [ hex | traditional ]

OK, here is an updated patch.  It has the setting as enum, completed 
documentation, and libpq support.  I'll add it to the commit fest in the hope 
that someone else can look it over in detail.

I'm attaching two versions of the patch.  One is made with the -w option, 
which leads to fewer differences.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2572d78..fece041 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3703,6 +3703,23 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 <title>Statement Behavior</title>
 <variablelist>
 
+ <varlistentry id="guc-bytea-output" xreflabel="bytea_output">
+  <term><varname>bytea_output</varname> (<type>enum</type>)</term>
+  <indexterm>
+   <primary><varname>bytea_output</> configuration parameter</primary>
+  </indexterm>
+  <listitem>
+   <para>
+	Sets the output format for values of type <type>bytea</type>.
+	Valid values are <literal>hex</literal> (the default)
+	and <literal>escape</literal> (the traditional PostgreSQL
+	format).  See <xref linkend="datatype-binary"> for more
+	information.  Note that the <type>bytea</type> type always
+	accepts both formats on input.
+   </para>
+  </listitem>
+ </varlistentry>
+
 <varlistentry id="guc-search-path" xreflabel="search_path">
  <term><varname>search_path</varname> (<type>string</type>)</term>
  <indexterm>
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index c944d8f..bdead3e 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL$ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.237 2009/04/27 16:27:35 momjian Exp $ -->
 
 <chapter id="datatype">
  <title id="datatype-title">Data Types</title>
@@ -1189,6 +1189,66 @@ SELECT b, char_length(b) FROM test2;
   </para>
 
   <para>
+    The <type>bytea</type> type supports two external formats for
+    input and output: the <quote>escape</quote> format that is
+    particular to PostgreSQL, and the <quote>hex</quote> format.  Both
+    of these are always accepted on input.  The output format depends
+    on the configuration parameter <xref linkend="guc-bytea-output">;
+    the default is hex.  (Note that the hex format was introduced in
+    PostgreSQL 8.5, so earlier versions and some tools don't understand
+    it.)
+   </para>
+
+   <para>
+    The <acronym>SQL</acronym> standard defines a different binary
+    string type, called <type>BLOB</type> or <type>BINARY LARGE
+    OBJECT</type>.  The input format is different from
+    <type>bytea</type>, but the provided functions and operators are
+    mostly the same.
+   </para>
+
+  <sect2>
+   <title>The Hex Format</title>
+
+   <para>
+    The hex format encodes the binary data as 2 hexadecimal digits per
+    byte, most significant nibble first.  The entire string is
+    preceded by the sequence <literal>\x</literal> (to distinguish it
+    from the escape format).  In SQL literals, the backslash may need
+    to be escaped, but it is one logical backslash as far as the
+    <type>bytea</type> type is concerned.  The hex format is compatible with a wide
+    range of external applications and protocols, and it tends to be
+    faster than the traditional bytea format, so its use is
+    somewhat preferable.
+   </para>
+
+   <para>
+    Example:
+<programlisting>
+SELECT E'\\xDEADBEEF';
+</programlisting>
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>The Escape Format</title>
+
+   <para>
+    The <quote>escape</quote> format is the traditional
+    PostgreSQL-specific format for the <type>bytea</type> type.  It
+    takes the approach of representing a binary string as a sequence
+    of ASCII characters and escaping those bytes that cannot be
+    represented as an ASCII character by a special escape sequence.
+    If, from the point of view of the application, representing bytes
+    as characters makes sense, then this representation can be
+    convenient, but in practice it is usually confusing because it
+    blurs the distinction between binary strings and character
+    strings, and the particular escape mechanism that was chosen is
+    also somewhat unwieldy.  So this format should probably not be
+    used for most new applications.
+   </para>
+
+   <para>
     When entering <type>bytea</type> values, octets of certain
     values <emphasis>must</emphasis> be escaped (but all octet
     values <emphasis>can</emphasis> be escaped) when used as part
@@ -1341,14 +1401,7 

Re: [HACKERS] concurrent index builds unneeded lock?

2009-07-11 Thread Greg Stark
On Sat, Jul 11, 2009 at 6:17 AM, Theo Schlossnagleje...@omniti.com wrote:


 On Jul 11, 2009, at 12:12 AM, Tom Lane wrote:

 Theo Schlossnagle je...@omniti.com writes:

 I would think it would be txns that would be reading that table, but
 I'm thinking it is a bit too aggressive.  Am I reading the code wrong
 there?  I'm thinking it should be more selective about vxids it
 chooses to block on.  I'd expect it to block on vxids touching the
 same table only.

 There is no way to know whether a currently active vxid will try to look
 at the other table later.  We can not just ignore this case...

                        regards, tom lane


 Can't you know that if the other active query in question is a concurrent
 index build?

I think so.

Hm. Actually maybe not. What if the index is an expression index and
the expression includes a function which does an SQL operation? I'm
not sure how realistic that is, since to be a danger that SQL operation
would have to be an insert, update, or delete, which is not just
bending the rules.

The real problem is that we only added the ability to distinguish
vacuums relatively recently and adding more special cases of
transactions which can be ignored for one purpose or another seems
like a lot of corner cases to worry about.

I wonder whether an earlier more general proposal could have some
leverage here though: some way to indicate that the transaction has
taken all the locks it plans to take already. This was originally
proposed as a way for vacuum to know it can ignore the snapshots of a
transaction since it isn't going to access the table being vacuumed.

In this case the concurrent index build could mark itself as having
taken all the locks it plans to take and other concurrent index builds
could ignore it. They could also ignore any transactions which have
that flag set through any other mechanisms we might add such as a
manual SQL command.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] concurrent index builds unneeded lock?

2009-07-11 Thread Theo Schlossnagle


On Jul 11, 2009, at 6:50 AM, Greg Stark wrote:

In this case the concurrent index build could mark itself as having
taken all the locks it plans to take and other concurrent index builds
could ignore it. They could also ignore any transactions which have
that flag set through any other mechanisms we might add such as a
manual SQL command.



While I see that feature being extremely hard for users to leverage
via SQL, it seems like a very simple and useful internal control.


When acquiring locks, we simply check whether the future-locking
bit is flipped and abort the transaction if it is.  It seems really
safe.  I'd imagine that in the various places we try to use it,
we'll be reminded of the incidental locks we'll need to grab.  But,
after looking at the concurrent index code, it seems that it would
solve my issue at least.  It's also much more elegant than adding
more exceptions.


--
Theo Schlossnagle
http://omniti.com/is/theo-schlossnagle
p: +1.443.325.1357 x201   f: +1.410.872.4911








Re: [HACKERS] *_collapse_limit, geqo_threshold

2009-07-11 Thread Andres Freund
On Wednesday 08 July 2009 23:46:02 Tom Lane wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
  For a moment it seemed logical to suggest a session GUC for the seed,
  so if you got a bad plan you could keep rolling the dice until you got
  one you liked; but my right-brain kept sending shivers down my spine
  to suggest just how uncomfortable it was with that idea

 If memory serves, we actually had exactly that at some point.  But I
 think the reason it got taken out was that it interfered with the
 behavior of the random() function for everything else.  We'd have to
 give GEQO its own private random number generator.
All of GEQO's usage of random() seems to be concentrated in geqo_random.h, so 
it would be a small change.
I will happily tackle that, if only to narrow down in which cases GEQO fails 
to plan - a behaviour we have seen at times with somewhat crazy queries.

The only question I have is whether random_r() or something similar is 
available on enough platforms... Does anybody have an idea about this?
On most unixoid systems one could just wrap erand48() if random_r() is not 
available.
Windows?

Andres



Re: [HACKERS] concurrent index builds unneeded lock?

2009-07-11 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 I wonder whether an earlier more general proposal could have some
 leverage here though: some way to indicate that the transaction has
 taken all the locks it plans to take already. This was originally
 proposed as a way for vacuum to know it can ignore the snapshots of a
 transaction since it isn't going to access the table being vacuumed.

Again, this doesn't work for any interesting cases.  You can't for
example assume that a user-defined datatype won't choose to look into
tables that hadn't been accessed as of the start of the index build.
(This isn't a hypothetical example --- I believe PostGIS does some
such things already, because it keeps spatial reference definitions
in a central catalog table.)

regards, tom lane



Re: [HACKERS] *_collapse_limit, geqo_threshold

2009-07-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 The only question I have is, whether random_r or similar is available on 
 enough platforms... Has anybody an idea about this?
 On most unixoid system one could just wrap erand48() if random_r is not 
 available.
 Windows?

random_r() isn't in the Single Unix Spec AFAICS, and I also don't find
it on HPUX 10.20, so I'd vote against depending on it.  What I do see
in SUS is initstate() and setstate() which could be used to control
the random() function:
http://www.opengroup.org/onlinepubs/007908799/xsh/initstate.html
It would also work to leave random() for use by the core code and have
GEQO depend on something from the drand48() family:
http://www.opengroup.org/onlinepubs/007908799/xsh/drand48.html
Probably drand48() is less random than random(), but for the limited
purposes of GEQO I doubt we care very much.

So far as I can find in a quick google search, neither of these families
of functions exist on Windows :-(.  So I think maybe the best approach
is the second one --- we could implement a port/ module that provides a
version of whichever drand48 function we need.

On reflection I think the best user API is probably a geqo_seed GUC in
the range 0 to 1, and have GEQO always reset its seed to that value at
start of a planning cycle.  This ensures plan stability, and if you need
to experiment with alternative plans you can change to different seed
values.  The no reset behavior doesn't seem to have much real-world
usefulness, because even if you chance to get a good plan, you have no
way to reproduce it...

regards, tom lane



Re: [HACKERS] *_collapse_limit, geqo_threshold

2009-07-11 Thread Andres Freund
Hi,

On Saturday 11 July 2009 18:23:59 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  The only question I have is, whether random_r or similar is available on
  enough platforms... Has anybody an idea about this?
  On most unixoid system one could just wrap erand48() if random_r is not
  available.
  Windows?

 random_r() isn't in the Single Unix Spec AFAICS, and I also don't find
 it on HPUX 10.20, so I'd vote against depending on it.  What I do see
 in SUS is initstate() and setstate() which could be used to control
 the random() function:
 http://www.opengroup.org/onlinepubs/007908799/xsh/initstate.html
Using setstate() has a few too many possible implications for my taste - 
especially as there is no way to portably get/save the current random state.

(E.g. running a known query would reset random()'s seed, or some such.)

 It would also work to leave random() for use by the core code and have
 GEQO depend on something from the drand48() family:
 http://www.opengroup.org/onlinepubs/007908799/xsh/drand48.html
 Probably drand48() is less random than random(), but for the limited
 purposes of GEQO I doubt we care very much.
Yes.

 So far as I can find in a quick google search, neither of these families
 of functions exist on Windows :-(.  So I think maybe the best approach
 is the second one --- we could implement a port/ module that provides a
 version of whichever drand48 function we need.
Okay.
It would be possible to implement random_r() the same way if we are going to 
write something for Windows anyway - is it possible that it might be useful 
somewhere else?

 On reflection I think the best user API is probably a geqo_seed GUC in
 the range 0 to 1, and have GEQO always reset its seed to that value at
 start of a planning cycle.  This ensures plan stability, and if you need
 to experiment with alternative plans you can change to different seed
 values.  The no reset behavior doesn't seem to have much real-world
 usefulness, because even if you chance to get a good plan, you have no
 way to reproduce it...
That was my thinking as well. 

Should geqo_seed be documented from the start or not?

Andres



Re: [HACKERS] *_collapse_limit, geqo_threshold

2009-07-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Saturday 11 July 2009 18:23:59 Tom Lane wrote:
 So far as I can find in a quick google search, neither of these families
 of functions exist on Windows :-(.  So I think maybe the best approach
 is the second one --- we could implement a port/ module that provides a
 version of whichever drand48 function we need.

 It would be possible to implement random_r the same way if we are going to 
 write something for windows anyway - Is it possible that it might be usefull 
 somewhere else?

I think a decent version of random_r would be more work than it's worth.

(In practice we'd probably just lift the module from one of the BSDen
anyway, so maybe work is the wrong measure here, but code size is
still relevant.)

regards, tom lane



Re: [HACKERS] *_collapse_limit, geqo_threshold

2009-07-11 Thread Kenneth Marshall
On Sat, Jul 11, 2009 at 12:23:59PM -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  The only question I have is, whether random_r or similar is available on 
  enough platforms... Has anybody an idea about this?
  On most unixoid system one could just wrap erand48() if random_r is not 
  available.
  Windows?
 
 random_r() isn't in the Single Unix Spec AFAICS, and I also don't find
 it on HPUX 10.20, so I'd vote against depending on it.  What I do see
 in SUS is initstate() and setstate() which could be used to control
 the random() function:
 http://www.opengroup.org/onlinepubs/007908799/xsh/initstate.html
 It would also work to leave random() for use by the core code and have
 GEQO depend on something from the drand48() family:
 http://www.opengroup.org/onlinepubs/007908799/xsh/drand48.html
 Probably drand48() is less random than random(), but for the limited
 purposes of GEQO I doubt we care very much.
 
Ugh, tracking down problems caused by a poor random number generator
is difficult. Poor randomness often causes weird results from
algorithms that were designed around the assumption of a good
random source.

 So far as I can find in a quick google search, neither of these families
 of functions exist on Windows :-(.  So I think maybe the best approach
 is the second one --- we could implement a port/ module that provides a
 version of whichever drand48 function we need.
 
I think that having a port/module for a random number generator is a
good idea. There are a number of good, fast random number generators
to choose from.

Cheers,
Ken




Re: [HACKERS] *_collapse_limit, geqo_threshold

2009-07-11 Thread Andres Freund
On Saturday 11 July 2009 18:23:59 Tom Lane wrote:
 On reflection I think the best user API is probably a geqo_seed GUC in
 the range 0 to 1, and have GEQO always reset its seed to that value at
 start of a planning cycle.  This ensures plan stability, and if you need
 to experiment with alternative plans you can change to different seed
 values.  The no reset behavior doesn't seem to have much real-world
 usefulness, because even if you chance to get a good plan, you have no
 way to reproduce it...
I just realized that doing it in a naive way (in geqo()) causes the state to 
be reset multiple times during one query - at every invocation of 
make_rel_from_joinlist.
Does anybody see a problem with that?

Andres



Re: [HACKERS] *_collapse_limit, geqo_threshold

2009-07-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 I just realized doing it in a naive way (in geqo()) causes the state to be 
 reset multiple times during one query- at every invocation of 
 make_rel_from_joinlist.

 Does anybody see a problem with that?

I think that's probably what we want.  Otherwise, you'd have a situation
where two identical subproblems might get planned differently.

This ties into something I was thinking about earlier: the planner is
potentially recursive (eg, it might call a user-defined function that
contains a plannable query), and it'd probably be a good idea if the
behavior of GEQO wasn't sensitive to that.  So the RNG's state needs to
be kept in PlannerInfo or some associated structure, not be global.

regards, tom lane



Re: [HACKERS] Odd historical fact about Bison

2009-07-11 Thread Shane Ambler
The real question is slow-to-upgrade OSes like HP-UX, AIX, OpenBSD and 
Solaris.  What version of Bison are they shipping with?


Mac OSX 10.4.11 - GNU Bison version 1.28



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz




[HACKERS] xml in ruleutils

2009-07-11 Thread andrzej barszcz
Hi,

Well, it is best to put it this way:

Goal: query split into its base elements
Result: XML response from the server
Client: select pg_reparse_query(query);
Target: version 8.4

How: modification of ruleutils.c
Not done: UNION etc.
Not done: as a builtin function; I don't know how to do it )))
Tested: partly


examples :

query (not meaningful, by the way):

pzty - view on table X
wartość zakupu - from view of table Y
nazwa dostawcy - from table Z

select public.pg_reparse_query($$with xxx as (select * from pzty) select
ctid,upper(nazwa dostawcy) from xxx where typ_dok='PZ' or typ_dok='WZ'
and nazwa dostawcy = 'A' and wartość zakupu = 100 order by nazwa
dostawcy desc $$);

second example:

select public.pg_reparse_query($$select ctid,upper(nazwa dostawcy)
from pzty where typ_dok='PZ' or typ_dok='WZ' and nazwa dostawcy = 'A'
and wartość zakupu = 100 order by nazwa dostawcy desc $$);


Does it make sense?


If you are interested in this solution, please send me an email.


best regards
Andrzej Barszcz



 


query.xml
Description: XML document


query_without_with.xml
Description: XML document



Re: [HACKERS] Odd historical fact about Bison

2009-07-11 Thread Josh Berkus

On 7/11/09 12:24 PM, Shane Ambler wrote:

The real question is slow-to-upgrade OSes like HP-UX, AIX, OpenBSD
and Solaris. What version of Bison are they shipping with?


Mac OSX 10.4.11 - GNU Bison version 1.28


Hmmm, given the number of other OSS things which won't install on 10.4 
(like Firefox 3), I'm not that worried about this one.  People can still 
install and run the binaries, yes?  And Bison is easily upgraded via 
Fink or MacPorts.


Plus we've required 1.875 for years anyway.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] Odd historical fact about Bison

2009-07-11 Thread Andrew Dunstan



Shane Ambler wrote:
The real question is slow-to-upgrade OSes like HP-UX, AIX, OpenBSD 
and Solaris.  What version of Bison are they shipping with?


Mac OSX 10.4.11 - GNU Bison version 1.28




We have not supported any version less than 1.875 for as long as I have 
been working on Postgres.


cheers

andrew



Re: [HACKERS] Odd historical fact about Bison

2009-07-11 Thread Michael Meskes
On Sat, Jul 11, 2009 at 03:43:02PM -0400, Andrew Dunstan wrote:
 Mac OSX 10.4.11 - GNU Bison version 1.28
 
 We have not supported any version less than 1.875 for as long as I
 have been working on Postgres.

We switched to 1.50 at some point in 2002 because we had to: ecpg had reached
the limit of older bison versions. So OSX 10.4.11 doesn't work anyway.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
ICQ: 179140304, AIM/Yahoo/Skype: michaelmeskes, Jabber: mes...@jabber.org
Go VfL Borussia! Go SF 49ers! Use Debian GNU/Linux! Use PostgreSQL!



Re: [HACKERS] WIP: generalized index constraints

2009-07-11 Thread Jeff Davis
Right now this patch does not support GIN because GIN doesn't support
amgettuple.

It could be made to support GIN by doing a bitmap index scan, manually
fetching the next tuple (or, if it's lossy, the next one on the page),
checking it against the snapshot, and then rechecking it to make sure it
still matches.

The API I'm looking for is essentially the same as index_getnext(),
which makes the most sense for finding constraint violations.

Is it possible to re-add amgettuple to GIN, and just set the cost high
so it's not chosen by the planner? Or is there some reason this is
fundamentally a bad idea (or won't work at all)?

I know we removed it in 8.4:
http://archives.postgresql.org/pgsql-hackers/2009-02/msg01123.php
http://archives.postgresql.org/pgsql-hackers/2009-02/msg00532.php

but the reasoning seemed mostly because:
1. GIN could not support amgettuple efficiently (lots of rechecking)
2. Normal index scans did not fit a common use case for GIN, anyway

However, if my feature needs to perform this check anyway (to support
GIN, that is), it seems like it could be re-added. There was also some
resistance to removing it in the first place (e.g. for anti-joins), so
perhaps it can be made to be efficient again during the 8.5 cycle.

Regards,
Jeff Davis






Re: [HACKERS] WIP: generalized index constraints

2009-07-11 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 Is it possible to re-add amgettuple to GIN, and just set the cost high
 so it's not chosen by the planner? Or is there some reason this is
 fundamentally a bad idea (or won't work at all)?

We wouldn't have deleted it if it were practical to make it work.

regards, tom lane



Re: [HACKERS] WIP: generalized index constraints

2009-07-11 Thread Jeff Davis
On Sat, 2009-07-11 at 19:06 -0400, Tom Lane wrote:
  Is it possible to re-add amgettuple to GIN, and just set the cost high

 We wouldn't have deleted it if it were practical to make it work.

Can you elaborate a little?

Following the thread, I see:

http://archives.postgresql.org/pgsql-hackers/2009-02/msg00532.php

What would be wrong with letting it degrade to lossy?  I suppose the
reason it's trying to avoid that is to avoid having to recheck all the
rows on that page when it comes time to do the index insertion; but
surely having to do that is better than having arbitrary, unpredictable
failure conditions.

And the next message refers to:

http://archives.postgresql.org/message-id/4974b002.3040...@sigaev.ru

amgettuple interface hasn't possibility to work with page-wide result
instead of exact ItemPointer. amgettuple can not return just a block
number as amgetbitmap can.

I see why it's impractical to make it efficient, but the way I see it we
can make gingettuple just a wrapper around gingetbitmap, which just
iterates through the bitmap. It would not have any performance benefit
over gingetbitmap, obviously. But if my index constraints patch is going
to support GIN (which seems like an interesting application), I would
need to implement a function which does this anyway (efficiently or
not).

Regards,
Jeff Davis






Re: [HACKERS] concurrent index builds unneeded lock?

2009-07-11 Thread Josh Berkus

On 7/11/09 3:50 AM, Greg Stark wrote:

Hm. Actually maybe not. What if the index is an expression index and
the expression includes a function which does an SQL operation? I'm
not sure how realistic that is since to be a danger that SQL operation
would have to be an insert, update, or delete which is not just
bending the rules.


It's not realistic at all.  People are only supposed to use IMMUTABLE 
functions for expression indexes; if they declare a volatile function 
as immutable, then it's their own lookout if they corrupt their data.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] Maintenance Policy?

2009-07-11 Thread Josh Berkus

Hmmm, how about this?

The current policy of the PostgreSQL community is to stop providing 
minor versions (patches or updates) of PostgreSQL five years after a 
version of PostgreSQL is released.  In general, users can expect to 
continue to be able to get community updates for five years.


However, there have been exceptions in both directions.  Some companies 
choose to continue back-patching PostgreSQL and make those updates 
available to the community past the fifth anniversary.  Other times 
issues with build tools have caused us to discontinue a particular 
version early, such as 8.0 and 8.1 for Windows, which stopped getting 
updates in binary form after only 2 years.


Overall, if you have specific lifetime requirements for your database 
products, we strongly urge you to get a long-term support contract with 
a PostgreSQL support vendor link.


As examples of this policy, below are the dates at which updates of 
specific version became unavailable:


table here


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [HACKERS] Maintenance Policy?

2009-07-11 Thread Andrew Dunstan



Josh Berkus wrote:

Hmmm, how about this?

The current policy of the PostgreSQL community is to stop providing 
minor versions (patches or updates) of PostgreSQL five years after a 
version of PostgreSQL is released. 


I think this is putting it the wrong way round. We should say that the 
general intention is to maintain a version for at least five years, or 
some similar formulation.


cheers

andrew





Re: [HACKERS] First CommitFest: July 15th

2009-07-11 Thread Robert Haas
On Wed, Jul 8, 2009 at 1:11 AM, Joshua Tolleyeggyk...@gmail.com wrote:
 On Thu, Jul 02, 2009 at 03:42:56PM +0100, Dave Page wrote:
 On Thu, Jul 2, 2009 at 3:22 PM, Joshua Tolleyeggyk...@gmail.com wrote:
  On Thu, Jul 02, 2009 at 08:41:27AM +0100, Dave Page wrote:
  As far as I'm aware, there's been no code
  review yet either, which would probably be a good idea.
 
  I don't have loads of time in the coming days, but IIRC I've taken a 
  glance at
  a past version of the code, and would be willing to do so again, if it 
  would
  be useful.

 If you can look over it, that would be great. I'm not really qualified
 to review perl code, and we always prefer to have at least two sets of
 eyeballs on anything that we put into production for obvious reasons.

 On the assumption that other folks' testing has included bug hunting and the
 like, I've spent the review time I was able to muster up figuring out the code
 and looking for stuff that scared me. I didn't find anything that jumped out.
 I did wonder if the %ACTION hash in Handler.pm ought not perhaps include a
 flag to indicate that that action needs authentication, and have the handler
 take care of that instead of the individual actions.

Possibly so.  We may also find that it needs to be a bit more
fine-grained than that, as there are already three levels of access
(public, login required, administrator login required) and there could
theoretically be more in the future.
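For illustration, a minimal sketch of what such a flag-carrying dispatch
table might look like, with the handler enforcing access centrally
(hypothetical names and levels, not the actual Handler.pm code):

```perl
use strict;
use warnings;

# Hypothetical sketch: each action in the dispatch table carries its
# required access level, so the dispatcher checks authentication in one
# place instead of each action doing it itself.
use constant { PUBLIC => 0, LOGIN => 1, ADMIN => 2 };

my %ACTION = (
    index  => { access => PUBLIC, code => sub { 'list patches' } },
    edit   => { access => LOGIN,  code => sub { 'edit patch'   } },
    delete => { access => ADMIN,  code => sub { 'delete patch' } },
);

sub dispatch {
    my ($name, $user_level) = @_;
    my $action = $ACTION{$name} or return 'unknown action';
    return 'access denied' if $user_level < $action->{access};
    return $action->{code}->();
}

print dispatch('edit', PUBLIC), "\n";   # access denied
print dispatch('edit', LOGIN),  "\n";   # edit patch
```

Adding further levels then only means adding constants and table
entries, without touching the individual actions.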

 Perhaps a similar
 technique could be profitably employed for the titles. Oh, and in Patch.pm,
 s/with/which/ in patches with have been Committed.

Fixed, thanks.

 Finally, I ran Perl::Critic, and attached an (admittedly untested) patch to
 clean up the things it whined about.

As usual, I'm unimpressed by the whining emitted by Perl::Critic.  I
can understand that if a function is really intended to return void
(but Perl doesn't have that concept) then you probably ought to write
just "return;" rather than "return undef;".  But if the function
sometimes returns a value and sometimes returns undef, insisting
that the word "undef" not be spelled out explicitly seems pretty
silly.
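The reason the policy exists at all is list context; a small sketch of
the difference (hypothetical function names):

```perl
use strict;
use warnings;

# A bare "return;" yields the empty list in list context (and undef in
# scalar context), while "return undef;" yields a one-element list --
# which counts as true when the result is tested as a list.
sub lookup_bare  { return; }
sub lookup_undef { return undef; }

my @a = lookup_bare();
my @b = lookup_undef();
printf "bare: %d element(s), undef: %d element(s)\n",
    scalar(@a), scalar(@b);
```

So "if (my @r = lookup_undef())" takes the true branch even though
nothing was found, which is the failure mode Perl::Critic is guarding
against.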

The other changes have marginally more merit, though some of them
break with surrounding whitespace conventions.

...Robert



Re: [HACKERS] Upgrading our minimum required flex version for 8.5

2009-07-11 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
 I think it would need to be benchmarked. My faint recollection is that 
 the re-entrant lexers are slower.
 
 The flex documentation states in so many words:
The option `--reentrant' does not affect the performance of the scanner.
 Do you feel a need to verify their claim?

 No, I'll take their word for it. I must have been thinking of something 
 else.

As I got further into this, it turned out that Andrew's instinct was
right: it does need to be benchmarked.  Although the inner loops of the
lexer seem to be the same with or without --reentrant, once you buy into
the whole nine yards of --reentrant, --bison-bridge, and a pure bison
parser, you find out that the lexer's API changes: there are more
parameters to yylex() than there used to be.  It's also necessary to
pass around a yyscanner pointer to all the subroutines in scan.l.  (But
on the other hand, this eliminates accesses to global variables, which
are often not that cheap.)  So the "no performance impact" claim isn't
telling the whole truth.

As best I can tell after some casual testing on a couple of machines,
the actual bottom line is that raw_parser (ie, the bison and flex
processing) is going to be a couple of percent slower with a reentrant
grammar and lexer, for typical queries involving a lot of short tokens.
Now this disappears into the noise as soon as you include parse analysis
(let alone planning and execution), but it is possible to measure the
slowdown in a test harness that calls raw_parser only.

A possible compromise that I think would avoid most or all of the
slowdown is to make the lexer reentrant but not the grammar (so that
yylval and yylloc remain as global variables instead of being parameters
to yylex).  I haven't actually benchmarked that, though.  It strikes
me as a fairly silly thing to do.  If we're going to go for reentrancy
I think we should fix both components.

I'm willing to live with the small slowdown.  Comments?

regards, tom lane
