Re: [HACKERS] pg_trgm

2010-05-28 Thread Peter Eisentraut
On fre, 2010-05-28 at 10:04 +0900, Tatsuo Ishii wrote:
  I think the problem at hand has nothing at all to do with agglutination
  or CJK-specific issues.  You will get the same problem with other
  languages *if* you set a locale that does not adequately support the
  characters in use.  E.g., Russian with locale C and encoding UTF8:
  
  select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E
  \u043D\u044B');
   similarity
  
  NaN
  (1 row)
 
 Wait. This works fine for me with stock pg_trgm. local is C and
 encoding is UTF8. What version of PostgreSQL are you using? Mine is
 8.4.4.

This is in 9.0, because 8.4 doesn't recognize the \u escape syntax.  If
you run this in 8.4, you're just comparing a sequence of ASCII letters
and digits.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch submission deadline for CommitFest 2010-07

2010-05-28 Thread Fujii Masao
Hi,

When is the patch submission deadline for CommitFest 2010-07?
July 14? or June 14 (before review fest)? Sorry, I'm not sure
what is actually different between CF and RF.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Heikki Linnakangas

On 27/05/10 22:55, Tom Lane wrote:

Peter Eisentrautpete...@gmx.net  writes:

How about
select myfunc(a := 7, b := 6);
?


Hey, that's a thought.  We couldn't have used that notation before
because we didn't have := as a separate token, but since I hacked that
in for plpgsql's benefit, I think it might be an easy fix.  It'd be
nice that it puts the argument name first like the spec syntax, too.


If we go with that, should we make some preparations to allow = in the 
future? Like provide an alternative operator name for hstore's =, and 
add a note somewhere in the docs to discourage other modules from using =.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch submission deadline for CommitFest 2010-07

2010-05-28 Thread Heikki Linnakangas

On 28/05/10 09:26, Fujii Masao wrote:

When is the patch submission deadline for CommitFest 2010-07?
July 14? or June 14 (before review fest)? Sorry, I'm not sure
what is actually different between CF and RF.


July 14. But if you finish the patch before June 14, it will get 
reviewed earlier, between June 14 and July 14. So aim for June 14 :-).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-28 Thread Heikki Linnakangas

On 27/05/10 22:56, Robert Haas wrote:

On Thu, May 27, 2010 at 3:52 PM, Kevin Grittner
kevin.gritt...@wicourts.gov  wrote:

Robert Haasrobertmh...@gmail.com  wrote:

On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner



(a)  The tuples were written within the same transaction which
created or truncated the table.



In case (a), you mess up visibility with respect to other
command-IDs within the transaction.


Surely that problem is surmountable?


I proposed an idea at PGCon, but I believe Tom and Heikki thought it
was far too grotty to consider.


No, I think it's surmountable too. We discussed hacks to teach the MVCC 
checks that all frozen tuples on a table that was created in the same 
transaction (i.e. the same cases where we skip WAL logging) were 
actually created by the running transaction, and check commandid 
accordingly.


Or detect simple DML commands where we know that the command doesn't 
read the table. COPY would usually fall into that category, though 
non-immutable input functions make that a bit iffy.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-28 Thread Jan Urbański
On 28/05/10 04:47, Tom Lane wrote:
 =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes:
 On 19/05/10 21:01, Jesper Krogh wrote:
 In practice, just cranking the statistics estimate up high enough seems
 to solve the problem, but doesn't
 there seem to be something wrong in how the statistics are collected?
 
 The algorithm to determine most common vals does not do it accurately.
 That would require keeping all lexemes from the analysed tsvectors in
 memory, which would be impractical. If you want to learn more about the
 algorithm being used, try reading
 http://www.vldb.org/conf/2002/S10P03.pdf and corresponding comments in
 ts_typanalyze.c
 
 I re-scanned that paper and realized that there is indeed something
 wrong with the way we are doing it.

 So I think we have to fix this. 

Hm, I'll try to take another look this evening (CEST).

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch submission deadline for CommitFest 2010-07

2010-05-28 Thread Fujii Masao
On Fri, May 28, 2010 at 4:08 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 28/05/10 09:26, Fujii Masao wrote:

 When is the patch submission deadline for CommitFest 2010-07?
 July 14? or June 14 (before review fest)? Sorry, I'm not sure
 what is actually different between CF and RF.

 July 14. But if you finish the patch before June 14, it will get reviewed
 earlier, between June 14 and July 14. So aim for June 14 :-).

Thanks! I'll do my best :)

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Failback with log shipping

2010-05-28 Thread Heikki Linnakangas
At PGCon, several people asked me about restarting an old master as a 
standby after failover has happened. And it wasn't the first time people 
ask me about it, even before 9.0. We have no mention of that in the 
docs, which is a pretty serious oversight. What can we say about it?


I believe the current official policy is that you have to take a new 
base backup and restore from that. Rsync can be used to speed that up.


However, someone once asked me for a comment on a script he wrote to do 
that in a smarter way. I forget who and when and how exactly it worked, 
but it seems possible to do safely.


First of all, you have to shut down the master cleanly for this to work, 
otherwise there can be changes in the old master that never made it to 
the standby.


Assuming controlled shutdown and that the standby received all WAL from 
the old master before it was promoted, I think you can simply create a 
recovery.conf in the old master's data directory to turn it into a 
standby server, and restart. Am I missing something?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Working with PostgreSQL enums in C code

2010-05-28 Thread Robert Haas
On Fri, May 28, 2010 at 12:07 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 I learned that to return an enum value from C, one needs to return the
 OID of the right row of the pg_enum table.  I eventually managed to
 write the code below, which is mostly based on the enum_in function in
 src/backend/utils/adt/enum.c .

PG_RETURN macros shouldn't do any nontrivial processing (see the
existing ones for references).  I assume you have the enum labels
declared in pg_enum.h, so I think you can just return the correct OID
values directly.  Declare constants for them in pg_enum.h and then
just do PG_RETURN_OID(whatever).

#define JSONTypeNullOid   ...
#define JSONTypeStringOid ...
#define JSONTypeNumberOid ...

It really shouldn't be necessary to do a catalog lookup to retrieve a constant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mingw initdb failure on HEAD

2010-05-28 Thread Andrew Dunstan



Takahiro Itagaki wrote:

Andrew Dunstan and...@dunslane.net wrote:

  
Several buildfarm mingw members are getting failures like this, when 
running initdb:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dawn_batdt=2010-05-27%2019:45:18

Could it have been caused by the PGDLLIMPORT/PGDLLEXPORT changes?



Probably, but it's curious because MSVC members are OK.
Do we have special treatments for exported functions in mingw?
It might export 'dllimport' funtions/variables, but not 'dllexport' ones.


  


It has broken Cygwin as well, so that's two out of three Windows 
platforms that don't like this.


I am not sure what the best fix for the original problem is, but this 
isn't it.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT

2010-05-28 Thread Heikki Linnakangas

On 27/05/10 10:59, Takahiro Itagaki wrote:

Log Message:
---
Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT
independently from BUILDING_DLL. It is always __declspec(dllexport).


It looks like the Windows buildfarm members are not happy about this 
change...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Specification for Trusted PLs?

2010-05-28 Thread Sam Mason
On Thu, May 27, 2010 at 11:09:26PM -0400, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  I don't know about a *good* idea, but here's the one I've got.
 
  1.  Make a whitelist.  This is what needs to work in order for a
  language to be a fully functional trusted PL.
 
 Well, I pretty much lose interest right here, because this is already
 assuming that every potentially trusted PL is isomorphic in its
 capabilities.

That's not normally a problem.  The conventional way would be to place
the interpreter in its own sandbox, similar to how Chrome has each tab
running in its own process.  These processes are protected in a way
so that the code running inside them can't do any harm--e.g. a ptrace
jail[1].  This is quite a change from existing pl implementations, and
present a different set of performance/compatibility issues.

 If that were so, there'd not be very much point in
 supporting multiple PLs.  A good example here is R.  I have no idea
 whether PL/R is trusted or trustworthy, but in any case the main point
 of supporting that PL is to allow access to the R statistical library.
 How does that fit into a whitelist designed for some other language?
 It doesn't.

AFAIU, a trusted language should only be able to perform computation,
e.g. not touch the local filesystem, beyond readonly access to library
code, and not see the network.  Policies such as these are easy to
enforce in a ptrace jail, and would still allow a trusted pl/r to do
whatever it wants to get any pure calculation done.  As soon as it needs
to touch the file system the language becomes non-trusted.

  3.  (the un-fun part) Write tests which attempt to do things not in
  the whitelist.  We can start from the vulnerabilities so far
  discovered.
 
 And here is the *other* fatal problem: a whitelist does not in fact give
 any leverage at all for testing whether there is access to functionality
 outside the whitelist.  (It might be useful if you could enforce the
 whitelist at some sufficiently low level of the language implementation,
 but as a matter of testing, it does nothing for you.)  What you're
 suggesting isn't so much un-fun as un-possible.  Given a maze of twisty
 little subroutines all different, how will you find out if any of them
 contain calls of unwanted functionality?

A jail helps with a lot of this; the remainder is in the normal fact
that bug testing can only demonstrate the presence of bugs and you need
to do formal code proof to check for the absence of bugs.

-- 
  Sam  http://samason.me.uk/
 
 [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.5494

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Specification for Trusted PLs?

2010-05-28 Thread Peter Eisentraut
On fre, 2010-05-28 at 13:03 +0100, Sam Mason wrote:
 That's not normally a problem.  The conventional way would be to place
 the interpreter in its own sandbox, similar to how Chrome has each tab
 running in its own process.  These processes are protected in a way
 so that the code running inside them can't do any harm--e.g. a ptrace
 jail[1].  This is quite a change from existing pl implementations, and
 present a different set of performance/compatibility issues.

Surely a definition of a trusted language that invalidates the existing
trusted languages is not going help resolve the issue.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Specification for Trusted PLs?

2010-05-28 Thread Andrew Dunstan



Sam Mason wrote:

On Thu, May 27, 2010 at 11:09:26PM -0400, Tom Lane wrote:
  

David Fetter da...@fetter.org writes:


I don't know about a *good* idea, but here's the one I've got.
  
1.  Make a whitelist.  This is what needs to work in order for a

language to be a fully functional trusted PL.
  

Well, I pretty much lose interest right here, because this is already
assuming that every potentially trusted PL is isomorphic in its
capabilities.



That's not normally a problem.  The conventional way would be to place
the interpreter in its own sandbox, similar to how Chrome has each tab
running in its own process.  These processes are protected in a way
so that the code running inside them can't do any harm--e.g. a ptrace
jail[1].  This is quite a change from existing pl implementations, and
present a different set of performance/compatibility issues.

  


I have my own translation of this last sentence.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Assuming controlled shutdown and that the standby received all WAL from the
 old master before it was promoted, I think you can simply create a
 recovery.conf in the old master's data directory to turn it into a standby
 server, and restart. Am I missing something?

Would that mean that a controlled restart of the old master so that the
recovery stops before applying the logs that were not shipped to the
slave would put it in the same situation?

How easy is it to script that? It seems all we need is the current XID
of the slave at the end of recovery. It should be in the log, maybe it's
easy enough to expose it at the SQL level…

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Heikki Linnakangas

On 28/05/10 16:11, Dimitri Fontaine wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

Assuming controlled shutdown and that the standby received all WAL from the
old master before it was promoted, I think you can simply create a
recovery.conf in the old master's data directory to turn it into a standby
server, and restart. Am I missing something?


Would that mean that a controlled restart of the old master so that the
recovery stops before applying the logs that were not shipped to the
slave would put it in the same situation?


Not shipped before the first failover you mean? No, if any WAL records 
were created in the old master that were not shipped to the standby 
before failover, the corresponding changes to the data files might've 
been flushed to disk already, and you can't undo those by not replaying 
the WAL record on restart.



How easy is it to script that? It seems all we need is the current XID
of the slave at the end of recovery. It should be in the log, maybe it's
easy enough to expose it at the SQL level…


XID doesn't help at all, LSN more likely, but I feel that I don't fully 
understand what you're saying.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Peter Eisentrautpete...@gmx.net  writes:
 How about
 select myfunc(a := 7, b := 6);

 If we go with that, should we make some preparations to allow = in the 
 future? Like provide an alternative operator name for hstore's =, and 
 add a note somewhere in the docs to discourage other modules from using =.

I'd vote no.  We're intentionally choosing to deviate from a very poor
choice of notation.  Maybe Peter can interest the committee in allowing
:= as an alternate notation, instead.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Bruce Momjian
Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  There are functions pg_stat_get_backend_client_addr and
  pg_stat_get_backend_client_port, which are exposed through the
  pg_stat_activity view, but there is no straightforward way to get the
  server-side address and port of a connection.  This is obviously much
  less commonly needed than the client information,
 
 ... indeed.  Is it worth burdening the pg_stats mechanism with this?
 The use case seems vanishingly thin.

I am confused how this is different from inet_server_addr() and
inet_server_port().  

Also, these functions return nothing for unix domain connections. 
Should they, particularly for the port number which we do use to map to
a socket name?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Working with PostgreSQL enums in C code

2010-05-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, May 28, 2010 at 12:07 AM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 I learned that to return an enum value from C, one needs to return the
 OID of the right row of the pg_enum table.  I eventually managed to
 write the code below, which is mostly based on the enum_in function in
 src/backend/utils/adt/enum.c .

 PG_RETURN macros shouldn't do any nontrivial processing (see the
 existing ones for references).

Yeah, that was my first reaction too.  If we don't already have one,
it would be appropriate to provide a lookup enum value function
(functionally about the same as enum_in, but designed to be called
conveniently from C).  Then, if you needed to work from a textual
enum label, you'd call that function and then PG_RETURN_OID.

However, for a built-in enum type, I agree with Robert's solution of
just #define-ing fixed OIDs for the values of the type.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Pavel Stehule
2010/5/28 Tom Lane t...@sss.pgh.pa.us:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Peter Eisentrautpete...@gmx.net  writes:
 How about
 select myfunc(a := 7, b := 6);

 If we go with that, should we make some preparations to allow = in the
 future? Like provide an alternative operator name for hstore's =, and
 add a note somewhere in the docs to discourage other modules from using =.

 I'd vote no.  We're intentionally choosing to deviate from a very poor
 choice of notation.  Maybe Peter can interest the committee in allowing
 := as an alternate notation, instead.

-1

I prefer a standard. And again - it isn't poor syntax - ADA, Perl use
it, It can be a funny if ANSI SQL committee change some design from
Oracle's proposal to PostgreSQL's proposal.

Regards

Pavel



                        regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Tom Lane wrote:
 ... indeed.  Is it worth burdening the pg_stats mechanism with this?
 The use case seems vanishingly thin.

 I am confused how this is different from inet_server_addr() and
 inet_server_port().  

I think the point is to let someone find out *from another session*
which server port number a particular session is using.  I fail to see
a significant use case for that, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Tom Lane wrote:
  ... indeed.  Is it worth burdening the pg_stats mechanism with this?
  The use case seems vanishingly thin.
 
  I am confused how this is different from inet_server_addr() and
  inet_server_port().  
 
 I think the point is to let someone find out *from another session*
 which server port number a particular session is using.  I fail to see
 a significant use case for that, though.

Uh, aren't they all using the same server port number, e.g. 5432?  Is
the issue different IP addresses for the same server?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Andrew Dunstan



Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  

Peter Eisentrautpete...@gmx.net  writes:
  

How about
select myfunc(a := 7, b := 6);



  
If we go with that, should we make some preparations to allow = in the 
future? Like provide an alternative operator name for hstore's =, and 
add a note somewhere in the docs to discourage other modules from using =.



I'd vote no.  We're intentionally choosing to deviate from a very poor
choice of notation.  Maybe Peter can interest the committee in allowing
:= as an alternate notation, instead.


  


What's poor about it? It probably comes from PLSQL which in turn got it 
from Ada, so they aren't just making this up.  I agree it's inconvenient 
for us, but that's a different issue.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Peter Eisentraut
On fre, 2010-05-28 at 10:21 -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   Tom Lane wrote:
   ... indeed.  Is it worth burdening the pg_stats mechanism with this?
   The use case seems vanishingly thin.
  
   I am confused how this is different from inet_server_addr() and
   inet_server_port().  
  
  I think the point is to let someone find out *from another session*
  which server port number a particular session is using.  I fail to see
  a significant use case for that, though.
 
 Uh, aren't they all using the same server port number, e.g. 5432?  Is
 the issue different IP addresses for the same server?

Yes, I would like to know who is connecting to what IP address.  It's
useful if you have HA setups and you need to check which way your
connections are going.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Bruce Momjian
Peter Eisentraut wrote:
 On fre, 2010-05-28 at 10:21 -0400, Bruce Momjian wrote:
  Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
Tom Lane wrote:
... indeed.  Is it worth burdening the pg_stats mechanism with this?
The use case seems vanishingly thin.
   
I am confused how this is different from inet_server_addr() and
inet_server_port().  
   
   I think the point is to let someone find out *from another session*
   which server port number a particular session is using.  I fail to see
   a significant use case for that, though.
  
  Uh, aren't they all using the same server port number, e.g. 5432?  Is
  the issue different IP addresses for the same server?
 
 Yes, I would like to know who is connecting to what IP address.  It's
 useful if you have HA setups and you need to check which way your
 connections are going.

OK, at least now I understand the goal.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Josh Berkus



What's poor about it? It probably comes from PLSQL which in turn got it
from Ada, so they aren't just making this up. I agree it's inconvenient
for us, but that's a different issue.


Further, the
( parameter := value ) notation is not only consistent with what is used 
inside pl/pgsql, it's also more consistent than AS with SQL Server's 
named parameter notation, which is:


EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5

Since former SQL Server / Sybase apps are the most likely to use named 
parameter notation in PostgreSQL, having a syntax which could be ported 
using only sed would be nice.


Relevant to the whole discussion, though ... is the conflicting SQL 
standard syntax something we're every likely to implement?


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] How to pass around collation information

2010-05-28 Thread Peter Eisentraut
I have been thinking about this collation support business a bit.
Ignoring for the moment where we would get the actual collation routines
from, I wonder how we are going to pass this information around in the
system.  Someone declares a collation on a column in a table, and
somehow this information needs to arrive in bttextcmp() and friends.
Also, functions that take in a string and return one (e.g., substring),
need to take in this information and return it back out.  How should
this work?

Option 1, make it part of the datum.  That way it will pass through the
system just fine, but it would waste a lot of storage and break just
about everything that operates on string types now, as well as
pg_upgrade.  So that's probably out.

Option 2, invent some new mechanism that accompanies a datum or a type
whereever it goes.  Kind of like typmod, but not really.  Then the
collation information would presumably be made available to functions
through the fmgr interface.  The binary representation of data values
stays the same.

Option 2a, while we are at it, are there any other things of this nature
that would be worth supporting at the same time?  I could imagine that
having the option to pass around the ctype locale or the text search
dictionary in a similar way could be useful.  Is this something that
could be combined with typmod or other dormant data type metadata
requirements (PostGIS?, XML?)?

Ideas?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Bruce Momjian
Josh Berkus wrote:
 Since former SQL Server / Sybase apps are the most likely to use named 
 parameter notation in PostgreSQL, having a syntax which could be ported 
 using only sed would be nice.
 
 Relevant to the whole discussion, though ... is the conflicting SQL 
 standard syntax something we're every likely to implement?

Not sure, but I assume people could be using the AS syntax in other
databases (for the inheritance usage) and then trying to use it in our
database.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Andrew Dunstan



Bruce Momjian wrote:

Josh Berkus wrote:
  
Since former SQL Server / Sybase apps are the most likely to use named 
parameter notation in PostgreSQL, having a syntax which could be ported 
using only sed would be nice.


Relevant to the whole discussion, though ... is the conflicting SQL 
standard syntax something we're every likely to implement?



Not sure, but I assume people could be using the AS syntax in other
databases (for the inheritance usage) and then trying to use it in our
database.

  


Yeah. Whether or not we ever implement it really doesn't matter, IMO. We 
should not be in the business of taking an SQL standard piece of syntax 
and using it for some other purpose.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread alvherre
Excerpts from Peter Eisentraut's message of vie may 28 12:27:52 -0400 2010:

 Option 2, invent some new mechanism that accompanies a datum or a type
 whereever it goes.  Kind of like typmod, but not really.  Then the
 collation information would presumably be made available to functions
 through the fmgr interface.  The binary representation of data values
 stays the same.

Is the collation a property of the datum, or one of the comparison?
If the latter, should it be really be made a sidecar of a datum, or
would it make more sense to attach it to the operation being performed?

I wonder if instead of trying to pass it down multiple layers till
bttextcmp and further down, it would make more sense to set a global
variable somewhere in the high levels, and only have it checked in
varstr_cmp.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Heikki Linnakangas

On 28/05/10 19:27, Peter Eisentraut wrote:

I have been thinking about this collation support business a bit.
Ignoring for the moment where we would get the actual collation routines
from, I wonder how we are going to pass this information around in the
system.  Someone declares a collation on a column in a table, and
somehow this information needs to arrive in bttextcmp() and friends.


Yes. Comparison operators need it, as do functions like isalpha().


Also, functions that take in a string and return one (e.g., substring),
need to take in this information and return it back out.  How should
this work?


Hmm, I don't see what substring would need collation for. And it 
certainly shouldn't be returning it. Collation is a property of the 
comparison operators (and isalpha etc.), and the planner needs to deduce 
the right collation for each such operation in the query. That involves 
looking at the tables and columns involved, as well as per-user 
information and any explicit COLLATE clauses in the query, but all that 
happens at plan-time.



Option 1, make it part of the datum.  That way it will pass through the
system just fine, but it would waste a lot of storage and break just
about everything that operates on string types now, as well as
pg_upgrade.  So that's probably out.


It's also fundamentally wrong, collation is not a property of a datum 
but of the operation.



Option 2, invent some new mechanism that accompanies a datum or a type
whereever it goes.  Kind of like typmod, but not really.  Then the
collation information would presumably be made available to functions
through the fmgr interface.  The binary representation of data values
stays the same.


Something like that. I'm thinking that bttextcmp() and friends will 
simply take an extra argument indicating the collation, and we'll teach 
the operator / operator class infrastructure about that too.


One way to approach this is to realize that it's already possible to use 
multiple collations in a database. You just have to define separate  = 
 operators and operator classes for every collation, and change all 
your queries to use the right operator depending on the desired 
collation everywhere where you use  =  (including ORDER BYs, with the 
USING operator syntax). The behavior is exactly what we want, it's 
just completely inpractical, so we need something to do the same in a 
less cumbersome way.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Heikki Linnakangas

On 28/05/10 19:19, Josh Berkus wrote:

( parameter := value ) notation is not only consistent with what is used
inside pl/pgsql, it's also more consistent than AS with SQL Server's
named parameter notation, which is:

EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5

Since former SQL Server / Sybase apps are the most likely to use named
parameter notation in PostgreSQL, having a syntax which could be ported
using only sed would be nice.


Once you solve the problem of finding the '='s in the source, replacing 
them is exactly the same effort regardless of what you replace them with.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] dividing money by money

2010-05-28 Thread Andy Balholm
I'm not quite sure how to go about changing it from an add-on function to a 
built-in one. So if you want to do that, go ahead. If you'd rather I did, just 
tell me how it's done.

Andy Balholm
(509) 276-2065
a...@balholm.com

On May 26, 2010, at 11:18 AM, Kevin Grittner wrote:

 Hi Andy,
 
 Do you want to package this up as a patch for 9.1?  If not, is it OK
 if I do?
 
 -Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Peter Eisentraut
On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote:
 It's also fundamentally wrong, collation is not a property of a datum 
 but of the operation.

 One way to approach this is to realize that it's already possible to
 use 
 multiple collations in a database. You just have to define separate 
 = 
   operators and operator classes for every collation, and change all 
 your queries to use the right operator depending on the desired 
 collation everywhere where you use  =  (including ORDER BYs, with
 the 
 USING operator syntax). The behavior is exactly what we want, it's 
 just completely inpractical, so we need something to do the same in a 
 less cumbersome way.

Well, maybe we should step back a little and work out what sort of
feature we actually want, if any.  The feature I'm thinking of is what
people might call per-column locale, and the SQL standard defines
that.  It would look like this:

CREATE TABLE test (
a varchar COLLATE de,
b varchar COLLATE fr
);

SELECT * FROM test WHERE a  'baz' ORDER BY b;

So while it's true that the collation is used by the operations ( and
ORDER BY), the information which collation to use comes with the data
values.  It's basically saying, a is in language de, so sort it like
that unless told otherwise.  There is also an override syntax available,
like this:

SELECT * FROM test WHERE a COLLATE en  'baz' ORDER BY b COLLATE sv;

But here again the collation is attached to a data value, and only from
there it is passed to the operator.  What is actually happening is

SELECT * FROM test WHERE (a COLLATE en)  'baz' ORDER BY (b COLLATE sv);


What you appear to be describing is a per-operation locale, which also
sounds valid, but it would be a different thing.  It might be thought of
as this:

SELECT * FROM test WHERE a ( COLLATE en) 'baz' ORDER BY COLLATE sv b;

with some suitable global default.


So which one of these should it be?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VPATH docs

2010-05-28 Thread Bruce Momjian

Patch applied.  Thanks.

---


David Fetter wrote:
 Folks,
 
 Andrew Dunstan posted some instructions on his blog, and I'm thinking
 they clarify things a great deal for people who want to learn how to
 do VPATH builds.
 
 Attached patch adds the description along with an index term.  What
 say?
 
 Cheers,
 David.
 -- 
 David Fetter da...@fetter.org http://fetter.org/
 Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
 Skype: davidfetter  XMPP: david.fet...@gmail.com
 iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
 
 Remember to vote!
 Consider donating to Postgres: http://www.postgresql.org/about/donate

[ Attachment, skipping... ]

 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Since former SQL Server / Sybase apps are the most likely to use named 
 parameter notation in PostgreSQL, having a syntax which could be ported 
 using only sed would be nice.

I fear you're vastly overestimating the ability of sed to distinguish
between = used in this way and = used in any other way.  Still, putting
the parameter name on the left is clearly both more natural and more
like every other product.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Yeah. Whether or not we ever implement it really doesn't matter, IMO. We 
 should not be in the business of taking an SQL standard piece of syntax 
 and using it for some other purpose.

Evidently the 201x SQL standard has blindsided us twice: first by
defining a syntax for named parameters that wasn't like ours, and second
by defining a syntax for something else that conflicted with ours.
I agree that the AS approach is pretty untenable given that double
whammy, and we'd better get rid of it.  (Hopefully Peter will be able
to keep us better apprised of things in the future.)

It seems that we're agreed on trying to use := instead, and the only
debate is about whether to deprecate use of = as an operator.  But
anything that we might do about the latter would reach no farther than
the documentation in 9.0 anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 So while it's true that the collation is used by the operations ( and
 ORDER BY), the information which collation to use comes with the data
 values.  It's basically saying, a is in language de, so sort it like
 that unless told otherwise.  There is also an override syntax available,
 like this:

 SELECT * FROM test WHERE a COLLATE en  'baz' ORDER BY b COLLATE sv;

That seems fairly bizarre.  What does this mean:

WHERE a COLLATE en  b COLLATE de

?  If it's an error, why is this not an error

WHERE a COLLATE en  b

if b is marked as COLLATE de in its table?

I guess the more general question is whether the spec expects that
collation settings can be derived statically (like type information)
or whether they might sometimes only be known at runtime.

We also need to think about whether we're okay with only applying
collation to built-in types (text, varchar, char) or whether we need
the feature to work for add-on types as well.  In particular, is citext
still a meaningful feature if we have this, or is it superseded by
COLLATE?  In the abstract I'd prefer to let it work for user-defined
types, but if we can have a much simpler implementation by not doing
so, it might be better to give that up.

Is COLLATE a property that can be attached to a domain over text?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Robert Haas
On Fri, May 28, 2010 at 2:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 So while it's true that the collation is used by the operations ( and
 ORDER BY), the information which collation to use comes with the data
 values.  It's basically saying, a is in language de, so sort it like
 that unless told otherwise.  There is also an override syntax available,
 like this:

 SELECT * FROM test WHERE a COLLATE en  'baz' ORDER BY b COLLATE sv;

 That seems fairly bizarre.  What does this mean:

        WHERE a COLLATE en  b COLLATE de

 ?  If it's an error, why is this not an error

        WHERE a COLLATE en  b

 if b is marked as COLLATE de in its table?

I think we need to think of the comparison operators as ternary, and
the COLLATE syntax applied to columns or present in queries as various
ways of setting defaults or explicit overrides for what the third
argument will end up being.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Robert Haas
On Fri, May 28, 2010 at 10:08 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2010/5/28 Tom Lane t...@sss.pgh.pa.us:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Peter Eisentrautpete...@gmx.net  writes:
 How about
 select myfunc(a := 7, b := 6);

 If we go with that, should we make some preparations to allow = in the
 future? Like provide an alternative operator name for hstore's =, and
 add a note somewhere in the docs to discourage other modules from using =.

 I'd vote no.  We're intentionally choosing to deviate from a very poor
 choice of notation.  Maybe Peter can interest the committee in allowing
 := as an alternate notation, instead.

 -1

 I prefer a standard. And again - it isn't poor syntax - ADA, Perl use
 it, It can be a funny if ANSI SQL committee change some design from
 Oracle's proposal to PostgreSQL's proposal.

I agree.  It's good syntax.  I think we should try hard to adopt it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Not shipped before the first failover you mean? No, if any WAL records were
 created in the old master that were not shipped to the standby before
 failover, the corresponding changes to the data files might've been flushed
 to disk already, and you can't undo those by not replaying the WAL record on
 restart.

Ah yes you need to fail between when (WAL is written and not sent) and
CHECKPOINT for this to be possible. But automatic testing of the
situation (is the data already safe in PGDATA) might still be possible?

 How easy is it to script that? It seems all we need is the current XID
 of the slave at the end of recovery. It should be in the log, maybe it's
 easy enough to expose it at the SQL level…

 XID doesn't help at all, LSN more likely, but I feel that I don't fully
 understand what you're saying.

Sorry I was unclear, I was thinking in terms of recovery.conf file and
either recovery_target_xid or recovery_target_time. The idea being that
if the old-master didn't CHECKPOINT the changes that the slave missed,
then we can do crash recovery and choose to stop before that point, then
apply WALs from the new master.

That might sounds like a strange thing to do, but if switching from
master to slave allows skipping the base backup to get a slave again, I
guess we'll see people choosing the all automated failover scripting
(with heartbeat and so on). The goal would be to reduce downtime the
more you can.

When possible I'd still choose manual failover to the slave after a
master's restart and crash recovery, but the downtime constraint might
not allow that everywhere.

So you're saying controlled failover could possibly skip base backup to
reuse old master as new slave, and I'm asking if by some luck (crash
happened before CHECKPOINT) and some recovery.conf setup we could get to
the same situation in case of hard failure. That would allow completely
automatic switchover / failover with no need to resync.

I'm not sure how much clearer I managed to be :)

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Peter Eisentraut
On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:
 I think we need to think of the comparison operators as ternary, and
 the COLLATE syntax applied to columns or present in queries as various
 ways of setting defaults or explicit overrides for what the third
 argument will end up being.

How could this extend to things like isalpha() or upper() that would
need access to ctype information?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: PGDLLEXPORT is __declspec (dllexport) only on MSVC, but is

2010-05-28 Thread Tom Lane
itag...@postgresql.org (Takahiro Itagaki) writes:
 Log Message:
 ---
 PGDLLEXPORT is __declspec (dllexport) only on MSVC,
 but is __declspec (dllimport) on other compilers
 because cygwin and mingw don't like dllexport.

That probably explains why the code was the way it was before ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Peter Eisentraut
On fre, 2010-05-28 at 14:48 -0400, Tom Lane wrote:
  SELECT * FROM test WHERE a COLLATE en  'baz' ORDER BY b COLLATE sv;
 
 That seems fairly bizarre.  What does this mean:
 
   WHERE a COLLATE en  b COLLATE de
 
 ?  If it's an error, why is this not an error
 
   WHERE a COLLATE en  b
 
 if b is marked as COLLATE de in its table?

The way I understand it, a collation derivation can be explicit or
implicit.  Explicit derivations override implicit derivations.  If in
the argument set of an operation, explicit collation derivations exist,
they must all be the same.

 I guess the more general question is whether the spec expects that
 collation settings can be derived statically (like type information)
 or whether they might sometimes only be known at runtime.

It looks like it is treated like type information.  The derivation and
validation rules are part of the Syntax Rules.

 We also need to think about whether we're okay with only applying
 collation to built-in types (text, varchar, char) or whether we need
 the feature to work for add-on types as well.  In particular, is citext
 still a meaningful feature if we have this, or is it superseded by
 COLLATE?  In the abstract I'd prefer to let it work for user-defined
 types, but if we can have a much simpler implementation by not doing
 so, it might be better to give that up.

I think if we get this done using the strcoll_l() API to do the work,
which looks like the path of least resistance at the moment, citext
would still be useful because all the standard locales would still be
case sensitive.

 Is COLLATE a property that can be attached to a domain over text?

According to the spec, yes.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 28/05/10 19:19, Josh Berkus wrote:
 EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5

 Once you solve the problem of finding the '='s in the source, replacing them
 is exactly the same effort regardless of what you replace them with.

I guess it would be a choice of target between
  'GXKP' AS ItemCode, 5 AS PriceLevel
and
  ItemCode := 'GXKP', PriceLevel := 5

By the way, as it seems we're voting, I much prefer := than either the
AS and = variant, and I'm not keen on seeing us deprecate the operator.

Further, as said Andrew, keeping AS conflicting with the standard with
no hysterical raisin to do so would be a bad move IMHO.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Dimitri Fontaine
Hi,

Peter Eisentraut pete...@gmx.net writes:
 On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote:
 USING operator syntax). The behavior is exactly what we want, it's 
 just completely inpractical, so we need something to do the same in a 
 less cumbersome way.

For an example, here is something I did to better understand the system
a while ago. Of course I never got to use it for real:

  http://pgsql.tapoueh.org/btree_fr_ops/

 Well, maybe we should step back a little and work out what sort of
 feature we actually want, if any.  The feature I'm thinking of is what
 people might call per-column locale, and the SQL standard defines
 that.  It would look like this:

 CREATE TABLE test (
 a varchar COLLATE de,
 b varchar COLLATE fr
 );

 SELECT * FROM test WHERE a  'baz' ORDER BY b;

 So while it's true that the collation is used by the operations ( and
 ORDER BY), the information which collation to use comes with the data
 values.  It's basically saying, a is in language de, so sort it like
 that unless told otherwise.  There is also an override syntax available,
 like this:

 SELECT * FROM test WHERE a COLLATE en  'baz' ORDER BY b COLLATE sv;

 But here again the collation is attached to a data value, and only from
 there it is passed to the operator.  What is actually happening is

 SELECT * FROM test WHERE (a COLLATE en)  'baz' ORDER BY (b COLLATE sv);


 What you appear to be describing is a per-operation locale, which also
 sounds valid, but it would be a different thing.  It might be thought of
 as this:

 SELECT * FROM test WHERE a ( COLLATE en) 'baz' ORDER BY COLLATE sv b;

 with some suitable global default.


 So which one of these should it be?

My understanding is that what we do is per-operation locale. The locale
information bears no semantic when not attached to some operation on
strings, like sorting or comparing.

So what you're showing here I think is how to attach a collation label
to every string in the system, at the catalog level or dynamically at
the query level. 

Now this collation label will only be used whenever you want to use a
collation aware function or operator. Those functions need to get the
labels for their implementation to have the expected meaning.

So we need both to attach collations to all known strings (defaulting to
the current database collation I guess), as you showed at the SQL level,
and to pass this information down to the functions operating on those
strings.

A confusing example on this grounds would be the following, which I hope
the standard disallow:

  SELECT * FROM test WHERE a COLLATE en  b COLLATE sv;

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-28 Thread Jan Urbański
On 28/05/10 04:47, Tom Lane wrote:
 I re-scanned that paper and realized that there is indeed something
 wrong with the way we are doing it.  The paper says (last sentence in
 the definition of the algorithm, section 4.2):
 
   When a user requests a list of items with threshold s, we output
   those entries in D where f = (s-e)N.
 
 What we are actually doing is emitting every entry with f = 2.  Since
 e is fixed at 1/w, this effectively means that we are setting s to be
 only fractionally greater than e, which means very large relative errors
 in the estimates.

I gave it a though and reread the paper, but since I already blundered
once, please verify me on this.

We follow the algorithm as written, the trouble starts when we want to
output the result. The paper says which items from the D structure
should be returned when the user asks for items that have frequencies
higher than a threshold s. What we want to put in the statistics table
are items accompanied by their frequencies, so we need to do some
reasoning before we can construct the result.

Say we have an item I with count f (taken from our D structure). The
total number of entries is N. The question would be: what would be the
minimum frequency that the user could specify, that would still make us
output this element. From

f = (s - e) * N

we can say it's

s = (f / N) + e

So if the user wants items that occur with frequency (f / N) + e or
less. This would mean that the corresponding value in the statistics
entry should be  I, (f / N) + e) 

The thing is, this doesn't change much, because currently we are putting
(f / N) there, and e is set to 1 / stats_target * 10, so the difference
would not be dramatic.

 Or, if you want it explained another way: we are emitting words whose f
 is very small and whose delta is very large, representing items that got
 added to the scan very late.  These really shouldn't be there because
 their true frequency is probably a lot less than the intended threshold.

Well, the idea of the algorithm is that if their frequency would have
been bigger, they would appear earlier and would survive the pruning, as
I understand it.

 The net effect of this is first that there are a lot of rather useless
 entries in the MCV list whose claimed frequency is quite small, like as
 little as two occurrences.  Their true frequency could be quite a bit
 more.  What's even worse is that we believe that the minimum of these
 claimed frequencies is a reliable upper bound for the frequencies of
 items not listed in the MCV list.

Per the algorithm it *is* the upper bound, if I got my maths correctly.
The last item in the list would not be returned if the requested
frequency was higher than the one that is associated to that item.

 So I think we have to fix this.  The right thing is to select s and e
 values that are actually meaningful in the terms of the paper, then
 compute w from that, and be sure to enforce the minimum f value
 specified in the algorithm ... ie, don't be afraid to throw away values
 in the final D table.

I we should definitely prune the table one last time in the very
probable case that the loop ended in the middle of two regularly
happening prunes.

As for selecting the algorithm parameters: we don't get to select s. We
do get to select e, but that's it. I have a feeling that our problems
are caused by thte fact, that the algorithm tries to answer the question
which elements occur more frequently than X and we actually want the
answer to the what's the frequency of each element. I've almost
convinced myself that the transformation between the answers to these
questions exists, but this trouble report keeps giving me doubts.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql

2010-05-28 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 I thought it might be about that simple once you went at it the right
 way ;-).  However, I'd suggest checking ferror(pset.queryFout) as well
 as the fflush result.

 Sure, I can add the ferror() check.  Patch attached.

This seemed pretty small and uncontroversial, so I went ahead and
committed it for 9.0.  I rearranged the order of operations a bit to
make it seem more coherent, and also added an initial clearerr() just
to forestall problems if stdout had the error flag set for some reason.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Robert Haas
On Fri, May 28, 2010 at 3:20 PM, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:
 I think we need to think of the comparison operators as ternary, and
 the COLLATE syntax applied to columns or present in queries as various
 ways of setting defaults or explicit overrides for what the third
 argument will end up being.

 How could this extend to things like isalpha() or upper() that would
 need access to ctype information?

Good question.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-28 Thread Tom Lane
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes:
 We follow the algorithm as written, the trouble starts when we want to
 output the result. The paper says which items from the D structure
 should be returned when the user asks for items that have frequencies
 higher than a threshold s. What we want to put in the statistics table
 are items accompanied by their frequencies, so we need to do some
 reasoning before we can construct the result.

Well, the estimated frequency is still just f/N.  The point is that we
must filter out items with small f values because they're probably
inaccurate --- in particular, anything with f  eN is completely
untrustworthy.

I agree that we currently aren't bothering to determine a specific s
value, but we probably need to do that in order to have a clear
understanding of what we are getting.

The idea that I was toying with is to assume a Zipfian distribution of
the input (with some reasonable parameter), and use that to estimate
what the frequency of the K'th element will be, where K is the target
number of MCV entries or perhaps a bit more.  Then use that estimate as
the s value, and set e = s/10 or so, and then w = 1/e and continue as
per the paper.  If the eventual filtering results in a lot less than the
target number of MCV entries (because the input wasn't so Zipfian), we
lose, but at least we have accurate numbers for the entries we kept.

 I we should definitely prune the table one last time in the very
 probable case that the loop ended in the middle of two regularly
 happening prunes.

The paper doesn't say that you need to do that.  I suspect if you work
through the math, you'll find out that the minimum-f filter skips
anything that would have been pruned anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Pavel Stehule
2010/5/28 alvherre alvhe...@commandprompt.com:
 Excerpts from Peter Eisentraut's message of vie may 28 12:27:52 -0400 2010:

 Option 2, invent some new mechanism that accompanies a datum or a type
 whereever it goes.  Kind of like typmod, but not really.  Then the
 collation information would presumably be made available to functions
 through the fmgr interface.  The binary representation of data values
 stays the same.

 Is the collation a property of the datum, or one of the comparison?
 If the latter, should it be really be made a sidecar of a datum, or
 would it make more sense to attach it to the operation being performed?


 I wonder if instead of trying to pass it down multiple layers till
 bttextcmp and further down, it would make more sense to set a global
 variable somewhere in the high levels, and only have it checked in
 varstr_cmp.


Maybe collation is property of some operation: func call, sort, ... I
prefer to put collation info to FunctionCallInfo structure. Usually
you cannot change collation per row - collation is attached to
expression.

Regards

Pavel
 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] List traffic

2010-05-28 Thread Josh Berkus
On 5/27/10 5:42 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 We do not have a problem.   The lists are fine the way they are.
 
 +1 ... wasn't the point I thought you were trying to make, but I'm
 good with not changing things.

Yeah, that's because I was responding to the suggestion that 5 of our
lists should be collapsed into 'general' as the One Uber-List.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Heikki Linnakangas

On 28/05/10 22:20, Dimitri Fontaine wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

Not shipped before the first failover you mean? No, if any WAL records were
created in the old master that were not shipped to the standby before
failover, the corresponding changes to the data files might've been flushed
to disk already, and you can't undo those by not replaying the WAL record on
restart.


Ah yes you need to fail between when (WAL is written and not sent) and
CHECKPOINT for this to be possible.


Checkpoint only guarantees that everything before that is flushed to 
disk. It doesn't guarantee that nothing is flushed to disk until that. 
If there's a checkpoint that hasn't been shipped to the standby, you're 
certainly hosed, but if there is no checkpoint you don't know if the 
data files have changed or not.



But automatic testing of the
situation (is the data already safe in PGDATA) might still be possible?


Hmm, so the situation is this:

D - E - crash!
  /
A - B - C
  \
d - f - g - h

The letters represent WAL records. C is the last WAL record that was 
shipped to the standby, D  E are WAL records that were generated in the 
old master before the crash but never sent to the standby, and d-h are 
WAL records created in the standby after failover.


I guess you could read the WAL in the old master and compare it with the 
WAL from the standby to figure out where the failover happened (C), and 
then scan all the data pages involved in records D - E, checking that 
the LSNs on the data pages touched by those records are earlier than C. 
That's a bit laborious, and requires knowledge of all different kinds of 
WAL records to figure out which data pages they touch, but seems 
possible in theory.



How easy is it to script that? It seems all we need is the current XID
of the slave at the end of recovery. It should be in the log, maybe it's
easy enough to expose it at the SQL level…


XID doesn't help at all, LSN more likely, but I feel that I don't fully
understand what you're saying.


Sorry I was unclear, I was thinking in terms of recovery.conf file and
either recovery_target_xid or recovery_target_time. The idea being that
if the old-master didn't CHECKPOINT the changes that the slave missed,
then we can do crash recovery and choose to stop before that point, then
apply WALs from the new master.


Ah, I see. No, you don't want to use a recovery target, that would end 
the recovery and start the server. You just need to make sure to use 
WALs from the new master instead of the old one when both exist.



So you're saying controlled failover could possibly skip base backup to
reuse old master as new slave, and I'm asking if by some luck (crash
happened before CHECKPOINT) and some recovery.conf setup we could get to
the same situation in case of hard failure. That would allow completely
automatic switchover / failover with no need to resync.


Yeah, that would be nice. In practice, I think you would get lucky more 
often than not, because whenever you modify and dirty a page, writing a 
WAL record, the usage count on the buffer is incremented and it won't be 
evicted from the buffer cache for a while.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Heikki Linnakangas

On 28/05/10 23:15, Robert Haas wrote:

On Fri, May 28, 2010 at 3:20 PM, Peter Eisentrautpete...@gmx.net  wrote:

On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:

I think we need to think of the comparison operators as ternary, and
the COLLATE syntax applied to columns or present in queries as various
ways of setting defaults or explicit overrides for what the third
argument will end up being.


How could this extend to things like isalpha() or upper() that would
need access to ctype information?


Good question.  :-(


Strictly speaking, collation and ctype are two different things. Which 
is a convenient way to evade the question :-).


But you could ask, how would we handle more fine-grained ctype in 
upper() then? Perhaps by adding a second argument for ctype. Similarly 
to to_tsvector([config, ] string), you could explicitly pass the ctype 
as an argument, or leave it out in which case a default is used. It 
wouldn't give you per-column ctype, though.


What does the spec have to say about the ctype used for upper() et al BTW?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] List traffic

2010-05-28 Thread Jaime Casanova
On Fri, May 28, 2010 at 3:44 PM, Josh Berkus j...@agliodbs.com wrote:
 On 5/27/10 5:42 PM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 We do not have a problem.   The lists are fine the way they are.

 +1 ... wasn't the point I thought you were trying to make, but I'm
 good with not changing things.

 Yeah, that's because I was responding to the suggestion that 5 of our
 lists should be collapsed into 'general' as the One Uber-List.


i think not all should be collapsed but at least -novice, IMHO

-- 
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Martijn van Oosterhout
On Fri, May 28, 2010 at 10:32:34PM +0300, Peter Eisentraut wrote:
 On fre, 2010-05-28 at 14:48 -0400, Tom Lane wrote:
   SELECT * FROM test WHERE a COLLATE en  'baz' ORDER BY b COLLATE sv;
  
  That seems fairly bizarre.  What does this mean:
  
  WHERE a COLLATE en  b COLLATE de
  
  ?  If it's an error, why is this not an error
  
  WHERE a COLLATE en  b
  
  if b is marked as COLLATE de in its table?
 
 The way I understand it, a collation derivation can be explicit or
 implicit.  Explicit derivations override implicit derivations.  If in
 the argument set of an operation, explicit collation derivations exist,
 they must all be the same.

The SQL standard has an explicit set of rules for determining the
collations of any particular operation (they apply to
operators/functions not to the datums).

The basic idea is that tables/columns/data types define an implicit
collation, which can be overidden by explicit collations. If there is
ambiguity you throw an error. I implemented this all several years ago,
it's not all that complicated really. IIRC I added a field to the Node type
and each level determined it's collection from the sublevels.

I solved the problem for the OP by providing an extra function to user
defined functions which would return the collation for that particular
call.

The more interesting question I found was that the standard only
defined collation for strings, whereas it can be applied much more
broadly. I described a possible solution several years back, it should
in the archives somewhere. It worked pretty well as I recall.

IIRC The idea was to let each type has its own set of collations and
when using an operator/function you let the collection be determined
by the argument that had the same type as the return type.

It would be nice if COLLATE could finally be implemented, it'd be quite
useful.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-28 Thread Jan Urbański
On 28/05/10 22:22, Tom Lane wrote:
 The idea that I was toying with is to assume a Zipfian distribution of
 the input (with some reasonable parameter), and use that to estimate
 what the frequency of the K'th element will be, where K is the target
 number of MCV entries or perhaps a bit more.  Then use that estimate as
 the s value, and set e = s/10 or so, and then w = 1/e and continue as
 per the paper.  If the eventual filtering results in a lot less than the
 target number of MCV entries (because the input wasn't so Zipfian), we
 lose, but at least we have accurate numbers for the entries we kept.

I see what you mean, so the idea would be:

 * assume some value of W as the number of all words in the language
 * estimate s as 1/(st + 10)*H(W), where H(W) is the W'th harmonic
number and st is the statistics target, using Zipf's law
 * set e = s/10 and w = 1/e, that is 10/s
 * perform LC using that value of w
 * remove all elements for which f  (s-e)N, that is f  0.9*sN, where N
is the total number of lexemes processed
 * create the MCELEM entries as (item, f/N)

Now I tried to substitute some numbers there, and so assuming the
English language has ~1e6 words H(W) is around 6.5. Let's assume the
statistics target to be 100.

I chose s as 1/(st + 10)*H(W) because the top 10 English words will most
probably be stopwords, so we will never see them in the input.

Using the above estimate s ends up being 6.5/(100 + 10) = 0.06

We then do LC, pruning the D structure every w = 1/0.006 = 167 lexemes

After that, we remove lexemes with f  0.9 * 0.06 * N = 0.054*N

So assuming that on average a tsvector has 154 elements and that we went
through 35017 rows (as it would be in Jesper's case, before he raised
the stats target from 100 to 1000), we will remove lexemes with f 
0.054 * 35017 * 154 that is f  291201.37

I wonder what would happen if Jasper's case if we did that... And I
wonder how sound that maths is.

 I we should definitely prune the table one last time in the very
 probable case that the loop ended in the middle of two regularly
 happening prunes.
 
 The paper doesn't say that you need to do that.  I suspect if you work
 through the math, you'll find out that the minimum-f filter skips
 anything that would have been pruned anyway.

Possible.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql's is_select_command is naive

2010-05-28 Thread Bruce Momjian
Robert Haas wrote:
 On Wed, May 26, 2010 at 10:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  It knows that queries beginning with select or values are select
  commands, but it seems not to be clued in about table and with.
 
  What we really ought to do IMO is throw out the entire current
  implementation of fetch_count. ?If libpq exposed access to the
  protocol-level fetch count, we could implement it without this
  cursor kluge.
 
 I suspect that would make a lot of people very happy.

I have added the following TODO:

Fix FETCH_COUNT to handle SELECT ... INTO and WITH queries
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01565.php
* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00192.php

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clearing psql's input buffer after auto-reconnect

2010-05-28 Thread Bruce Momjian
Tom Lane wrote:
 We determined that $SUBJECT would be a good idea in this thread:
 http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php
 
 I looked a bit at what it would take to make this happen.  The
 difficulty is that the input buffer is a local variable in MainLoop(),
 and so are a bunch of other subsidiary variables that would need to be
 reset along with it.  The place where auto-reconnect presently happens
 is CheckConnection(), which is in a different file and is also several
 levels of subroutine call away from MainLoop.  AFAICS there are three
 ways that we might attack this:
 
 1. Massive restructuring of the code in common.c so that the fact of
 a connection reset having happened can be returned back to MainLoop.
 
 2. Export much of MainLoop's internal state as globals, so that
 CheckConnection can hack on it directly.
 
 3. Have CheckConnection do longjmp(sigint_interrupt_jmp) after resetting
 the connection, to force control to go back to MainLoop directly.
 MainLoop is already coded to clear its local state after catching a
 longjmp.
 
 Now #1 might be the best long-term solution but I have no particular
 appetite to tackle it, and #2 is just too ugly to contemplate.  That
 leaves #3, which is a bit ugly in its own right but seems like the best
 fix we're likely to get.
 
 Comments, better ideas?

Added to TODO:

Prevent psql from sending remaining single-line multi-statement queries
after reconnection

* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01283.php 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-28 Thread Bruce Momjian
Jan Wieck wrote:
  Reading the entire WAL just to find all COMMIT records, then go back to 
  the origin database to get the actual replication log you're looking for 
  is simpler and more efficient? I don't think so.
  
  Agreed, but I think I've not explained myself well enough.
  
  I proposed two completely separate ideas; the first one was this:
  
  If you must get commit order, get it from WAL on *origin*, using exact
  same code that current WALSender provides, plus some logic to read
  through the WAL records and extract commit/aborts. That seems much
  simpler than the proposal you outlined and as SR shows, its low latency
  as well since commits write to WAL. No need to generate event ticks
  either, just use XLogRecPtrs as WALSender already does.
  
  I see no problem with integrating that into core, technically or
  philosophically.
  
 
 Which means that if I want to allow a consumer of that commit order data 
 to go offline for three days or so to replicate the 5 requested, low 
 volume tables, the origin needs to hang on to the entire WAL log from 
 all 100 other high volume tables?

I suggest writing an external tool that strips out what you need that
can be run at any time, rather than creating a new data format and
overhead for this usecase.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-28 Thread Robert Haas

On May 28, 2010, at 7:19 PM, Bruce Momjian br...@momjian.us wrote:

Jan Wieck wrote:
Reading the entire WAL just to find all COMMIT records, then go  
back to
the origin database to get the actual replication log you're  
looking for

is simpler and more efficient? I don't think so.


Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on *origin*, using  
exact

same code that current WALSender provides, plus some logic to read
through the WAL records and extract commit/aborts. That seems much
simpler than the proposal you outlined and as SR shows, its low  
latency

as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.



Which means that if I want to allow a consumer of that commit order  
data

to go offline for three days or so to replicate the 5 requested, low
volume tables, the origin needs to hang on to the entire WAL log from
all 100 other high volume tables?


I suggest writing an external tool that strips out what you need that
can be run at any time, rather than creating a new data format and
overhead for this usecase.


That would be FAR more complex, less robust, and less performant -  
whereas doing what Jan has proposed is pretty straightforward and  
should have minimal impact on performance - or none when not enabled.


...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] small exclusion constraints patch

2010-05-28 Thread Bruce Momjian
Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
  Currently, the check for exclusion constraints performs a sanity check
  that's slightly too strict -- it assumes that a tuple will conflict with
  itself. That is not always the case: the operator might be , in
  which case it's perfectly valid for the search for conflicts to not find
  itself.
 
  This patch simply removes that sanity check, and leaves a comment in
  place.
 
 I'm a bit uncomfortable with removing the sanity check; it seems like a
 good thing to have, especially since this code hasn't even made it out
 of beta yet.  AFAIK the  case is purely hypothetical, because we
 have no index opclasses supporting such an operator, no?  How about just
 documenting that we'd need to remove the sanity check if we ever did add
 support for such a case?

Done, with attached, applied patch.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com
Index: src/backend/executor/execUtils.c
===
RCS file: /cvsroot/pgsql/src/backend/executor/execUtils.c,v
retrieving revision 1.171
diff -c -c -r1.171 execUtils.c
*** src/backend/executor/execUtils.c	26 Feb 2010 02:00:41 -	1.171
--- src/backend/executor/execUtils.c	29 May 2010 02:30:23 -
***
*** 1310,1316 
  
  	/*
  	 * We should have found our tuple in the index, unless we exited the loop
! 	 * early because of conflict.  Complain if not.
  	 */
  	if (!found_self  !conflict)
  		ereport(ERROR,
--- 1310,1317 
  
  	/*
  	 * We should have found our tuple in the index, unless we exited the loop
! 	 * early because of conflict.  Complain if not.  If we ever implement
!  * '' index opclasses, this check will fail and will have to be removed.
  	 */
  	if (!found_self  !conflict)
  		ereport(ERROR,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Fujii Masao
On Fri, May 28, 2010 at 7:58 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 At PGCon, several people asked me about restarting an old master as a
 standby after failover has happened. And it wasn't the first time people ask
 me about it, even before 9.0. We have no mention of that in the docs, which
 is a pretty serious oversight. What can we say about it?

 I believe the current official policy is that you have to take a new base
 backup and restore from that. Rsync can be used to speed that up.

 However, someone once asked me for a comment on a script he wrote to do that
 in a smarter way. I forget who and when and how exactly it worked, but it
 seems possible to do safely.

 First of all, you have to shut down the master cleanly for this to work,
 otherwise there can be changes in the old master that never made it to the
 standby.

 Assuming controlled shutdown and that the standby received all WAL from the
 old master before it was promoted, I think you can simply create a
 recovery.conf in the old master's data directory to turn it into a standby
 server, and restart. Am I missing something?

Failover always increments the timeline ID of the old standby (i.e.,
new master).
Can that procedure work around the gap of the timeline ID between servers?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers