Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-12-20 Thread Noah Misch
On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  I created a function that does this in a loop:
 
  HeapTuple t;
 
  CatalogCacheFlushCatalog(ProcedureRelationId);
  t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
  if (!HeapTupleIsValid(t))
  elog(ERROR, "cache lookup failed for function 42");
  ReleaseSysCache(t);
 
 ... but this performance test seems to me to be entirely misguided,
 because it's testing a situation that isn't going to occur much in the
 field, precisely because the syscache should prevent constant reloads of
 the same syscache entry.

 [ideas for more-realistic tests]

Granted, but I don't hope to reliably measure a change in a macro-benchmark
after seeing a rickety 2% change in a micro-benchmark.



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-12-20 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote:
 ... but this performance test seems to me to be entirely misguided,
 because it's testing a situation that isn't going to occur much in the
 field, precisely because the syscache should prevent constant reloads of
 the same syscache entry.

 [ideas for more-realistic tests]

 Granted, but I don't hope to reliably measure a change in a macro-benchmark
 after seeing a rickety 2% change in a micro-benchmark.

No, I'm not sure about that at all.  In particular I think that
CatalogCacheFlushCatalog is pretty expensive and so the snapshot costs
could be a larger part of a more-realistic test.

regards, tom lane



Re: [HACKERS] Page Checksums

2011-12-20 Thread Christopher Browne
On Tue, Dec 20, 2011 at 8:36 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 I was thinking that we would warn when such was found, set hint bits
 as needed, and rewrite with the new CRC.  In the unlikely event that
 it was a torn hint-bit-only page update, it would be a warning about
 something which is a benign side-effect of the OS or hardware crash.

 But that's terrible.  Surely you don't want to tell people:

 WARNING:  Your database is corrupted, or maybe not.  But don't worry,
 I modified the data block so that you won't get this warning again.

 OK, I guess I'm not sure that you don't want to tell people that.  But
 *I* don't!

This seems to be a frequent problem with this whole "doing CRCs on pages" thing.

It's not evident which problems will be real ones.  And in such
cases, is the answer to turf the database and recover from backup,
because of a single busted page?  For a big database, I'm not sure
that's less scary than the possibility of one page having a
corruption.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] Pause at end of recovery

2011-12-20 Thread Simon Riggs
On Tue, Dec 20, 2011 at 1:40 PM, Magnus Hagander mag...@hagander.net wrote:
 These days we have pause_at_recovery_target, which lets us pause when
 we reach a PITR target. Is there a particular reason we don't have a
 way to pause at end of recovery if we *didn't* specify a target -
 meaning we let it run until the end of the archived log? While it's
 too late to change the target, I can see a lot of use cases where you
 don't want it to be possible to make changes to the database again
 until it has been properly verified - and keeping it up in readonly
 mode in that case can be quite useful...

Useful for what purpose? It's possible to deny access in other ways already.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Page Checksums

2011-12-20 Thread Alvaro Herrera

Excerpts from Christopher Browne's message of Tue Dec 20 14:12:56 -0300 2011:

 It's not evident which problems will be real ones.  And in such
 cases, is the answer to turf the database and recover from backup,
 because of a single busted page?  For a big database, I'm not sure
 that's less scary than the possibility of one page having a
 corruption.

I don't think the problem is having one page of corruption.  The problem
is *not knowing* that random pages are corrupted, and living in the fear
that they might be.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Replication timeout units

2011-12-20 Thread Kevin Grittner
Peter Geoghegan pe...@2ndquadrant.com wrote:
 Magnus Hagander mag...@hagander.net wrote:
 from postgresql.conf.sample:

 #replication_timeout = 60s  # in milliseconds; 0 disables

 Seconds or milliseconds? I would suggest we just remove the "in
 milliseconds", and instead say "timeout for replication
 connections; 0 disables".
 
 +1 from me. That's very confusing.
 
Isn't it providing information on both the granularity and the
default unit if none is specified?  Why is it more confusing here
than statement_timeout or any of the other places this pattern is
followed?
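
For comparison, the sample-file entry that established the pattern looks
like this (quoting from memory, so the exact default text may differ):

    #statement_timeout = 0          # in milliseconds; 0 disables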
 
-1 from me on removing it *only* here.
 
-Kevin



Re: [HACKERS] Page Checksums

2011-12-20 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 I was thinking that we would warn when such was found, set hint
 bits as needed, and rewrite with the new CRC.  In the unlikely
 event that it was a torn hint-bit-only page update, it would be a
 warning about something which is a benign side-effect of the OS
 or hardware crash.
 
 But that's terrible.  Surely you don't want to tell people:
 
 WARNING:  Your database is corrupted, or maybe not.  But don't
 worry, I modified the data block so that you won't get this
 warning again.
 
 OK, I guess I'm not sure that you don't want to tell people that. 
 But *I* don't!
 
Well, I would certainly change that to comply with standard message
style guidelines.  ;-)
 
But the alternatives I've heard so far bother me more.  It sounds
like the most-often suggested alternative is:
 
ERROR (or stronger?):  page checksum failed in relation 999 page 9
DETAIL:  This may not actually affect the validity of any tuples,
since it could be a flipped bit in the checksum itself or dead
space, but we're shutting you down just in case.
HINT:  You won't be able to read anything on this page, even if it
appears to be well-formed, without stopping your database and using
some arcane tool you've never heard of before to examine and
hand-modify the page.  Any query which accesses this table may fail
in the same way.
 
The warning level message will be followed by something more severe
if the page or a needed tuple is mangled in a way that it would not
be used.  I guess the biggest risk here is that there is real damage
to data which doesn't generate a stronger response, and the users
are ignoring warning messages.  I'm not sure what to do about that,
but the above error doesn't seem like the right solution.
 
Assuming we do something about the "torn page on hint-bit-only
write" issue, by moving the hint bits to somewhere else or logging
their writes, what would you suggest is the right thing to do when a
page is read with a checksum which doesn't match page contents?
 
-Kevin



Re: [HACKERS] Page Checksums

2011-12-20 Thread Kevin Grittner
Alvaro Herrera alvhe...@commandprompt.com wrote:
 Excerpts from Christopher Browne's message of Tue Dec 20 14:12:56
 -0300 2011:
 
 It's not evident which problems will be real ones.  And in such
 cases, is the answer to turf the database and recover from
 backup, because of a single busted page?  For a big database, I'm
 not sure that's less scary than the possibility of one page
 having a corruption.
 
 I don't think the problem is having one page of corruption.  The
 problem is *not knowing* that random pages are corrupted, and
 living in the fear that they might be.
 
What would you want the server to do when a page with a mismatching
checksum is read?
 
-Kevin



Re: [HACKERS] Page Checksums

2011-12-20 Thread Andres Freund
On Tuesday, December 20, 2011 06:38:44 PM Kevin Grittner wrote:
 Alvaro Herrera alvhe...@commandprompt.com wrote:
  Excerpts from Christopher Browne's message of Tue Dec 20 14:12:56
  
  -0300 2011:
  It's not evident which problems will be real ones.  And in such
  cases, is the answer to turf the database and recover from
  backup, because of a single busted page?  For a big database, I'm
  not sure that's less scary than the possibility of one page
  having a corruption.
  
  I don't think the problem is having one page of corruption.  The
  problem is *not knowing* that random pages are corrupted, and
  living in the fear that they might be.
 
 What would you want the server to do when a page with a mismatching
 checksum is read?
Follow the behaviour of zero_damaged_pages.

Andres



Re: [HACKERS] Page Checksums

2011-12-20 Thread Aidan Van Dyk
On Tue, Dec 20, 2011 at 12:38 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 I don't think the problem is having one page of corruption.  The
 problem is *not knowing* that random pages are corrupted, and
 living in the fear that they might be.

 What would you want the server to do when a page with a mismatching
 checksum is read?

But that's exactly the problem.  I don't know what I want the server
to do, because I don't know if the page with the checksum mismatch is
one of the 10GB of pages in the page cache that were dirty and poses 0
risk (i.e. hint-bit only changes made it dirty), a page that was
really messed up on the kernel panic that last happened causing this
whole mess, or an even older page that really is giving bitrot...

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Page Checksums

2011-12-20 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Tuesday, December 20, 2011 06:38:44 PM Kevin Grittner wrote:
 What would you want the server to do when a page with a mismatching
 checksum is read?

 Follow the behaviour of zero_damaged_pages.

Surely not.  Nobody runs with zero_damaged_pages turned on in
production; or at least, nobody with any semblance of a clue.

regards, tom lane



Re: [HACKERS] Page Checksums

2011-12-20 Thread Andres Freund
On Tuesday, December 20, 2011 07:08:56 PM Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On Tuesday, December 20, 2011 06:38:44 PM Kevin Grittner wrote:
  What would you want the server to do when a page with a mismatching
  checksum is read?
  
  Follow the behaviour of zero_damaged_pages.
 
 Surely not.  Nobody runs with zero_damaged_pages turned on in
 production; or at least, nobody with any semblance of a clue.
That's my point. There is no automated solution for page errors. So it should 
ERROR (not PANIC) out in normal operation and be fixable via 
zero_damaged_pages.
I personally wouldn't even have a problem making zero_damaged_pages only 
applicable in single backend mode.
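
As a sketch, the salvage workflow that implies (superuser session assumed,
and accepting the loss of whatever was on the damaged pages):

    SET zero_damaged_pages = on;         -- superuser-only, off by default
    SELECT count(*) FROM suspect_table;  -- each damaged page is zeroed in
                                         -- memory with a WARNING, so the
                                         -- scan can complete
    RESET zero_damaged_pages;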

Andres



Re: [HACKERS] Page Checksums

2011-12-20 Thread Simon Riggs
On Mon, Dec 19, 2011 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:

 The only sensible way to handle this is to change the page format as
 discussed. IMHO the only sensible way that can happen is if we also
 support an online upgrade feature. I will take on the online upgrade
 feature if others work on the page format issues, but none of this is
 possible for 9.2, ISTM.

I've had another look at this just to make sure.

Doing this for 9.2 will change the page format, causing every user to
do an unload/reload, with no provided mechanism to do that, whether or
not they use this feature.

If we do that, the hints are all in the wrong places, meaning any hint
set will need to change the CRC.

Currently, setting hints can be done while holding a share lock on the
buffer. Preventing that would require us to change the way buffer
manager works to make it take an exclusive lock while writing out,
since a hint would change the CRC and so allowing hints to be set
while we write out would cause invalid CRCs. So we would need to hold
exclusive lock on buffers while we calculate CRCs.

Overall, this will cause a much bigger performance hit than we planned
for. But then we have SSI as an option, so why not this?

So, do we have enough people in the house that are willing to back
this idea, even with a severe performance hit?  Are we willing to
change the page format now, with plans to change it again in the
future? Are we willing to change the page format for a feature many
people will need to disable anyway? Do we have people willing to spend
time measuring the performance in enough cases to allow educated
debate?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread Dimitri Fontaine
David E. Wheeler da...@justatheory.com writes:
 holds: it is not a programming language, and one does not need a PL to have
 a JSON data type.

Exactly.  That does not contradict the fact that if you have
pl/ecmascript you already have JSON.  And that we might as well have had
the ecmascript PL for some time now, we just need to check about that.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Page Checksums

2011-12-20 Thread Andres Freund
On Tuesday, December 20, 2011 06:44:48 PM Simon Riggs wrote:
 Currently, setting hints can be done while holding a share lock on the
 buffer. Preventing that would require us to change the way buffer
 manager works to make it take an exclusive lock while writing out,
 since a hint would change the CRC and so allowing hints to be set
 while we write out would cause invalid CRCs. So we would need to hold
 exclusive lock on buffers while we calculate CRCs.
While hint bits are a problem, that specific problem is actually handled by 
copying the buffer onto a separate buffer and calculating the CRC on that copy. 
Given that we already rely on the fact that the flags can be read consistently 
from the individual backends, that's fine.
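
In code, the ordering looks roughly like this (a toy checksum and invented
names, just to show the copy-before-CRC idea):

    #include <stdint.h>
    #include <string.h>

    #define BLCKSZ 8192

    /* stand-in for the real CRC; the interesting part is the copy, not the hash */
    static uint32_t
    toy_checksum(const unsigned char *buf, size_t len)
    {
        uint32_t    h = 2166136261u;    /* FNV-1a */
        size_t      i;

        for (i = 0; i < len; i++)
            h = (h ^ buf[i]) * 16777619u;
        return h;
    }

    /*
     * Checksum the copy, not the live buffer: hint-bit setters holding only
     * a share lock may keep scribbling on "page", but that cannot invalidate
     * the CRC, because the copy is also what gets written out.
     */
    static uint32_t
    checksum_for_write(const unsigned char *page, unsigned char *copy)
    {
        memcpy(copy, page, BLCKSZ);
        return toy_checksum(copy, BLCKSZ);
    }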

Andres



Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread Andres Freund
On Tuesday, December 20, 2011 07:23:43 PM Dimitri Fontaine wrote:
 David E. Wheeler da...@justatheory.com writes:
  holds: it is not a programming language, and one does not need a PL to
  have a JSON data type.
 Exactly.  That does not contradict the fact that if you have
 pl/ecmascript you already have JSON.  And that we might as well have had
 the ecmascript PL for some time now, we just need to check about that.
Not really. You need to be able to evaluate json without it possibly 
executing code. Many js implementations are likely to have such a feature 
though.

Andres



Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread Claes Jakobsson
On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
 On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
 
 My understanding is that JSON is a subset of ECMAscript
 
 Well, no, JSON is formally “a lightweight data-interchange format.” It’s 
 derived from JavaScript syntax, but it is not a programming language, so I 
 wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.
 
  http://json.org/

Are people explicitly asking for a) *JSON* datatype or b) a type that lets you 
store arbitrary complex semi-untyped data structures?

if b) then this might get a lot more interesting

Cheers,
Claes


Re: [HACKERS] Page Checksums

2011-12-20 Thread Jesper Krogh

On 2011-12-20 18:44, Simon Riggs wrote:

On Mon, Dec 19, 2011 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:


The only sensible way to handle this is to change the page format as
discussed. IMHO the only sensible way that can happen is if we also
support an online upgrade feature. I will take on the online upgrade
feature if others work on the page format issues, but none of this is
possible for 9.2, ISTM.

I've had another look at this just to make sure.

Doing this for 9.2 will change the page format, causing every user to
do an unload/reload, with no provided mechanism to do that, whether or
not they use this feature.


How about only calculating the checksum and setting it in the bgwriter,
just before flying the buffer off to disk?

Perhaps even let autovacuum do the same if it flushes pages to disk as a
part of the process.

If someone comes along and sets a hint bit, changes data, etc., its only
job is to clear the checksum to a value meaning "we don't have a checksum
for this page".

Unless the bgwriter becomes bottlenecked by doing it, the impact on
foreground work should be fairly limited.

Jesper .. just throwing in random thoughts ..
--
Jesper



Re: [HACKERS] Page Checksums

2011-12-20 Thread Jesper Krogh

On 2011-12-19 02:55, Greg Stark wrote:

On Sun, Dec 18, 2011 at 7:51 PM, Jesper Krogh jes...@krogh.cc wrote:

 I don't know if it would be seen as a half-baked feature.. or similar,
 and I don't know if the hint bit problem is solvable at all, but I could
 easily imagine checksumming just skipping the hint bits entirely.

That was one approach discussed. The problem is that the hint bits are
currently in each heap tuple header which means the checksum code
would have to know a fair bit about the structure of the page format.
Also the closer people looked the more hint bits kept turning up
because the coding pattern had been copied to other places (the page
header has one, and index pointers have a hint bit indicating that the
target tuple is deleted, etc). And to make matters worse skipping
individual bits in varying places quickly becomes a big consumer of
cpu time since it means injecting logic into each iteration of the
checksum loop to mask out the bits.
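
To make that cost concrete, a mask-aware checksum loop ends up looking
roughly like this (a sketch; hint_mask_for() and crc_step() are invented
names):

    /*
     * Per-byte masking turns a tight CRC loop into one with an extra lookup
     * and mask, and hint_mask_for() needs page-format knowledge to answer.
     */
    for (i = 0; i < BLCKSZ; i++)
    {
        unsigned char b = page[i];

        b &= ~hint_mask_for(i);     /* zero out any hint bits in this byte */
        crc = crc_step(crc, b);
    }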

I do know it is a valid and really relevant point (the cpu-time spent),
but here in late 2011 it is really a damn irritating limitation, since if
there is one resource I have plenty of in the production environment,
it is cpu-time, just not on the single core currently serving the
client.


Jesper
--
Jesper



Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2011-12-20 Thread Tomas Vondra
On 20.12.2011 11:20, Magnus Hagander wrote:
 2011/12/20 Tomas Vondra t...@fuzzy.cz:

 I haven't updated the docs yet - let's see if the patch is acceptable at
 all first.
 
 Again, without having reviewed the code, this looks like a feature
 we'd want, so please add some docs, and then submit it for the next
 commitfest!

I've added the docs (see the attachment) and rebased to current head.
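
With the patch applied, the new counters read like this (a sketch; the
numbers are invented):

    SELECT datname, temp_files, temp_bytes
      FROM pg_stat_database
     ORDER BY temp_bytes DESC;

    --  datname | temp_files | temp_bytes
    -- ---------+------------+------------
    --  testdb  |         42 |  104857600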

Tomas
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a12a9a2..3635c3f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -691,6 +691,26 @@ postgres: <replaceable>user</replaceable> <replaceable>database</replaceable> <replaceable>host</replaceable>
  </row>
 
  <row>
+  <entry><literal><function>pg_stat_get_db_temp_files</function>(<type>oid</type>)</literal></entry>
+  <entry><type>bigint</type></entry>
+  <entry>
+   Number of temporary files written for the database. All temporary files
+   are counted, regardless of why the temporary file was created (sorting or
+   hash join) or file size (log_temp_files does not affect this).
+  </entry>
+ </row>
+
+ <row>
+  <entry><literal><function>pg_stat_get_db_temp_bytes</function>(<type>oid</type>)</literal></entry>
+  <entry><type>bigint</type></entry>
+  <entry>
+   Amount of data written to temporary files for the database. All temporary
+   files are counted, regardless of why the temporary file was created
+   (sorting or hash join) or file size (log_temp_files does not affect this).
+  </entry>
+ </row>
+
+ <row>
   <entry><literal><function>pg_stat_get_numscans</function>(<type>oid</type>)</literal></entry>
   <entry><type>bigint</type></entry>
   <entry>
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 2253ca8..55d20dc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -574,6 +574,8 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
 pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
 pg_stat_get_db_conflict_all(D.oid) AS conflicts,
+pg_stat_get_db_temp_files(D.oid) AS temp_files,
+pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
 pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
 FROM pg_database D;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 24f4cde..97c7004 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -286,7 +286,7 @@ static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
-
+static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
 /* ----------
  * Public functions called from postmaster follow
@@ -1339,6 +1339,29 @@ pgstat_report_recovery_conflict(int reason)
 	pgstat_send(&msg, sizeof(msg));
 }
 
+
+/* ----------
+ * pgstat_report_tempfile() -
+ *
+ *	Tell the collector about a temporary file.
+ * ----------
+ */
+void
+pgstat_report_tempfile(size_t filesize)
+{
+	PgStat_MsgTempFile msg;
+
+	if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+		return;
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TEMPFILE);
+	msg.m_databaseid = MyDatabaseId;
+	msg.m_filesize = filesize;
+	pgstat_send(&msg, sizeof(msg));
+}
+
+;
 
 /* ----------
  * pgstat_ping() -
  *
@@ -3185,6 +3208,10 @@ PgstatCollectorMain(int argc, char *argv[])
 
 				pgstat_recv_recoveryconflict((PgStat_MsgRecoveryConflict *) &msg, len);
 				break;
 
+			case PGSTAT_MTYPE_TEMPFILE:
+				pgstat_recv_tempfile((PgStat_MsgTempFile *) &msg, len);
+				break;
+
 			default:
 				break;
 		}
@@ -3266,6 +3293,8 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 	result->n_conflict_snapshot = 0;
 	result->n_conflict_bufferpin = 0;
 	result->n_conflict_startup_deadlock = 0;
+	result->n_temp_files = 0;
+	result->n_temp_bytes = 0;
 
 	result->stat_reset_timestamp = GetCurrentTimestamp();
 
@@ -4177,6 +4206,8 @@ pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len)
 	dbentry->n_tuples_updated = 0;
 	dbentry->n_tuples_deleted = 0;
 	dbentry->last_autovac_time = 0;
+	dbentry->n_temp_bytes = 0;
+	dbentry->n_temp_files = 0;
 
 	dbentry->stat_reset_timestamp = GetCurrentTimestamp();
 
@@ -4403,6 +4434,24 @@ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
 }
 
 /* 

[HACKERS] deferrable triggers

2011-12-20 Thread Kevin Grittner
In a discussion on irc today, someone had a need to confirm that a
business rule (this table has two rows for every related one row in
another table) was true at commit time.  I innocently suggested a
deferrable (and deferred) trigger.  It was pointed out that the
docs:
 
http://www.postgresql.org/docs/9.1/interactive/sql-createtrigger.html#AEN68703
 
say:
 
| This can only be specified for constraint triggers.
 
Hmm.  Somehow I had gotten hold of deferrable triggers as a way to
solve a problem in moving our replication from our Java framework to
PostgreSQL triggers.  So we are using a hand-written DEFERRABLE
trigger in production, with it apparently working as intended.
 
Is this dangerous?  If not, should the docs be updated?
 
This is a potentially valuable feature.  For example, to ensure that
a double-entry accounting system leaves a financial transaction
balanced when a transaction commits.  There is no way to check that
with EACH ROW triggers, and it can be very clumsy to structure
things so that each single statement moves things from one valid
state to another.  (That *is* one of the jobs of a transaction,
after all.)
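
A sketch of how that could look (hypothetical schema; when deferred, the
trigger runs once per modified row at commit time):

    CREATE TABLE journal_line (
        txn_id integer NOT NULL,
        amount numeric NOT NULL     -- debits positive, credits negative
    );

    CREATE FUNCTION check_txn_balanced() RETURNS trigger
    LANGUAGE plpgsql AS $$
    DECLARE
        t integer;
    BEGIN
        IF TG_OP = 'DELETE' THEN
            t := OLD.txn_id;
        ELSE
            t := NEW.txn_id;
        END IF;
        IF (SELECT coalesce(sum(amount), 0) FROM journal_line
             WHERE txn_id = t) <> 0 THEN
            RAISE EXCEPTION 'transaction % does not balance', t;
        END IF;
        RETURN NULL;
    END;
    $$;

    CREATE CONSTRAINT TRIGGER journal_balanced
        AFTER INSERT OR UPDATE OR DELETE ON journal_line
        DEFERRABLE INITIALLY DEFERRED
        FOR EACH ROW EXECUTE PROCEDURE check_txn_balanced();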
 
If we remove the above-referenced sentence from the docs, should we
include some warnings about the memory needed to track the need to
fire these triggers?
 
-Kevin



Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread Christopher Browne
On Mon, Dec 19, 2011 at 5:49 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 My understanding is that JSON is a subset of ECMAscript, so if you get
 the latter you already have the former.  Now, someone would have to
 check if plscheme still build with guile-2.0, and given that, how
 exactly you get pl/ecmascript (or pl/js) out of that.

I don't think so.

I checked it out (still on pgfoundry, still on CVS, and code hasn't
been touched since 2008), and ran into some issues.

- It looks for libguile.h
  #include <libguile.h>
which, on 2.0, has shifted around from /usr/include/libguile.h (1.8)
to /usr/include/guile/2.0/libguile.h

It's not doing enough indirections internally; there is a guile-config
that is analogous to pg_config

postgres@cbbrowne [03:48:43] [~/PostgreSQL/plscheme]
- % guile-config compile
-pthread -I/usr/include/guile/2.0
postgres@cbbrowne [03:48:45] [~/PostgreSQL/plscheme]
- % guile-config link
-lguile-2.0 -lgc

It looks like there's something PG-related as a next issue:

- % ./install.sh
pg_config : /var/lib/postgresql/dbs/postgresql-HEAD/bin/pg_config
module-dir: /var/lib/postgresql/dbs/postgresql-HEAD/lib
max-cache-size: 64
dbname: postgres
safe-r5rs : NO
dbacreate : NO
PSQL  : /var/lib/postgresql/dbs/postgresql-HEAD/bin/psql  postgres
CPPFLAGS  : -g -Wall -fpic -c
-I/var/lib/postgresql/dbs/postgresql-HEAD/include/server
-I/usr/include/guile/2.0
LDFLAGS   : -shared -lguile

Compiling... failed!

plscheme.c: In function '_PG_init':
plscheme.c:647:2: warning: implicit declaration of function
'DefineCustomStringVariable' [-Wimplicit-function-declaration]
plscheme.c:650:30: error: 'PGC_BACKEND' undeclared (first use in this function)
plscheme.c:650:30: note: each undeclared identifier is reported only
once for each function it appears in
plscheme.c:652:2: warning: implicit declaration of function
'DefineCustomIntVariable' [-Wimplicit-function-declaration]
plscheme.c: In function 'plscheme_func_handler':
plscheme.c:742:2: warning: implicit declaration of function
'GetTopTransactionId' [-Wimplicit-function-declaration]
plscheme.c: In function 'parse_trig_args':
plscheme.c:1623:44: error: dereferencing pointer to incomplete type
plscheme.c:1628:38: error: dereferencing pointer to incomplete type
...
(See error.log for details.)

I'm not sure to what degree this is bitrot relating to:
a) Postgres changes
b) Guile changes
but there's doubtless a bit of both.

I'd think it interesting to get this back to working order, whether
it's useful for JavaScript or not.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"



Re: [HACKERS] deferrable triggers

2011-12-20 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 In a discussion on irc today, someone had a need to confirm that a
 business rule (this table has two rows for every related one row in
 another table) was true at commit time.  I innocently suggested a
 deferrable (and deferred) trigger.  It was pointed out that the
 docs:
 
 http://www.postgresql.org/docs/9.1/interactive/sql-createtrigger.html#AEN68703
 
 say:
 
 | This can only be specified for constraint triggers.
 
 Hmm.  Somehow I had gotten hold of deferrable triggers as a way to
 solve a problem in moving our replication from our Java framework to
 PostgreSQL triggers.  So we are using a hand-written DEFERRABLE
 trigger in production, with it apparently working as intended.

What do you mean by "hand-written DEFERRABLE trigger"?  AFAICS from the
grammar, DEFERRABLE and related attributes can only be specified when
you write CREATE CONSTRAINT TRIGGER, so the documentation's statement
appears correct to me.

regards, tom lane



Re: [HACKERS] sorting table columns

2011-12-20 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 I've been trying to implement the holy grail of decoupling
 logical/physical column sort order representation, i.e., the feature
 that lets the server have one physical order, for storage compactness,
 and a different output order that can be tweaked by the user.  This
 has been discussed many times; most recently, I believe, here:
 http://archives.postgresql.org/pgsql-hackers/2007-02/msg01235.php
 with implementation details here:
 http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php

 The idea described there by Tom, and upon which I formed a vague
 implementation plan in my head, is that I was to look for all uses of
 an attnum, and then replace it by either attlognum (i.e. the
 user-visible sort identifier) or attphysnum (i.e. the order of
 attributes as stored on disk).

I thought we'd concluded that we really need three values: attnum should
be a permanent logical ID for each column, and then the user-visible
column order would be determined by a different number, and the on-disk
column order by a third.  If we're going to do this at all, it seems
like a seriously bad idea to only go halfway, because then we'll just
have to revisit all the same code again later.
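
Concretely, the catalog might carry something like this under the
three-number scheme (attlognum/attphysnum are the names floated above; both
columns are hypothetical):

    -- attnum     : permanent logical ID; Vars and stored rules reference it
    -- attlognum  : user-visible position; what SELECT * expands in
    -- attphysnum : on-disk position; what TupleDescs follow
    SELECT attname
      FROM pg_attribute
     WHERE attrelid = 'some_table'::regclass AND attnum > 0
     ORDER BY attlognum;    -- today this would have to be ORDER BY attnum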

You do *not* want to store either of the latter two numbers in
parse-time Var nodes, because then you can't rearrange columns without
having to update stored rules.  But it might be useful to decree that
one thing setrefs.c does is renumber Vars in scan nodes to use the
physical column numbers instead of the permanent IDs.

I haven't looked into any of the details, but I would guess that
targetlists should always be constructed in logical (user-visible)
column order.  TupleDescs need to match the physical order, most
likely.  Note that all three orderings are always going to be the same
everywhere above the table scan level.  (And I suppose COPY will need
some hack or other.)

regards, tom lane



Re: [HACKERS] sorting table columns

2011-12-20 Thread Alvaro Herrera

Excerpts from Tom Lane's message of Tue Dec 20 18:24:29 -0300 2011:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:
  I've been trying to implement the holy grail of decoupling
  logical/physical column sort order representation, i.e., the feature
  that lets the server have one physical order, for storage compactness,
  and a different output order that can be tweaked by the user.  This
  has been discussed many times; most recently, I believe, here:
  http://archives.postgresql.org/pgsql-hackers/2007-02/msg01235.php
  with implementation details here:
  http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php
 
  The idea described there by Tom, and upon which I formed a vague
  implementation plan in my head, is that I was to look for all uses of
  an attnum, and then replace it by either attlognum (i.e. the
  user-visible sort identifier) or attphysnum (i.e. the order of
  attributes as stored on disk).
 
 I thought we'd concluded that we really need three values: attnum should
 be a permanent logical ID for each column, and then the user-visible
 column order would be determined by a different number, and the on-disk
 column order by a third.  If we're going to do this at all, it seems
 like a seriously bad idea to only go halfway, because then we'll just
 have to revisit all the same code again later.

Yeah, I was unclear -- that's what I'm doing (or, rather, attempting to
do).

 You do *not* want to store either of the latter two numbers in
 parse-time Var nodes, because then you can't rearrange columns without
 having to update stored rules.  But it might be useful to decree that
 one thing setrefs.c does is renumber Vars in scan nodes to use the
 physical column numbers instead of the permanent IDs.

Hmm, having the numbers in Var nodes seems a fundamental part of the way
I'm attacking the problem.  Hopefully after I give setrefs.c a read I
will have a clearer picture of the way to do it without that.

 I haven't looked into any of the details, but I would guess that
 targetlists should always be constructed in logical (user-visible)
 column order.  TupleDescs need to match the physical order, most
 likely.  Note that all three orderings are always going to be the same
 everywhere above the table scan level.  (And I suppose COPY will need
 some hack or other.)

Okay.  AFAICS this shoots down the idea of modifying destreceivers,
which is good because I was coming to that conclusion for a different
reason.

Thanks for the pointers.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] deferrable triggers

2011-12-20 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 In a discussion on irc today, someone had a need to confirm that
 a business rule (this table has two rows for every related one
 row in another table) was true at commit time.  I innocently
 suggested a deferrable (and deferred) trigger.  It was pointed
 out that the docs:
 
http://www.postgresql.org/docs/9.1/interactive/sql-createtrigger.html#AEN68703
 
 say:
 
 | This can only be specified for constraint triggers.
 
 Hmm.  Somehow I had gotten hold of deferrable triggers as a way
 to solve a problem in moving our replication from our Java
 framework to PostgreSQL triggers.  So we are using a hand-written
 DEFERRABLE trigger in production, with it apparently working as
 intended.
 
  What do you mean by "hand-written DEFERRABLE trigger"?  AFAICS
 from the grammar, DEFERRABLE and related attributes can only be
 specified when you write CREATE CONSTRAINT TRIGGER, so the
 documentation's statement appears correct to me.
 
Ah, I had forgotten that I had to use the CONSTRAINT keyword in the
trigger definition; the sentence in the docs makes more sense now.
 
I wrote a plpgsql trigger function and created a deferrable
initially deferred constraint trigger which referenced it.  Is that
a reasonable thing to do if you need a commit-time trigger based on
some particular action against a particular table?  Would it be a
reasonable way for the person on irc to enforce the business rule
mentioned above?
 
If so, I think there's room for the docs to clarify that CONSTRAINT
TRIGGERs are usable for things other than implementing declarative
constraints, which was the (apparently contagious) interpretation of
the person in irc.
 
-Kevin



Re: [HACKERS] deferrable triggers

2011-12-20 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
  What do you mean by "hand-written DEFERRABLE trigger"?
 
 Ah, I had forgotten that I had to use the CONSTRAINT keyword in the
 trigger definition; the sentence in the docs makes more sense now.
 
 I wrote a plpgsql trigger function and created a deferrable
 initially deferred constraint trigger which referenced it.  Is that
 a reasonable thing to do if you need a commit-time trigger based on
 some particular action against a particular table?  Would it be a
 reasonable way for the person on irc to enforce the business rule
 mentioned above?

Sure.  The reason we decided to document CREATE CONSTRAINT TRIGGER was
exactly that people sometimes need to use it for things besides foreign
keys.  However, it's not a commit time trigger exactly -- keep in mind
that SET CONSTRAINTS can override the trigger's own timing
specification.
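
That is, even an INITIALLY DEFERRED constraint trigger can be pulled
forward, e.g. (a sketch, reusing the journal_line example from upthread):

    BEGIN;
    SET CONSTRAINTS ALL IMMEDIATE;  -- deferred constraint triggers now fire
                                    -- at the end of each statement
    INSERT INTO journal_line VALUES (7, 100.00);  -- a violation errors here,
                                                  -- not at COMMIT
    COMMIT;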

regards, tom lane



Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes

2011-12-20 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 On Tue, Dec 20, 2011 at 07:08, Tom Lane t...@sss.pgh.pa.us wrote:
 but I think I don't
 like this refactoring much.  Will take a closer look tomorrow.

 I was afraid you'd say that, especially for a change that should be
 backpatched. But I couldn't think of alternative ways to do it that
 give non-bogus estimates.

I've applied a revised version of this patch that factors things in a
way I found nicer.  The main concrete thing I didn't like about what
you'd done was dropping the haveFullScan logic.  If we have more than
one qual triggering that, we're still going to do one full scan, not
multiples of that.  It seemed unreasonably hard to get that exactly
right when there are multiple array quals each doing such a thing,
but I didn't want to let it regress in its handling of multiple
plain quals.

Also, while looking at this I realized that we had the costing of
nestloop cases all wrong.  The idea is to scale up the number of
tuples (pages) fetched, apply index_pages_fetched(), then scale
down again.  I think maybe somebody thought that was redundant,
but it's not because index_pages_fetched() is nonlinear.
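
In code form the pattern is roughly this (a sketch with illustrative
variable names, not the actual costsize.c text):

    /*
     * index_pages_fetched() models caching effects and is nonlinear, so for
     * a nestloop we pass it the total work across all iterations and divide
     * afterwards, rather than costing one iteration and multiplying.
     */
    total_pages = index_pages_fetched(tuples_fetched * loop_count,
                                      table_pages, index_pages, root);
    pages_fetched = total_pages / loop_count;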

regards, tom lane



Re: [HACKERS] RangeVarGetRelid()

2011-12-20 Thread Noah Misch
On Mon, Dec 19, 2011 at 11:52:54PM -0500, Robert Haas wrote:
 After staring at this for quite a while longer, it seemed to me that
 the logic for renaming a relation was similar enough to the logic for
 changing a schema that the two calbacks could reasonably be combined
 using a bit of conditional logic; and that, further, the same callback
 could be used, with a small amount of additional modification, for
 ALTER TABLE.  Here's a patch to do that.

Nice.

 I also notice that cluster() - which doesn't have a callback - has
 exactly the same needs as ReindexRelation() - which does.  So that
 case can certainly share code; though I'm not quite sure what to call
 the shared callback, or which file to put it in.
 RangeVarCallbackForStorageRewrite?

I'd put it in tablecmds.c and name it RangeVarCallbackOwnsTable.


A few things on the patch:

 --- a/src/backend/commands/tablecmds.c
 +++ b/src/backend/commands/tablecmds.c

 @@ -2560,90 +2500,26 @@ CheckTableNotInUse(Relation rel, const char *stmt)
   * Thanks to the magic of MVCC, an error anywhere along the way rolls back
   * the whole operation; we don't have to do anything special to clean up.
   *
 - * We lock the table as the first action, with an appropriate lock level
 + * The caller must lock the relation, with an appropriate lock level 
   * for the subcommands requested. Any subcommand that needs to rewrite
   * tuples in the table forces the whole command to be executed with
 - * AccessExclusiveLock. If all subcommands do not require rewrite table
 - * then we may be able to use lower lock levels. We pass the lock level down
 + * AccessExclusiveLock (actually, that is currently required always, but
 + * we hope to relax it at some point).  We pass the lock level down
   * so that we can apply it recursively to inherited tables. Note that the
 - * lock level we want as we recurse may well be higher than required for
 + * lock level we want as we recurse might well be higher than required for
   * that specific subcommand. So we pass down the overall lock requirement,
   * rather than reassess it at lower levels.
   */
  void
 -AlterTable(AlterTableStmt *stmt)
 +AlterTable(Oid relid, LOCKMODE lockmode, AlterTableStmt *stmt)
  {
  	Relation	rel;
 -	LOCKMODE	lockmode = AlterTableGetLockLevel(stmt->cmds);
  
 -	/*
 -	 * Acquire same level of lock as already acquired during parsing.
 -	 */
 -	rel = relation_openrv(stmt->relation, lockmode);
 +	/* Caller is required to provide an adequate lock. */
 +	rel = relation_open(relid, NoLock);
  
  	CheckTableNotInUse(rel, "ALTER TABLE");
  
 -	/* Check relation type against type specified in the ALTER command */
 -	switch (stmt->relkind)
 -	{
 -		case OBJECT_TABLE:
 -
 -			/*
 -			 * For mostly-historical reasons, we allow ALTER TABLE to apply to
 -			 * almost all relation types.
 -			 */
 -			if (rel->rd_rel->relkind == RELKIND_COMPOSITE_TYPE
 -				|| rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 -				ereport(ERROR,
 -						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 -						 errmsg("\"%s\" is not a table",
 -								RelationGetRelationName(rel))));

RangeVarCallbackForAlterRelation() does not preserve ALTER TABLE's refusal to
operate on foreign tables.

 -			break;
 -
 -		case OBJECT_INDEX:
 -			if (rel->rd_rel->relkind != RELKIND_INDEX)
 -				ereport(ERROR,
 -						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 -						 errmsg("\"%s\" is not an index",
 -								RelationGetRelationName(rel))));
 -			break;
 -
 -		case OBJECT_SEQUENCE:
 -			if (rel->rd_rel->relkind != RELKIND_SEQUENCE)
 -				ereport(ERROR,
 -						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 -						 errmsg("\"%s\" is not a sequence",
 -								RelationGetRelationName(rel))));
 -			break;
 -
 -		case OBJECT_TYPE:
 -			if (rel->rd_rel->relkind != RELKIND_COMPOSITE_TYPE)
 -				ereport(ERROR,
 -						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 -						 errmsg("\"%s\" is not a composite type",
 -								RelationGetRelationName(rel))));
 -			break;
 -
 -		case OBJECT_VIEW:
 -			if (rel->rd_rel->relkind 

Re: [HACKERS] sorting table columns

2011-12-20 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of Tue Dec 20 18:24:29 -0300 2011:
 You do *not* want to store either of the latter two numbers in
 parse-time Var nodes, because then you can't rearrange columns without
 having to update stored rules.  But it might be useful to decree that
 one thing setrefs.c does is renumber Vars in scan nodes to use the
 physical column numbers instead of the permanent IDs.

 Hmm, having the numbers in Var nodes seems a fundamental part of the way
 I'm attacking the problem.  Hopefully after I give setrefs.c a read I
 will have a clearer picture of the way to do it without that.

To clarify a bit: one thing that setrefs.c already does is to renumber
Var nodes above the scan level, so that their attnums refer not to
original table column attnums but to column numbers in the output of the
next plan level down.  Vars in scan nodes currently don't need any
renumbering, but it'd be easy enough to extend the logic to do something
to them as well.  I'm visualizing the run-time transformation from
physical to logical column ordering as a sort of projection, much like
the mapping that happens in a join node.

regards, tom lane



Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread David E. Wheeler
On Dec 19, 2011, at 9:00 PM, Robert Haas wrote:

 +1, though I think the core type will at least need some basic operators and 
 indexing support.
 
 And I'm willing to do that, but I thought it best to submit a bare
 bones patch first, in the hopes of minimizing the number of
 objectionable things therein.  For example, if you want to be able to
 index a JSON column, you have to decide on some collation order that
 is consistent with JSON's notion of equality, and it's not obvious
 what is most logical.  Heck, equality itself isn't 100% obvious.  If
 there's adequate support for including JSON in core, and nobody
 objects to my implementation, then I'll throw some ideas for those
 things up against the wall and see what sticks.

+1 Sounds good to me.

David




Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread David E. Wheeler
On Dec 20, 2011, at 2:13 AM, Magnus Hagander wrote:

 Yes, that's why I said few not none.
 
 Though in my experience, most companies are a lot more restrictive
 about addons to their database than addons to their development
 environments.

Yeah, we’re getting off-topic here, so I’ll just say something we can agree on: 
We’ll see.

I do still want to see some processes for getting PGXN distributions into 
RPM/.deb/StackBuilder, though.

Best,

David


Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread David E. Wheeler
On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:

 Are people explicitly asking for a) *JSON* datatype or b) a type that lets 
 you store arbitrary complex semi-untyped data structures?

Yes.

 if b) then this might get a lot more interesting

JSON is the most popular/likely way to represent that, I think.

David




Re: [HACKERS] why do we need create tuplestore for each fetch?

2011-12-20 Thread 高增琦
Thanks for you reply.

I found a query without a cursor is faster than a query with a server-side
cursor and several fetches. But I have a large result set to retrieve from
the database, so I have to choose a server-side cursor to avoid an
out-of-memory problem.

When I tried to debug the cursor and fetch, I found this unexpected
behavior. I think maybe the tuplestore slows the cursor. (Maybe I should do
some profiling later.)

I want to change the code, but I am afraid there are important reasons for
the tuplestore. Therefore, I post it to this list for help: why create a
tuplestore for each fetch?

p.s. a large fetch may make the tuplestore spill to a buffer file, which
slows performance very much.

On Mon, Dec 19, 2011 at 9:06 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Dec 15, 2011 at 8:30 AM, 高增琦 pgf...@gmail.com wrote:
  I found this several days ago when I try to debug a fetch of cursor.
  And I have sent a mail to this list, but no one reply...
  Maybe this is a very simple problem, please help me, thanks a lot...
 
  Here is the example:
  create table t (a int);
  insert into t values (1),(3),(5),(7),(9);
  insert into t select a+1 from t;
  begin;
  declare c cursor for select * from t order by a;
  fetch 3 in c;
  fetch 3 in c;
  fetch 3 in c;
 
  In 'PortalRun', a fetch stmt will be treated with PORTAL_UTIL_SELECT,
  and then a tuplestore will be created in 'FillPortalStore' in the
  fetch stmt's portal.
 
  In 'FillPortalStore', all result will be store at that tuplestore,
  Then, go back to 'PortalRun'; next,  'PortalRunSelect' will send this
  results to client...
 
  My problem is: why do we need create that tuplestore as an
  middle storeage? why do not we just send these result to clent
  at the first time?

 Good question.  I wouldn't expect it to matter very much for a
 three-row fetch, but maybe it does for larger ones?  What is your
 motivation for investigating this?

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com


[HACKERS] CLOG contention

2011-12-20 Thread Robert Haas
A few weeks ago I posted some performance results showing that
increasing NUM_CLOG_BUFFERS was improving pgbench performance.

http://archives.postgresql.org/pgsql-hackers/2011-12/msg00095.php

I spent some time today looking at this in a bit more detail.
Somewhat obviously in retrospect, it turns out that the problem
becomes more severe the longer you run the test.  CLOG lookups are
induced when we go to update a row that we've previously updated.
When the test first starts, just after pgbench -i, all the rows are
hinted and, even if they weren't, they all have the same XID.  So no
problem.  But, as the fraction of rows that have been updated
increases, it becomes progressively more likely that the next update
will hit a row that's already been updated.  Initially, that's OK,
because we can keep all the CLOG pages of interest in the 8 available
buffers.  But once we've eaten through enough XIDs - specifically, 8 buffers *
8192 bytes/buffer * 4 xids/byte = 256k - we can't keep all the
necessary pages in memory at the same time, and so we have to keep
replacing CLOG pages.  This effect is not difficult to see even on my
2-core laptop, although I'm not sure whether it causes any material
performance degradation.

If you have enough concurrent tasks, a probably-more-serious form of
starvation can occur.  As SlruSelectLRUPage notes:

        /*
         * We need to wait for I/O.  Normal case is that it's dirty and we
         * must initiate a write, but it's possible that the page is already
         * write-busy, or in the worst case still read-busy.  In those cases
         * we wait for the existing I/O to complete.
         */

On Nate Boley's 32-core box, after running pgbench for a few minutes,
that in the worst case scenario starts happening quite regularly,
apparently because the number of people who simultaneously wish to
read a different CLOG pages exceeds the number of available buffers
into which they can be read.  The ninth and following backends to come
along have to wait until the least-recently-used page is no longer
read-busy before starting their reads.

So, what do we do about this?  The obvious answer is increase
NUM_CLOG_BUFFERS, and I'm not sure that's a bad idea.  64kB is a
pretty small cache on anything other than an embedded system, these
days.  We could either increase the hard-coded value, or make it
configurable - but it would have to be PGC_POSTMASTER, since there's
no way to allocate more shared memory later on.  The downsides of this
approach are:

1. If we make it configurable, nobody will have a clue what value to set.
2. If we just make it bigger, people laboring under the default 32MB
shared memory limit will conceivably suffer even more than they do now
if they just initdb and go.

A more radical approach would be to try to merge the buffer arenas for
the various SLRUs either with each other or with shared_buffers, which
would presumably allow a lot more flexibility to ratchet the number of
CLOG buffers up or down depending on overall memory pressure.  Merging
the buffer arenas into shared_buffers seems like the most flexible
solution, but it also seems like a big, complex, error-prone behavior
change, because the SLRU machinery does things quite differently from
shared_buffers: we look up buffers with a linear array search rather
than a hash table probe; we have only a per-SLRU lock and a per-page
lock, rather than separate mapping locks, content locks,
io-in-progress locks, and pins; and while the main buffer manager is
content with some loosey-goosey approximation of recency, the SLRU
code makes a fervent attempt at strict LRU (slightly compromised for
the sake of reduced locking in SimpleLruReadPage_Readonly).

Any thoughts on what makes most sense here?  I find it fairly tempting
to just crank up NUM_CLOG_BUFFERS and call it good, but the siren song
of refactoring is whispering in my other ear.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why do we need create tuplestore for each fetch?

2011-12-20 Thread Tom Lane
高增琦 pgf...@gmail.com writes:
 Here is the example:
 create table t (a int);
 insert into t values (1),(3),(5),(7),(9);
 insert into t select a+1 from t;
 begin;
 declare c cursor for select * from t order by a;
 fetch 3 in c;
 fetch 3 in c;
 fetch 3 in c;
 
 In 'PortalRun', a fetch stmt will be treated with PORTAL_UTIL_SELECT,
 and then a tuplestore will be created in 'FillPortalStore' in the
 fetch stmt's portal.

How are you trying to do the fetches, PQexec("fetch 3 in c")?
That is an inherently inefficient way to do things, and trying to shave
a few cycles off the intermediate tuplestore isn't going to fix that.
The general overhead of parsing a new SQL command is probably going to
swamp the costs of a tuplestore, especially if it's too small to spill
to disk (and if it isn't, you really do need the tuplestore mechanism,
slow or not).

If you want to get a speed improvement there would probably be a lot
more bang for the buck in extending libpq to support protocol-level
portal access.  It does already have PQdescribePortal, but for some
reason not anything for "fetch N rows from portal so-and-so".  Not
sure whether it's worth providing explicit portal open/close commands
separate from PQexec'ing DECLARE CURSOR and CLOSE, but maybe at the
margins those steps would be worth improving too.

regards, tom lane



Re: [HACKERS] CLOG contention

2011-12-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 So, what do we do about this?  The obvious answer is increase
 NUM_CLOG_BUFFERS, and I'm not sure that's a bad idea.

As you say, that's likely to hurt people running in small shared
memory.  I too have thought about merging the SLRU areas into the main
shared buffer arena, and likewise have concluded that it is likely to
be way more painful than it's worth.  What I think might be an
appropriate compromise is something similar to what we did for
autotuning wal_buffers: use a fixed percentage of shared_buffers, with
some minimum and maximum limits to ensure sanity.  But picking the
appropriate percentage would take a bit of research.
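
For reference, the wal_buffers behavior I mean (numbers illustrative,
not a proposal for specific clog percentages): the default of -1
autotunes to shared_buffers/32, clamped between 64kB and one 16MB WAL
segment.

show shared_buffers;  -- say, 32MB in a minimal install
show wal_buffers;     -- -1 autotunes to 1MB (32MB/32) in that case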

regards, tom lane



Re: [HACKERS] CLOG contention

2011-12-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... while the main buffer manager is
 content with some loosey-goosey approximation of recency, the SLRU
 code makes a fervent attempt at strict LRU (slightly compromised for
 the sake of reduced locking in SimpleLruReadPage_Readonly).

Oh btw, I haven't looked at that code recently, but I have a nasty
feeling that there are parts of it that assume that the number of
buffers it is managing is fairly small.  Cranking up the number
might require more work than just changing the value.

regards, tom lane



Re: [HACKERS] Page Checksums

2011-12-20 Thread Greg Smith

On 12/19/2011 06:14 PM, Kevin Grittner wrote:

But if you need all that infrastructure just to get the feature
launched, that's a bit hard to stomach.
 


Triggering a vacuum or some hypothetical scrubbing feature?
   


What you were suggesting doesn't require triggering just a vacuum 
though--it requires triggering some number of vacuums, for all impacted 
relations.  You said yourself that "all tables if there's no way to 
rule any of them out" was a possibility.  I'm just pointing out that 
scheduling that level of work is a logistics headache, and it would be 
reasonable for people to expect some help with that were it to become a 
necessary thing falling out of the implementation.



Some people think I border on the paranoid on this issue.


Those people are also out to get you, just like the hardware.


Are you arguing that autovacuum should be disabled after crash
recovery?  I guess if you are arguing that a database VACUUM might
destroy recoverable data when hardware starts to fail, I can't
argue.


A CRC failure suggests to me a significantly higher possibility of failing 
hardware, likely to lead to more corruption, than a normal crash does though.



The main way I expect to validate this sort of thing is with an as
yet unwritten function to grab information about a data block from
a standby server for this purpose, something like this:

Master:  Computed CRC A, Stored CRC B; error raised because A!=B
Standby:  Computed CRC C, Stored CRC D

If C==D && A==C, the corruption is probably overwritten bits of
the CRC B.
 


Are you arguing we need *that* infrastructure to get the feature
launched?
   


No; just pointing out the things I'd eventually expect people to want, 
because they help answer questions about what to do when CRC failures 
occur.  The most reasonable answer to what should I do about suspected 
corruption on a page? in most of the production situations I worry 
about is see if it's recoverable from the standby.  I see this as 
being similar to how RAID-1 works:  if you find garbage on one drive, 
and you can get a clean copy of the block from the other one, use that 
to recover the missing data.  If you don't have that capability, you're 
stuck with no clear path forward when a CRC failure happens, as you 
noted downthread.


This obviously gets troublesome if you've recently written a page out, 
so there's some concern about whether you are checking against the 
correct version of the page or not, based on where the standby's replay 
is at.  I see that as being a case that's also possible to recover from 
though, because then the page you're trying to validate on the master is 
likely sitting in the recent WAL stream.  This is already the sort of 
thing companies doing database recovery work (of which we are one) deal 
with, and I doubt any proposal will cover every possible situation.  In 
some cases there may be no better answer than "show all the known 
versions and ask the user to sort it out".  The method I suggested would 
sometimes kick out an automatic fix.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-20 Thread Alexander Korotkov
Hi!

Studying this question a little more, I found that the current approach to
range indexing can be dramatically inefficient in some cases. It's not because
of the penalty or split implementation, but because of the approach itself.
Mapping intervals to two-dimensional space produces much better results in
case of highly-overlapping ranges and @> and <@ operators with low selectivity.

There is a simple test case for proof of concept.

create table source as (select l, (l + s) as r from (select
(random()*10000)::int as l, (random()*1000 + 1)::int s from
generate_series(1,100000) g) x);
create table range_test as (select int4range(l,r) as x from source);
create table point_test as (select point(l,r) as x from source);
create index range_test_idx on range_test using gist (x);
create index point_test_idx on point_test using gist (x);

test=# explain (analyze, buffers) select * from range_test where x <@
int4range(5000,5010);
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32)
(actual time=37.304..37.310 rows=2 loops=1)
   Recheck Cond: (x <@ '[5000,5010)'::int4range)
   Buffers: shared hit=767
   ->  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000
width=0) (actual time=37.288..37.288 rows=2 loops=1)
         Index Cond: (x <@ '[5000,5010)'::int4range)
         Buffers: shared hit=765
 Total runtime: 37.385 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x <@
box(point(5000,5000),point(5010,5010));
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.197..0.206 rows=2 loops=1)
   Recheck Cond: (x <@ '(5010,5010),(5000,5000)'::box)
   Buffers: shared hit=5
   ->  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000
width=0) (actual time=0.182..0.182 rows=2 loops=1)
         Index Cond: (x <@ '(5010,5010),(5000,5000)'::box)
         Buffers: shared hit=3
 Total runtime: 0.265 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where x @>
int4range(5000,5990);
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32)
(actual time=4.578..4.603 rows=5 loops=1)
   Recheck Cond: (x @> '[5000,5990)'::int4range)
   Buffers: shared hit=52
   ->  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000
width=0) (actual time=4.561..4.561 rows=5 loops=1)
         Index Cond: (x @> '[5000,5990)'::int4range)
         Buffers: shared hit=47
 Total runtime: 4.669 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x <@
box(point('-inf'::float,5990),point(5000,'+inf'::float));
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.328..0.353 rows=5 loops=1)
   Recheck Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
   Buffers: shared hit=8
   ->  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000
width=0) (actual time=0.312..0.312 rows=5 loops=1)
         Index Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
         Buffers: shared hit=3
 Total runtime: 0.419 ms
(7 rows)

If you like to learn more information about such mapping you can start from
here: http://www.comsis.org/ComSIS/Vol7No4/RegularPapers/paper16.pdf

Any thoughts?

-
With best regards,
Alexander Korotkov.


Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread Magnus Hagander
On Tue, Dec 20, 2011 at 00:26, David E. Wheeler da...@kineticode.com wrote:
 On Dec 18, 2011, at 4:41 AM, Magnus Hagander wrote:

 We can hopefully get around this for the extensions in contrib (and
 reasonably well has already), but few large companies are going to be
 happy to go to pgxn and download an extension that has a single
 maintainer (not the team, and in most cases not even a team),
 usually no defined lifecycle, no support, etc. (I'm pretty sure you
 won't get support included for random pgxn modules when you buy a
 contract from EDB, or CMD, or us, or PGX, or anybody really - wheras
 if it the datatype is in core, you *will* get this)

 I support having a JSON type in core, but question the assertions here. 
 *Some* organizations won’t use PGXN, usually because they require things 
 through a different ecosystem (RPMs, .debs, StackBuilder, etc.). But many 
 others will. There are a *lot* of companies out there that use CPAN, 
 easy_install, and Gem. The same sorts of places will use PGXN.

Yes, that's why I said few not none.

Though in my experience, most companies are a lot more restrictive
about addons to their database than addons to their development
environments.

And note that it's not PGXN that's the problem I'm pointing at,
neither is it CPAN or easy_install or gem. The problem is the
vulnerability of the addon, and the maintenance. Meaning if it has a
single maintainer, that's a whole different thing from being
maintained by the PGDG.


 Oh, and at PGX, we’ll happily provide support for random modules, so long as 
 you pay for our time. We’re not picky (and happy to send improvements back 
 upstream), though we might recommend you switch to something better. But such 
 evaluations are based on quality, not simply on what ecosystem it came from.

I think we're talking about different things here. While we can
certainly provide support on specific modules, after that is entered
into the agreement with the customer, we won't support a customer who
just calls up and says "hey, I'm using module xyz which you've never
heard of, and it crashes my database, please come fix it now". Are you
saying you do that - providing SLAs, 24/7 and similar things, on
modules you didn't even know the customer was using?

And FWIW, I'm talking about the quality, and not the ecosystem as
well. I'm just saying it takes a lot more work to verify the quality
and maintenance of an external module - if it's part of postgresql,
you have *already* got a quality stamp and a maintenance promise from
that.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pgstat wait timeout

2011-12-20 Thread pratikchirania
Would this be alleviated by setting stats_temp_dir to point to a ramdisk?

I am not aware how to do this. I am using a windows server OS.
The conf file has the entry : #stats_temp_directory = 'pg_stat_tmp'

What do I change it to? Please elucidate.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/pgstat-wait-timeout-tp5078125p5088497.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [HACKERS] JSON for PG 9.2

2011-12-20 Thread Magnus Hagander
On Tue, Dec 20, 2011 at 06:00, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 19, 2011 at 6:26 PM, David E. Wheeler da...@kineticode.com 
 wrote:
 +1, though I think the core type will at least need some basic operators and 
 indexing support.

 And I'm willing to do that, but I thought it best to submit a bare
 bones patch first, in the hopes of minimizing the number of
 objectionable things therein.  For example, if you want to be able to
 index a JSON column, you have to decide on some collation order that
 is consistent with JSON's notion of equality, and it's not obvious
 what is most logical.  Heck, equality itself isn't 100% obvious.  If
 there's adequate support for including JSON in core, and nobody
 objects to my implementation, then I'll throw some ideas for those
 things up against the wall and see what sticks.

+1 for getting the basics in first, and then adding more to it later.
There's still a fair amount of time to do that for 9.2, but not if we
get stuck bikeshedding again...
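
To make the equality point concrete, here are byte-wise different texts
that arguably encode the same JSON value (plain text literals, since the
type doesn't exist yet):

select '{"a": 1, "b": 2}'::text = '{"b":2,"a":1}'::text;  -- false as text
-- key order, whitespace, and number spellings ('1' vs '1.0' vs '1e0')
-- can all differ without changing the logical JSON value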

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2011-12-20 Thread Magnus Hagander
2011/12/20 Tomas Vondra t...@fuzzy.cz:
 Hello everybody,

 this patch adds two counters to pg_stat_database to track temporary
 files - number of temp files and number of bytes. I see this as a useful
 feature, as temporary files often cause a lot of IO (because of low
 work_mem etc.). The log_temp_files is useful, but you have to parse the
 log and only temp files exceeding a size limit are logged so the actual
 amount of I/O is unknown.

Hey, cool, that was on my personal TODO list :-)


 The patch is rather simple:

 1) two new fields in PgStat_StatDBEntry (n_temp_files, n_temp_bytes)
 2) report/recv function in pgstat.c
 3) FileClose modified to log stats for all temp files (see below)
 4) two new fields added to pg_stat_database (temp_files, temp_bytes)

I haven't reviewed the code itself yet, but that seems like a
reasonable approach.
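
If the counters land with the proposed names, usage would presumably be
as simple as:

select datname, temp_files, pg_size_pretty(temp_bytes) as temp_written
from pg_stat_database
order by temp_bytes desc;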


 I had to modify FileClose to call stat() on each temp file as this
should log all temp files (not just when log_temp_files >= 0). But the
 impact of this change should be negligible, considering that the user is
 already using temp files.

 I haven't updated the docs yet - let's see if the patch is acceptable at
 all first.

Again, without having reviewed the code, this looks like a feature
we'd want, so please add some docs, and then submit it for the next
commitfest!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] xlog location arithmetic

2011-12-20 Thread Magnus Hagander
On Tue, Dec 6, 2011 at 19:06, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Dec 6, 2011 at 1:00 PM, Euler Taveira de Oliveira
 eu...@timbira.com wrote:
 On 06-12-2011 13:11, Robert Haas wrote:
 On Tue, Dec 6, 2011 at 5:14 AM, Magnus Hagander mag...@hagander.net wrote:
 I've been considering similar things, as you can find in the archives,
 but what I was thinking of was converting the number to just a plain
bigint, then letting the user apply whatever arithmetic they wanted at the
SQL level. I never got around to actually coding it, though. It could
 easily be extracted from your patch of course - and I think that's a
 more flexible approach. Is there some advantage to your method that
 I'm missing?

 I went so far as to put together an lsn data type.  I didn't actually
 get all that far with it, which is why I haven't posted it sooner, but
 here's what I came up with.  It's missing indexing support and stuff,
 but that could be added if people like the approach.  It solves this
problem by implementing -(lsn,lsn) => numeric (not int8, that can
overflow since it is not unsigned), which allows an lsn => numeric
conversion by just subtracting '0/0'::lsn.

 Interesting approach. I don't want to go that far. If so, you want to change
 all of those functions that deal with LSNs and add some implicit conversion
 between text and lsn data types (for backward compatibility). As of int8, I'm

As long as you have the conversion, you don't really need to change
them, do you? It might be nice in some ways, but this is still a
pretty internal operation, so I don't see it as critical.

not aware of any modern platform where int8 is not 64 bits. I'm not against
 numeric use; I'm just saying that int8 is sufficient.

 The point isn't that int8 might not be 64 bits - of course it has to
 be 64 bits; that's why it's called int8 i.e. 8 bytes.  The point is
 that a large enough LSN, represented as an int8, will come out as a
 negative values.  int8 can only represent 2^63 *non-negative* values,
 because one bit is reserved for sign.

Doing it in numeric should be perfectly fine. The only real reason to
pick int8 over numeric in this context would be performance, but it's not like
this is something that's going to be called in really performance
critical paths...
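
To illustrate with hypothetical syntax (assuming your prototype lsn type
were installed; the literal is made up):

select '16/B374D848'::lsn - '0/0'::lsn as wal_bytes;  -- numeric result
-- int8 would go negative for locations at or beyond 2^63, since an LSN
-- is an unsigned 64-bit quantity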

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2011-12-20 Thread Tomas Vondra
On 20 December 2011, 11:20, Magnus Hagander wrote:
 2011/12/20 Tomas Vondra t...@fuzzy.cz:

 I haven't updated the docs yet - let's see if the patch is acceptable at
 all first.

 Again, without having reviewed the code, this looks like a feature
 we'd want, so please add some docs, and then submit it for the next
 commitfest!

Hm, I added it to the current commit fest - I should probably move it to
the next one (2012-01), right? I'll add the docs at the evening.

Tomas




Re: [HACKERS] PATCH: tracking temp files in pg_stat_database

2011-12-20 Thread Magnus Hagander
On Tue, Dec 20, 2011 at 11:45, Tomas Vondra t...@fuzzy.cz wrote:
 On 20 December 2011, 11:20, Magnus Hagander wrote:
 2011/12/20 Tomas Vondra t...@fuzzy.cz:

 I haven't updated the docs yet - let's see if the patch is acceptable at
 all first.

 Again, without having reviewed the code, this looks like a feature
 we'd want, so please add some docs, and then submit it for the next
 commitfest!

 Hm, I added it to the current commit fest - I should probably move it to
 the next one (2012-01), right? I'll add the docs at the evening.

Yes, all new patches should always go on the one that's labeled
Open. If we don't do it that way, we will never finish a CF...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Collect frequency statistics for arrays

2011-12-20 Thread Alexander Korotkov
Hi!

On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote:

 FYI, I've added myself as the reviewer for the current commitfest.

How is the review going now?

--
With best regards,
Alexander Korotkov.


[HACKERS] Real-life range datasets

2011-12-20 Thread Alexander Korotkov
Hackers,

For better GiST indexing of range types it's important to have real-life
datasets for testing on. Real-life range datasets would help to prove (or
reject) some concepts and get more realistic benchmarks. Also, it would be
nice to know what queries you expect to run fast on those datasets. Ideally
that would be a real-life set of queries, but it could also be your
idea of the typical queries for such datasets.
Thanks!

-
With best regards,
Alexander Korotkov.


Re: [HACKERS] xlog location arithmetic

2011-12-20 Thread Euler Taveira de Oliveira
On 20-12-2011 07:27, Magnus Hagander wrote:
 On Tue, Dec 6, 2011 at 19:06, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Dec 6, 2011 at 1:00 PM, Euler Taveira de Oliveira
 eu...@timbira.com wrote:
 On 06-12-2011 13:11, Robert Haas wrote:
 On Tue, Dec 6, 2011 at 5:14 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 I've been considering similar things, as you can find in the archives,
 but what I was thinking of was converting the number to just a plain
 bigint, then letting the user apply whatever arithmetic they wanted at the
 SQL level. I never got around to actually coding it, though. It could
 easily be extracted from your patch of course - and I think that's a
 more flexible approach. Is there some advantage to your method that
 I'm missing?

 I went so far as to put together an lsn data type.  I didn't actually
 get all that far with it, which is why I haven't posted it sooner, but
 here's what I came up with.  It's missing indexing support and stuff,
 but that could be added if people like the approach.  It solves this
 problem by implementing -(lsn,lsn) => numeric (not int8, that can
 overflow since it is not unsigned), which allows an lsn => numeric
 conversion by just subtracting '0/0'::lsn.

 Interesting approach. I don't want to go that far. If so, you want to change
 all of those functions that deal with LSNs and add some implicit conversion
 between text and lsn data types (for backward compatibility). As of int8, 
 I'm
 
 As long as you have the conversion, you don't really need to change
 them, do you? It might be nice in some ways, but this is still a
 pretty internal operation, so I don't see it as critical.
 
For correctness, yes.

At this point, my question is: do we want to support the lsn data type idea or
a basic function that implements the difference between LSNs?


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] Patch to allow users to kill their own queries

2011-12-20 Thread Magnus Hagander
On Mon, Dec 19, 2011 at 15:50, Greg Smith g...@2ndquadrant.com wrote:
 On 12/18/2011 07:31 AM, Magnus Hagander wrote:

 * I restructured the if statements, because I had a hard time
 following the comments around that ;) I find this one easier - but I'm
 happy to change back if you think your version was more readable.


 That looks fine.  I highlighted this because I had a feeling there was still
 some gain to be had here, just didn't see it myself.  This works.


 * The error message in pg_signal_backend breaks the abstraction,
 because it specifically talks about terminating the other backend -
 when it's not supposed to know about that in that function. I think we
 either need to get rid of the hint completely, or we need to find a
 way to issue it from the caller. Or pass it as a parameter. It's fine
 for now since we only have two signals, but we might have more in the
 future..


 I feel that including a hint in the pg_terminate_backend case is a UI
 requirement.  If someone has made it as far as discovering that function
 exists, tries calling it, and it fails, the friendly thing to do is point
 them toward a direction that might work better.  Little things like that
 make a huge difference in how friendly the software appears to its users;
 this is even more true in cases where version improvements actually expand
 what can and cannot be done.

 My quick and dirty side thinks that just documenting the potential future
 issues would be enough:

 Due to the limited number of callers of this function, the hint message
 here can be certain that pg_terminate_backend provides the only path to
 reach this point.  If more callers to pg_signal_backend appear, a more
 generic hint mechanism might be needed here.

 If you must have this more generic mechanism available, I would accept
 re-factoring to provide it instead.  What I wouldn't want to live with is a
 commit of this where the hint goes away completely.  It's taken a long time
 chopping the specification to get this feature sorted out; we might as well
 make what's left be the best it can be now.

How about something like this - passing it in as a parameter?

That said - can someone who knows the translation stuff better than me
comment on if this is actually going to be translatable, or if it
violates too many translation rules?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e7f7fe0..cf77586 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14244,8 +14244,8 @@ SELECT set_config('log_statement_stats', 'off', false);
    <para>
     The functions shown in <xref
     linkend="functions-admin-signal-table"> send control signals to
-    other server processes.  Use of these functions is restricted
-    to superusers.
+    other server processes.  Use of these functions is usually restricted
+    to superusers, with noted exceptions.
    </para>
 
    <table id="functions-admin-signal-table">
@@ -14262,7 +14262,10 @@ SELECT set_config('log_statement_stats', 'off', false);
         <literal><function>pg_cancel_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
        </entry>
        <entry><type>boolean</type></entry>
-       <entry>Cancel a backend's current query</entry>
+       <entry>Cancel a backend's current query.  You can execute this against
+        another backend that has exactly the same role as the user calling the
+        function.  In all other cases, you must be a superuser.
+        </entry>
       </row>
       <row>
        <entry>
@@ -14304,6 +14307,10 @@ SELECT set_config('log_statement_stats', 'off', false);
         <command>postgres</command> processes on the server (using
         <application>ps</> on Unix or the <application>Task
         Manager</> on <productname>Windows</>).
+        For the less restrictive <function>pg_cancel_backend</>, the role of an
+        active backend can be found from
+        the <structfield>usename</structfield> column of the
+        <structname>pg_stat_activity</structname> view.
        </para>
 
    <para>
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 7a2e0c8..45520b6 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -30,6 +30,7 @@
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
 #include "storage/pmsignal.h"
+#include "storage/proc.h"
 #include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
@@ -70,15 +71,45 @@ current_query(PG_FUNCTION_ARGS)
 }
 
 /*
- * Functions to send signals to other backends.
+ * Send a signal to another backend.
+ * If allow_same_role is false, actionstr must be set to a string
+ * indicating what the signal does, to be inserted in the error message, and
+ * hint should be set to a hint to be sent along with this message.
  */
 static bool
-pg_signal_backend(int pid, int sig)
+pg_signal_backend(int pid, int sig, bool allow_same_role, const char *actionstr, const char *hint)
 {
+	PGPROC	   *proc;
+
 	if (!superuser())
-		ereport(ERROR,
-			

Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-20 Thread Robert Haas
On Tue, Dec 20, 2011 at 1:14 AM, Nikhil Sontakke nikkh...@gmail.com wrote:
 Agreed. I just tried out the scenarios laid out by you both with and without
 the committed patch and AFAICS, normal inheritance semantics have been
 preserved properly even after the commit.

No, they haven't.  I didn't expect this to break anything when you
have two constraints with different names.  The problem is when you
have two constraints with the same name.

Testing reveals that this is, in fact, broken:

rhaas=# create table A(ff1 int);
CREATE TABLE
rhaas=# create table B () inherits (A);
CREATE TABLE
rhaas=# create table C () inherits (B);
CREATE TABLE
rhaas=# alter table only b add constraint chk check (ff1 > 0);
ALTER TABLE
rhaas=# alter table a add constraint chk check (ff1 > 0);
NOTICE:  merging constraint chk with inherited definition
ALTER TABLE

At this point, you'll find that a has a constraint, and b has a
constraint, but *c does not have a constraint*.  That's bad, because
a's constraint wasn't "only" and should therefore have propagated all
the way down the tree.
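
The practical consequence (an untested sketch) is that c now accepts
rows that a's supposedly-inherited constraint should forbid:

insert into c values (-1);  -- succeeds, though chk on a requires ff1 > 0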

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] sync_seqscans in postgresql.conf

2011-12-20 Thread Magnus Hagander
Is there any reason why the setting synchronize_seqscans is in the
section version/platform compatibility in postgresql.conf? Is it
just because nobody could find a better place for it? ;) It seems a
bit wrong to me...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pgstat wait timeout

2011-12-20 Thread Andrew Dunstan



On 12/20/2011 05:13 AM, pratikchirania wrote:

Would this be alleviated by setting stats_temp_dir to point to a ramdisk?

I am not aware how to do this. I am using a windows server OS.
The conf file has the entry : #stats_temp_directory = 'pg_stat_tmp'

What do I change it to? Please elucidate.



On Windows it appears you need third party software for a ramdisk. 
Search google for info.
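
Once a ramdisk exists, the setting itself is just a path in
postgresql.conf (drive letter hypothetical); it can be changed with a
reload:

stats_temp_directory = 'R:/pg_stat_tmp'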


cheers

andrew



Re: [HACKERS] Page Checksums

2011-12-20 Thread Robert Haas
On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 I was thinking that we would warn when such was found, set hint bits
 as needed, and rewrite with the new CRC.  In the unlikely event that
 it was a torn hint-bit-only page update, it would be a warning about
 something which is a benign side-effect of the OS or hardware crash.

But that's terrible.  Surely you don't want to tell people:

WARNING:  Your database is corrupted, or maybe not.  But don't worry,
I modified the data block so that you won't get this warning again.

OK, I guess I'm not sure that you don't want to tell people that.  But
*I* don't!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] sync_seqscans in postgresql.conf

2011-12-20 Thread Robert Haas
On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote:
 Is there any reason why the setting synchronize_seqscans is in the
 section version/platform compatibility in postgresql.conf? Is it
 just because nobody could find a better place for it? ;) It seems a
 bit wrong to me...

Presumably the thought was that you would turn it off to restore the
behavior that existed in older versions of PostgreSQL.  Doesn't seem much different
from default_with_oids or lo_compat_privileges.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Replication timeout units

2011-12-20 Thread Magnus Hagander
from postgresql.conf.sample:

#replication_timeout = 60s  # in milliseconds; 0 disables

Seconds or milliseconds? I would suggest we just remove the in
milliseconds, and instead say timeout for replication connections; 0
disables.
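
For clarity, both spellings are valid today; the comment only matters
when no unit suffix is given:

replication_timeout = 60s     # with an explicit unit
replication_timeout = 60000   # a bare number falls back to the base unit, ms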

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] Pause at end of recovery

2011-12-20 Thread Magnus Hagander
These days we have pause_at_recovery_target, which lets us pause when
we reach a PITR target. Is there a particular reason we don't have a
way to pause at end of recovery if we *didn't* specify a target -
meaning we let it run until the end of the archived log? While it's
too late to change the target, I can see a lot of usescases where you
don't want it to be possible to make changes to the database again
until it has been properly verified - and keeping it up in readonly
mode in that case can be quite useful...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] sync_seqscans in postgresql.conf

2011-12-20 Thread Magnus Hagander
On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote:
 Is there any reason why the setting synchronize_seqscans is in the
 section version/platform compatibility in postgresql.conf? Is it
 just because nobody could find a better place for it? ;) It seems a
 bit wrong to me...

 Presumably the thought was that you would turn it off to restore the
 behavior that existed in older versions of PostgreSQL.  Doesn't seem much different
 from default_with_oids or lo_compat_privileges.

Seems very different to me - those change *what* happens when you do
certain things. sync_seqscans is just a performance tuning option, no?
It doesn't actually change the semantics of any operations...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] sync_seqscans in postgresql.conf

2011-12-20 Thread k...@rice.edu
On Tue, Dec 20, 2011 at 02:41:54PM +0100, Magnus Hagander wrote:
 On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote:
  On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net 
  wrote:
  Is there any reason why the setting synchronize_seqscans is in the
  section version/platform compatibility in postgresql.conf? Is it
  just because nobody could find a better place for it? ;) It seems a
  bit wrong to me...
 
  Presumably the thought was that you would turn it off to restore the
  behavior that existed in older versions of PostgreSQL.  Doesn't seem much different
  from default_with_oids or lo_compat_privileges.
 
 Seems very different to me - those change *what* happens when you do
 certain things. sync_seqscans is just a performance tuning option, no?
 It doesn't actually change the semantics of any operations...
 

In a query without enforced orders, the returned rows will come out in
a possibly different order each time the query runs. I know it is bad
coding to depend on things like that, but it is out there... So in those
cases it is not just semantics.

Regards,
Ken



Re: [HACKERS] sync_seqscans in postgresql.conf

2011-12-20 Thread Magnus Hagander
On Tue, Dec 20, 2011 at 14:47, k...@rice.edu k...@rice.edu wrote:
 On Tue, Dec 20, 2011 at 02:41:54PM +0100, Magnus Hagander wrote:
 On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote:
  On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net 
  wrote:
  Is there any reason why the setting synchronize_seqscans is in the
  section version/platform compatibility in postgresql.conf? Is it
  just because nobody could find a better place for it? ;) It seems a
  bit wrong to me...
 
   Presumably the thought was that you would turn it off to restore the
   behavior that existed in older versions of PostgreSQL.  Doesn't seem much different
  from default_with_oids or lo_compat_privileges.

 Seems very different to me - those change *what* happens when you do
 certain things. sync_seqscans is just a performance tuning option, no?
 It doesn't actually change the semantics of any operations...


 In a query without enforced orders, the returned rows will come out in
 a possibly different order each time the query runs. I know it is bad
 coding to depend on things like that, but it is out there... So in those
 cases it is not just semantics.

Yes, but they may also come out in a different order if you run the
same query again 5 minutes later...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] sync_seqscans in postgresql.conf

2011-12-20 Thread k...@rice.edu
On Tue, Dec 20, 2011 at 02:54:32PM +0100, Magnus Hagander wrote:
 On Tue, Dec 20, 2011 at 14:47, k...@rice.edu k...@rice.edu wrote:
  On Tue, Dec 20, 2011 at 02:41:54PM +0100, Magnus Hagander wrote:
  On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote:
   On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net 
   wrote:
   Is there any reason why the setting synchronize_seqscans is in the
   section version/platform compatibility in postgresql.conf? Is it
   just because nobody could find a better place for it? ;) It seems a
   bit wrong to me...
  
    Presumably the thought was that you would turn it off to restore the
    behavior that existed in older versions of PostgreSQL.  Doesn't seem much different
   from default_with_oids or lo_compat_privileges.
 
  Seems very different to me - those change *what* happens when you do
  certain things. sync_seqscans is just a performance tuning option, no?
  It doesn't actually change the semantics of any operations...
 
 
  In a query without enforced orders, the returned rows will come out in
  a possibly different order each time the query runs. I know it is bad
  coding to depend on things like that, but it is out there... So in those
  cases it is not just semantics.
 
 Yes, but they may also come out in a different order if you run the
 same query again 5 minutes later...
 

If the sequential scans always start at the beginning of the table, which
was true before the sync-ed scans ability, the order is basically fixed
for a large set of queries if you do not modify the data. With sync-ed
scans, every repetition of the query will depend on where the scan starts
in the data set. At least that is what I remember happening during the
original testing of that feature, which is a great feature.
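
A minimal illustration of the difference (table name hypothetical):

set synchronize_seqscans = off;  -- pre-8.3 behavior: scans start at block 0
select * from big_table;         -- unordered, but stable across repeats
set synchronize_seqscans = on;   -- scans may piggyback and start mid-table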

Regards,
Ken



Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-20 Thread Nikhil Sontakke
  Agreed. I just tried out the scenarios laid out by you both with and
 without
  the committed patch and AFAICS, normal inheritance semantics have been
  preserved properly even after the commit.

 No, they haven't.  I didn't expect this to break anything when you
 have two constraints with different names.  The problem is when you
 have two constraints with the same name.

 Testing reveals that this is, in fact, broken:

 rhaas=# create table A(ff1 int);
 CREATE TABLE
 rhaas=# create table B () inherits (A);
 CREATE TABLE
 rhaas=# create table C () inherits (B);
 CREATE TABLE
 rhaas=# alter table only b add constraint chk check (ff1 > 0);
 ALTER TABLE
 rhaas=# alter table a add constraint chk check (ff1 > 0);
 NOTICE:  merging constraint chk with inherited definition
 ALTER TABLE

 At this point, you'll find that a has a constraint, and b has a
 constraint, but *c does not have a constraint*.  That's bad, because
 a's constraint wasn't "only" and should therefore have propagated all
 the way down the tree.


Apologies, I did not check this particular scenario.

I guess, here, we should not allow merging of the inherited constraint into
an "only" constraint, because that breaks the semantics of "only"
constraints. If this sounds OK, I can whip up a patch for the same.

Regards,
Nikhils


[HACKERS] Extensions and 9.2

2011-12-20 Thread Dimitri Fontaine
Hi,

I've sent a first patch to improve extensions for 9.2, and intend on
sending a few more which I'll briefly present here. The point of this
email is to figure out how to branch the development, as all the patch
are going to conflict somehow (change the same parts of the code).

Either I develop them separately, with separate branches derived from
the master one, or I develop them as a stack, one on top of the other.
The difference is my ability to provide a patch for one of the features
that can be applied to master directly compared to how much time I have
to spend cooking one patch or the other (merge conflicts, etc).

If we are going to try and commit all of those for 9.2, then I can stack
them all atop of each other and have an easier development time.  Here's
the list:

 - extension features (requires / provides)

   as already sent, allows fine grained dependency management

 - SQL only extensions

   the goal here is to be able to install an SQL only extension without
   having to be granted OS shell access on the PostgreSQL server, or
   other arrangement allowing you to ship files (.control, .sql) in a
   place where usually only “root” has write access.

   meaning that the control file property that says “superuser = false”
   can be true for the distribution of the extension too.

 - extension modules

   the goal here is to be able to list all the modules that are loaded
   by an extension — the install script will install all functions and
   will be loading all related .so, it's easy enough to keep track of
   them at creating_extension time and “register” that module list.

   that helps with systems auditing when you're called to understand a
   crash after the fact. Of course even better would be to only allow
   loading modules that are part of extensions, and to be able to list
   implemented hooks (and which extension is implementing which hook),
   but that would/could be some follow-up patches.

 - extension whitelisting

   the goal here is to grant non superuser to install extensions from a
   restricted list, introducing a specific “sudo” like behavior when the
   extension is implemented in C or some other non trusted language.

   that could be easily done with the current command trigger patch and
   a trigger procedure that is security definer, and doesn't need the
   parsetree at all, but that could easily drift away from 9.2, so maybe
   a specific implementation would be better here

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-20 Thread Nikhil Sontakke
 rhaas=# create table A(ff1 int);
 CREATE TABLE
 rhaas=# create table B () inherits (A);
 CREATE TABLE
 rhaas=# create table C () inherits (B);
 CREATE TABLE
 rhaas=# alter table only b add constraint chk check (ff1  0);
 ALTER TABLE
 rhaas=# alter table a add constraint chk check (ff1  0);
 NOTICE:  merging constraint chk with inherited definition
 ALTER TABLE

 At this point, you'll find that a has a constraint, and b has a
 constraint, but *c does not have a constraint*.  That's bad, because
 a's constraint wasn't only and should therefore have propagated all
 the way down the tree.


 Apologies, I did not check this particular scenario.

 I guess, here, we should not allow merging of the inherited constraint
 into an only constraint. Because that breaks the semantics for only
 constraints. If this sounds ok, I can whip up a patch for the same.


PFA, patch which does just this.

postgres=# alter table a add constraint chk check (ff1 > 0);
ERROR:  constraint "chk" for relation "b" is an ONLY constraint. Cannot
merge

Regards,
Nikhils


only_constraint_no_merge.patch
Description: Binary data



Re: [HACKERS] sync_seqscans in postgresql.conf

2011-12-20 Thread Tom Lane
k...@rice.edu k...@rice.edu writes:
 On Tue, Dec 20, 2011 at 02:54:32PM +0100, Magnus Hagander wrote:
 Seems very different to me - those change *what* happens when you do
 certain things. sync_seqscans is just a performance tuning option, no?
 It doesn't actually change the semantics of any operations...

 In a query without enforced orders, the returned rows will come out in
 a possibly different order each time the query runs. I know it is bad
 coding to depend on things like that, but it is out there... So in those
 cases it is not just semantics.

Right.  It *is* query semantics for people who are depending on getting
the same row order each time they read an unchanging table.  Yeah, the
SQL standard implies that that's not guaranteed, but in all PG versions
before we added syncscan, it did work that way, and some people need it
to continue to work that way (it's worth reflecting that syncscan would
break most or all of the regression tests if it had a smaller
granularity).  So it is a backwards-compatibility option.

Which is not to say that I like the current GUC classification in
general, but this particular one isn't out of place.

regards, tom lane



Re: [HACKERS] Replication timeout units

2011-12-20 Thread Peter Geoghegan
On 20 December 2011 13:38, Magnus Hagander mag...@hagander.net wrote:
 from postgresql.conf.sample:

 #replication_timeout = 60s      # in milliseconds; 0 disables

 Seconds or milliseconds? I would suggest we just remove the in
 milliseconds, and instead say timeout for replication connections; 0
 disables.

+1 from me. That's very confusing.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes

2011-12-20 Thread Marti Raudsepp
On Tue, Dec 20, 2011 at 07:08, Tom Lane t...@sss.pgh.pa.us wrote:
 it'd likely be better if this code ignored unrecognized qual expression
 types rather than Assert'ing they're not there.

The patch replaced that Assert with an elog(ERROR).

 Hmm.  I am reminded of how utterly unreadable diff -u format is for
 anything longer than single-line changes :-( ...

Sorry, the new patch is in context (-C) diff format proper. I also
moved around code a bit and removed an unused variable that was left
around from the refactoring.

 but I think I don't
 like this refactoring much.  Will take a closer look tomorrow.

I was afraid you'd say that, especially for a change that should be
backpatched. But I couldn't think of alternative ways to do it that
give non-bogus estimates.



While writing this patch, the largest dilemma was where to account for
the multiple array scans. Given that this code is mostly a heuristic
and I lack a deep understanding of GIN indexes, it's likely that I got
this part wrong.

Currently I'm doing this:
partialEntriesInQuals *= array_scans;
exactEntriesInQuals   *= array_scans;
searchEntriesInQuals  *= array_scans;

Which seems to be the right thing as far as random disk accesses are
concerned (successive scans are more likely to hit the cache) and also
works well with queries that don't touch most of the index. But this
fails spectacularly when multiple full scans are performed, e.g. LIKE
ANY ('{%,%,%}'), because index_pages_fetched() ends up removing all of
the rescan costs.

Another approach is multiplying the total cost by the number of
scans. This overestimates random accesses from rescans, but fixes the
above case:
*indexTotalCost = (*indexStartupCost + dataPagesFetched *
spc_random_page_cost) * array_scans;

Regards,
Marti
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
new file mode 100644
index e7af7d4..250d853
*** a/contrib/pg_trgm/expected/pg_trgm.out
--- b/contrib/pg_trgm/expected/pg_trgm.out
*** explain (costs off)
*** 3486,3491 
--- 3486,3501 
   Index Cond: (t ~~* '%BCD%'::text)
  (4 rows)
  
+ explain (costs off)
+   select * from test2 where t like any ('{%bcd%,qua%}');
+                   QUERY PLAN                    
+ -------------------------------------------------
+  Bitmap Heap Scan on test2
+    Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+    ->  Bitmap Index Scan on test2_idx_gin
+          Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+ (4 rows)
+ 
  select * from test2 where t like '%BCD%';
   t 
  ---
*** select * from test2 where t ilike 'qua%'
*** 3509,3514 
--- 3519,3531 
   quark
  (1 row)
  
+ select * from test2 where t like any ('{%bcd%,qua%}');
+    t   
+ --------
+  abcdef
+  quark
+ (2 rows)
+ 
  drop index test2_idx_gin;
  create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
  set enable_seqscan=off;
*** explain (costs off)
*** 3528,3533 
--- 3545,3560 
 Index Cond: (t ~~* '%BCD%'::text)
  (2 rows)
  
+ explain (costs off)
+   select * from test2 where t like any ('{%bcd%,qua%}');
+                   QUERY PLAN                    
+ -------------------------------------------------
+  Bitmap Heap Scan on test2
+    Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+    ->  Bitmap Index Scan on test2_idx_gist
+          Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+ (4 rows)
+ 
  select * from test2 where t like '%BCD%';
   t 
  ---
*** select * from test2 where t ilike 'qua%'
*** 3551,3553 
--- 3578,3587 
   quark
  (1 row)
  
+ select * from test2 where t like any ('{%bcd%,qua%}');
+    t   
+ --------
+  abcdef
+  quark
+ (2 rows)
+ 
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
new file mode 100644
index ea902f6..ac969e6
*** a/contrib/pg_trgm/sql/pg_trgm.sql
--- b/contrib/pg_trgm/sql/pg_trgm.sql
*** explain (costs off)
*** 47,56 
--- 47,59 
select * from test2 where t like '%BCD%';
  explain (costs off)
select * from test2 where t ilike '%BCD%';
+ explain (costs off)
+   select * from test2 where t like any ('{%bcd%,qua%}');
  select * from test2 where t like '%BCD%';
  select * from test2 where t like '%bcd%';
  select * from test2 where t ilike '%BCD%';
  select * from test2 where t ilike 'qua%';
+ select * from test2 where t like any ('{%bcd%,qua%}');
  drop index test2_idx_gin;
  create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
  set enable_seqscan=off;
*** explain (costs off)
*** 58,64 
--- 61,70 
select * from test2 where t like '%BCD%';
  explain (costs off)
select * from test2 where t ilike '%BCD%';
+ explain (costs off)
+   select * from test2 where t like any ('{%bcd%,qua%}');
  select * from test2 where t like '%BCD%';
  select * from test2 where t like '%bcd%';
  select * from test2 where t ilike '%BCD%';