Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Kevin Grittner wrote:

 TRUNCATE triggers shouldn't have delta relations, that's for
 sure, or it'd completely negate the point of TRUNCATE triggers.
 I don't think we have any other event, do we?  People who get
 delta relations for deleting all rows should be using DELETE, I
 think.

That sounds reasonable.  The only other issue was INSTEAD OF
triggers, but I don't think we need them there, either.

 I am not sure about providing delta relations in the FOR EACH ROW
 case.  For what cases are they useful?

In an accounting application for courts I have seen a case where
each receivable (asset) account had a contra (liability) account of
uncollected receivables, because until and unless the Clerk of
Courts collected the judgment there was no obligation to pay the
money back out.  Any financial transaction had to be in a database
transaction, and not only did all the transaction detail need to
balance to zero, but any receivable detail row needed to be
immediately followed by a balancing contra account row (and vice
versa).  A FOR EACH ROW trigger, on seeing one of these rows, could
check for the required balancing row.  Now this would only be
solved by the standard feature if both rows were always inserted by
a single statement, which might not always have been the case; so
even with this example I am stretching a bit.  But it's close
enough to suggest that there might be legitimate uses.

And there is the fact that the standard seems to want this to be
supported.

 In all cases I think NEW and OLD are sufficient.  I didn't read
 the standard, but if it's saying that in FOR EACH ROW the
 new/deleted/changed row should be accessible by way of a delta
 relation, [...]

No, it says that you can specify *both* the row variables for the
row under consideration and the tables for the full set of rows
affected by the statement *for the same FOR EACH ROW trigger*.
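
For illustration - hypothetical table and function names, using the
standard REFERENCING syntax described further down-thread, which
PostgreSQL did not accept at the time - such a trigger might be
declared as:

  CREATE TRIGGER check_balancing_row
    AFTER INSERT ON transaction_detail
    REFERENCING NEW ROW AS new_row NEW TABLE AS new_rows
    FOR EACH ROW
    EXECUTE PROCEDURE verify_contra_row();

so the row-level trigger body could consult both the row that fired it
and the full set of rows the statement inserted.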

 Now, if you only have delta relations in FOR EACH STATEMENT, then
 it seems to me you can optimize obtaining them only when such
 triggers are defined; this lets you do away with the reloption
 entirely, doesn't it?

That was intended to minimize the situations under which there was
a performance hit where the new delta relations were not needed in
an AFTER trigger.  If we only generate these for FOR EACH STATEMENT
triggers, that certainly reduces the need for that, but I'm not
sure it eliminates it -- especially if we're generating tuplestores
for the full rows rather than their TIDs.

It is already generating the tuplestores only if there is an AFTER
trigger for the type of statement being executed, but I agree that
it would be a very rare FOR EACH ROW trigger that actually used it.
Of course, that is one argument for the standard syntax -- we
could test whether any of the trigger definitions for that
operation on that table specified each transition table, and only
generate them if needed based on that.

 I noticed that GetCurrentFDWTuplestore() changed its identity
 without getting its comment updated.  The new name seems awfully
 generic, and I don't think it really describes what the function
 does.  I think it needs more renaming.

Good point.  Will fix that in the next version.

 (1)  My first impulse was to capture this delta data in the form
 of tuplestores of just TIDs, and fetching the tuples themselves
 from the heap on reference.  In earlier discussions others have
 argued for using tuplestores of the actual rows themselves.

 Can you please supply pointers to such discussion?

That was in a matview-related discussion, so perhaps we should
ignore that for now and discuss what's best for triggers here.  If
matviews need something different, the rows could always be
materialized from the TIDs at a later point.

 I don't see any point in not just storing TIDs, but perhaps I'm
 missing something.

I think there was some question about performance, but I really
have a hard time seeing the performance being significantly worse
with a TID tuplestore for most reasonable uses; and I think the TID
tuplestore could be a lot faster if (for example) you have a table
with a large, TOASTed text or bytea column which is not referenced
(in selection criteria or the SET clause) and is not used in the
FOR EACH STATEMENT trigger.

I think there might also have been some question about visibility.
A TID tuplestore would need to use a different snapshot (like maybe
SnapshotAny) in the same query where it joined to other tables with
a normal MVCC snapshot.

 (2)  Do we want to just pick names for these in the PLs rather
 than using the standard syntax?  Implementing the standard
 syntax seemed to require three new (unreserved) keywords,
 changes to the catalogs to store the chosen relations names, and
 some way to tie the specified relation names in to the various
 PLs.

 I think the only one for which we have a compulsion to follow
 someone in this area would be PL/pgSQL,

... which currently hard-codes the row 

Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Bruce Momjian
On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote:
 On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus j...@agliodbs.com wrote:
  On 06/18/2014 12:32 PM, Tom Lane wrote:
  Josh Berkus j...@agliodbs.com writes:
   There are plenty of badly-written applications which "auto-begin", that
  is, they issue a BEGIN; immediately after every COMMIT; whether or
  not there's any additional work to do.  This is a major source of IIT
  and the timeout should not ignore it.
 
  Nonsense.  We explicitly don't do anything useful until the first actual
  command arrives, precisely to avoid that problem.
 
  Oh, we don't allocate a snapshot?  If not, then no objection here.
 
 The only problem I see is that it makes the semantics kind of weird
 and confusing.  "Kill connections that are idle in transaction for too
 long" is a pretty clear spec; "kill connections that are idle in
 transaction except if they haven't executed any commands yet because
 we think you don't care about that case" is not quite as clear, and
 not really what the GUC name says, and maybe not what everybody wants,
 and maybe masterminding.

"Kill connections that are idle in non-empty transaction block for too
long"

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 Can you describe what the standard syntax for the row variables
 is, as opposed to our syntax?  Also, what's the standard syntax
 for the transition relations?

If you want either (or both), the standard has you using a
REFERENCING clause right before the FOR EACH part of the statement.
You can specify one or more sections in the format:

{ OLD | NEW } [ ROW | TABLE ] [ AS ] name

If you have more than one, they are separated by whitespace.  (No
commas or other visible delimiters.)  If you omit ROW/TABLE it
defaults to ROW.  You are only allowed to specify TABLE on an AFTER
trigger.  You are only allowed to specify ROW on a FOR EACH ROW
trigger.  (There is no prohibition against specifying TABLE on a
FOR EACH ROW trigger.)  You are only allowed to specify OLD for a
DELETE or UPDATE trigger.  (The ability for one trigger definition
to specify multiple operations is a PostgreSQL extension.)  You are
only allowed to specify NEW for an UPDATE or INSERT trigger.  You
may not repeat an entry.

Essentially, we have an implied clause on every FOR EACH ROW
trigger like:

  REFERENCING OLD ROW AS OLD NEW ROW AS NEW
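
Putting that grammar together, a statement-level trigger naming both
transition tables might look like this - a sketch of the standard
syntax with hypothetical names, not something PostgreSQL accepted at
the time:

  CREATE TRIGGER audit_totals
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE check_totals();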

 Some things which I *did* follow from the standard: these new
 relations are only allowed within AFTER triggers, but are
 available in both AFTER STATEMENT and AFTER ROW triggers.  That
 is, an AFTER UPDATE ... FOR EACH ROW trigger could use both the
 OLD and NEW row variables as well as the delta relations (under
 whatever names we pick).  That probably won't be used very
 often, but I can imagine some cases where it might be useful.  I
 expect that these will normally be used in FOR EACH STATEMENT
 triggers.

 I'm concerned about the performance implications of capturing the
 delta relations unconditionally.  If a particular trigger
 actually needs the delta relations, then the time spent capturing
 that information is well spent; but if it doesn't, then it's a
 waste.  There are lots of people using FOR EACH STATEMENT
 triggers right now who won't be happy if those start requiring
 O(n) additional temporary storage, where n is the number of rows
 modified by the statement.  This is likely an even bigger issue
 for per-row triggers, of course, where as you say, it probably
 won't be used often.

That is why I added a reloption which must be specifically enabled
for a table in order to generate these deltas.  That would be an
inconvenience for those wanting to use the new feature, but should
prevent a performance regression for any tables where it is not
specifically turned on.  That's not perfect, of course, because if
you turn it on for an UPDATE ... AFTER EACH STATEMENT trigger where
you want it, you do suffer the overhead on every AFTER trigger on
that table.  Perhaps this is sufficient reason to use the standard
syntax for the new delta tables -- they would then be generated only
in the specific cases where they were needed.  And I think I could
lose the reloption.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] btreecheck extension

2014-06-18 Thread Peter Geoghegan
On Wed, Jun 18, 2014 at 4:48 AM, Stephen Frost sfr...@snowman.net wrote:
 I'm fine with having these start out as external tools which are doing
 checks, but I've been specifically asked about (and have desired myself
 from time-to-time...) an in-core capability to check index/heap/etc
 validity.  Folks coming from certain other RDBMS's find it amazing that
 we don't have any support for that when what they really want is a
 background worker which is just automatically going around doing these
 checks.

Yes, I think that being able to verify the integrity of index and heap
relations is an important feature, and I think it's notable that we're
the only major RDBMS that lacks this support. I'm not quite sure what
that should entail just yet, but clearly it should be somewhat like
what I have here. I think the big open questions to make something
like this work are mostly around the cost/benefit ratio of each of the
checks I've outlined. Certainly, for that use case minimizing
disruption on a live system becomes even more important. I'll probably
look at a buffer access strategy, just to give an example of that off
the top of my head.


-- 
Peter Geoghegan




Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread Kevin Grittner
David Fetter da...@fetter.org wrote:
 Robert Haas wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 The good:
 - Generating the tuplestores.  Yay!

Thanks for that.  ;-)

 The bad:
 - Generating them exactly and only for AFTER triggers

The standard only allows them for AFTER triggers, and I'm not sure
what the semantics would be for any others.

 - Requiring that the tuplestores both be generated or not at
   all.  There are real use cases described below where only
   one would be relevant.

Yeah.

 - Generating the tuplestores unconditionally.

Well, there are conditions.  Only when the reloption allows and
only if there is an AFTER trigger for the type of operation in
progress.

 The ugly:
 - Attaching tuplestore generation to tables rather than
    callers (triggers, DML, etc.)

I'm not sure what you're getting at here.  This patch is
specifically only concerned with generating delta relations for DML
AFTER triggers, although my hope is that it will be a basis for
delta relations used for other purposes.  This seems to me like the
right place to initially capture the data for incremental
maintenance of materialized views, and might be of value for other
purposes, too.

 [formal definition of standard CREATE TRIGGER statement]

 Sorry that was a little verbose, but what it does do is give us
 what we need at trigger definition time.  I'd say it's pilot
 error if a trigger definition says make these tuplestores and
 the trigger body then does nothing with them, which goes to
 Robert's point below re: unconditional overhead.

Yeah, the more I think about it (and discuss it) the more I'm
inclined to suffer the additional complexity of the standard syntax
for specifying transition relations in order to avoid unnecessary
overhead creating them when not needed.  I'm also leaning toward
just storing TIDs in the tuplestores, even though it requires mixed
snapshots in executing queries in the triggers.  It just seems like
there will otherwise be too much overhead in copying around big,
unreferenced columns for some situations.

 Along that same line, we don't always need to capture both the
 before tuplestores and the after ones.  Two examples of this come
 to mind:

 - BEFORE STATEMENT triggers accessing rows, where there is no
 after part to use,

Are you talking about an UPDATE for which the AFTER trigger(s) only
reference the before transition table, and don't look at AFTER?  If
so, using the standard syntax would cover that just fine.  If not,
can you elaborate?

 and
 - DML (RETURNING BEFORE, e.g.) which only touches one of them. 
 This applies both to extant use cases of RETURNING and to planned
 ones.

I think that can be sorted out by a patch which implements that, if
these deltas even turn out to be the appropriate way to get that
data (which is not clear to me at this time).  Assuming standard
syntax, the first thing would be for the statement to somehow
communicate to the trigger layer the need to capture a tuplestore
it might otherwise not generate, and there would need to be a way
for the statement to access the needed tuplestore(s).  The
statement would also need to project the right set of columns. 
None of that seems to me to be relevant to this patch.  If this
patch turns out to provide infrastructure that helps, great.  If
you have a specific suggestion about how to make the tuplestores
more accessible to other layers, I'm listening.

 In summary, I'd like to propose that the tuplestores be generated
 separately in general and attached to callers. We can optimize
 this by not generating redundant tuplestores.

Well, if we use the standard syntax for CREATE TRIGGER and store
the transition table names (if any) in pg_trigger, the code can
generate one relation if any AFTER triggers which are going to fire
need it.  I don't see any point in generating exactly the same
tuplestore contents for each trigger.  And I suspect that we can wire
in any other uses later when we have something to connect them to.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Jsonb: jbvBinary usage in the convertJsonbValue?

2014-06-18 Thread Andrew Dunstan


On 06/02/2014 11:38 AM, Andrew Dunstan wrote:


On 06/02/2014 10:22 AM, Andrew Dunstan wrote:


On 06/02/2014 10:02 AM, Robert Haas wrote:
On Thu, May 29, 2014 at 7:34 AM, Dmitry Dolgov 
9erthali...@gmail.com wrote:

I'm a little confused by the convertJsonbValue function in jsonb_util.c.
Maybe I misunderstood something, so I need help =)


if (IsAJsonbScalar(val) || val->type == jbvBinary)
    convertJsonbScalar(buffer, header, val);
As I can see, the convertJsonbScalar function is used for scalar and
binary jsonb values.  But this function doesn't handle the jbvBinary type.

There definitely seems to be an oversight here of some kind. Either
convertJsonbValue() shouldn't be calling convertJsonbScalar() with an
object of type jbvBinary, or convertJsonbScalar() should know how to
handle that case.




Yes, I've just been looking at that. I think this is probably a
hangover from when these routines were recast to some extent. Given
that we're not seeing any errors from it, I'd be inclined to remove
the "|| val->type == jbvBinary" part. One of the three call sites
to convertJsonbValue asserts that this can't be true, and doing so
doesn't result in a regression failure.


Peter and Teodor, comments?




Thinking about it a bit more, ISTM this should be ok, since we convert 
a JsonbValue to Jsonb in a depth-first recursive pattern. We should 
certainly add some comments along these lines to explain why the 
jbvBinary case shouldn't arise.




Nobody has commented further that I have noticed, so I have committed this.

cheers

andrew





Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Josh Berkus
On 06/18/2014 02:52 PM, Bruce Momjian wrote:
 On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote:
 The only problem I see is that it makes the semantics kind of weird
 and confusing.  "Kill connections that are idle in transaction for too
 long" is a pretty clear spec; "kill connections that are idle in
 transaction except if they haven't executed any commands yet because
 we think you don't care about that case" is not quite as clear, and
 not really what the GUC name says, and maybe not what everybody wants,
 and maybe masterminding.
 
 "Kill connections that are idle in non-empty transaction block for too
 long"

Here's the POLS violation in this:

"I have idle_in_transaction_timeout set to 10min, but according to
pg_stat_activity I have six connections which are IIT for over an hour.
What gives?"

Robert's right, not killing the "BEGIN; only" transactions is liable to
result in user confusion unless we label those sessions differently in
pg_stat_activity. Tom is right in that killing them will cause some
users to not use IIT_timeout when they should, or will set the timeout
too high to be useful.

So it seems like what we should do is NOT call sessions IIT if they've
merely executed a BEGIN; and not done anything else.  Then the timeout
and pg_stat_activity would be consistent.
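
For reference, the check a confused user would be running is something
like this (pid, state, and state_change being the usual
pg_stat_activity columns):

  SELECT pid, state, now() - state_change AS in_state_for, query
  FROM pg_stat_activity
  WHERE state = 'idle in transaction';

Whatever we pick, the sessions reported there and the sessions the
timeout kills should line up.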

Counter-argument: most app frameworks which do an automatic BEGIN; also
do other stuff like SET TIMEZONE each time as well.  Is this really a
case worth worrying about?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Marko Tiikkaja

On 2014-06-19 1:46 AM, Josh Berkus wrote:

 Robert's right, not killing the "BEGIN; only" transactions is liable to
result in user confusion unless we label those sessions differently in
pg_stat_activity.


Wouldn't they be labeled differently already?  i.e. the last query would 
be BEGIN.  Unless your app tries to unsuccessfully use nested 
transactions, you would know why it hasn't been killed.



.marko




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Josh Berkus
On 06/18/2014 04:54 PM, Marko Tiikkaja wrote:
 On 2014-06-19 1:46 AM, Josh Berkus wrote:
 Robert's right, not killing the "BEGIN; only" transactions is liable to
 result in user confusion unless we label those sessions differently in
 pg_stat_activity.
 
 Wouldn't they be labeled differently already?  i.e. the last query would
 be BEGIN.  Unless your app tries to unsuccessfully use nested
 transactions, you would know why it hasn't been killed.

That's pretty darned obscure for a casual user.  *you* would know, and
*I* would know, but 99.5% of our users would be very confused.

Plus, if a session which has only issued a BEGIN; doesn't have a
snapshot and isn't holding any locks, then I'd argue we shouldn't label
it IIT in the first place because it's not doing any of the bad stuff we
want to resolve by killing IITs.  Effectively, it's just idle.
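
One way to see what such a session is actually holding (12345 being a
hypothetical backend pid):

  SELECT locktype, mode, granted
  FROM pg_locks
  WHERE pid = 12345;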

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-18 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we're going to do this, I would say that we should also take the
 opportunity to get out from under the question of which kernel API
 we're dealing with.  So perhaps a design like this:
 
 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's
 the name of a file to write something into after forking.  The typical
 value would be /proc/self/oom_score_adj or /proc/self/oom_adj.
 If not set, nothing happens.
 
 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's
 the string we write, otherwise we write 0.

 Please find attached the patch. It includes the doc changes as well.

Applied with some editorialization.

regards, tom lane




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Counter-argument: most app frameworks which do an automatic BEGIN; also
 do other stuff like SET TIMEZONE each time as well.  Is this really a
 case worth worrying about?

SET TIMEZONE doesn't result in taking a snapshot either.

regards, tom lane




Re: [HACKERS] idle_in_transaction_timeout

2014-06-18 Thread David G Johnston
On Wed, Jun 18, 2014 at 8:01 PM, Josh Berkus [via PostgreSQL] 
ml-node+s1045698n5807868...@n5.nabble.com wrote:

 On 06/18/2014 04:54 PM, Marko Tiikkaja wrote:
  On 2014-06-19 1:46 AM, Josh Berkus wrote:
  Robert's right, not killing the "BEGIN; only" transactions is liable to
  result in user confusion unless we label those sessions differently in
  pg_stat_activity.
 
  Wouldn't they be labeled differently already?  i.e. the last query would
  be BEGIN.  Unless your app tries to unsuccessfully use nested
  transactions, you would know why it hasn't been killed.

 That's pretty darned obscure for a casual user.  *you* would know, and
 *I* would know, but 99.5% of our users would be very confused.

 Plus, if a session which has only issued a BEGIN; doesn't have a
 snapshot and isn't holding any locks, then I'd argue we shouldn't label
 it IIT in the first place because it's not doing any of the bad stuff we
 want to resolve by killing IITs.  Effectively, it's just idle.


+1

Since the BEGIN is not meaningfully interpreted until the first subsequent
command (SET excepted, apparently - are there others?) is issued, no
transaction has begun at this point, so "idle" and not IIT should be the
reported state in pg_stat_activity.

Though I can understand some level of surprise if someone sees "idle" with
a BEGIN (or SET TIMEZONE) as the last executed command - so maybe "idle
before transaction" instead of "idle in transaction" - which hopefully will
not be assumed to be controlled by the idle_in_transaction_timeout GUC.
I presume that we have some way to distinguish this particular case and
can report it accordingly.  Even if that state endures after a SET TIMEZONE
or similar session configuration command, explaining that all those are
pre-transaction shouldn't be too hard - though as a third option "idle
initialized transaction" could be the state indicator.

Depending on how "idle in transaction (aborted)" gets resolved we could
also consider "idle in transaction (initializing)" - but since the former,
IMO, should be dropped (and thus matches the "in" wording of the GUC),
naming the latter - which should not be dropped (I'll go with the stated
opinion here for now) - with "in" is not advisable.

The current behavior is transparent to the casual user, so sticking with
"idle" does seem like the best choice to maintain the undocumented nature
of the behavior.

David J.





Re: [HACKERS] How to change the pgsql source code and build it??

2014-06-18 Thread Craig Ringer
On 06/19/2014 01:50 AM, Shreesha wrote:

 However in my case, I don't know why there wasn't a default database
 with name 'root' got created or why the server rejects the client when
 it tries to connect to the default database. 

Take a look at the initdb manual, the tutorial, and the
installation/system administration documentation.

 Can anyone shed some light on 
 1) when the default database gets created

By initdb - see the initdb manual.

 2) how is this database 'name' is decided? Or what is the name of the
 default database name?

See the initdb manual.

 3) Is there any other places in the database server code where this
 check is applied?

Wherever you connect, libpq picks the default database = the current
username.

 Upon looking at the error I got, I believe the code is searching for the
 database name == user name.  If anyone can give some input on the code,
 it would be helpful!

libpq


I think you probably need to move this to pgsql-general, Stack Overflow,
etc. It's not really on topic for pgsql-hackers.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-18 Thread Matheus de Oliveira
I was going to answer each message, but Fabrízio summarized everything so
far really nicely. Thanks Fabrízio.

On Wed, Jun 18, 2014 at 5:25 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:

 Then, to summarize Matheus must do:
 * use an option instead of changing the syntax and catalog to indicate that
 a tablespace is used to store temp objects


Yes. I myself wasn't sure the TEMPORARY syntax would be good, but I would
just like to hear more opinions about it. Storing it as an option seems
nice to me.
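
Presumably something along these lines - only_temp_relations being the
option name floated in this thread, not an existing one:

  CREATE TABLESPACE temp_space
    LOCATION '/mnt/ssd/pg_temp'
    WITH (only_temp_relations = true);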


 * throw an exception if we try to ALTER the option only_temp_relations


I think we should postpone the ALTER option, perhaps as another patch.
Wasn't that what was decided?

I'll investigate the options better before going this route. Let me work on
CREATE first.



 * add regression tests
 * add documentation

 And, of course, register to the next open commitfest [1] to get detailed
 feedback and review.


Noted. It will be on the next commitfest. Just knowing some people do want
this makes me more comfortable working on it.

Best regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


[HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Ian Barwick
Hi

I've just run into an index issue on 9.5 HEAD on a slave (master and slave
both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
below (I have only found one index on the slave where the issue occurs so far).
The setup is admittedly slightly unusual; master is OS X 10.7.5, slave is
CentOS on a Virtualbox guest VM on the same system. The issue only occurs
with this combination of master and slave; I haven't been able to reproduce
it with master and slave running natively on OS X, or with a Linux guest VM
on a Linux machine. I have reproduced it several times on the OS X/Linux
guest VM combination.

I can't dig any further into this at the moment but can happily provide further
details etc.

Master
==

$ uname -a
Darwin nara.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 
PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

tgg_current=> SELECT version();
                                          version
--------------------------------------------------------------------------------------------
 PostgreSQL 9.5devel on x86_64-apple-darwin11.4.2, compiled by gcc (MacPorts gcc48 4.8.2_2) 4.8.2, 64-bit
(1 row)

tgg_current=> select user_id, login from tgg_user where login ='admin';
 user_id | login
---------+-------
       1 | admin
(1 row)


Slave
=

$ uname -a
Linux localhost.localdomain 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 
00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

tgg_current=> select version();
                                          version
--------------------------------------------------------------------------------------------
 PostgreSQL 9.5devel on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)

tgg_current=> select user_id,login from tgg_user where login ='admin';
 user_id | login
---------+-------
(0 rows)

tgg_current=> explain select user_id,login from tgg_user where login ='admin';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Scan using tgg_user_login_key on tgg_user  (cost=0.28..8.30 rows=1 width=15)
   Index Cond: ((login)::text = 'admin'::text)
 Planning time: 0.105 ms
(3 rows)

tgg_current=> set enable_bitmapscan=off;
SET
tgg_current=> set enable_indexscan =off;
SET
tgg_current=> select user_id,login from tgg_user where login ='admin';
 user_id | login
---------+-------
       1 | admin
(1 row)


tgg_current=> \d tgg_user_login_key
   Index "epp.tgg_user_login_key"
 Column |         Type          | Definition
--------+-----------------------+------------
 login  | character varying(32) | login
unique, btree, for table "epp.tgg_user"


Regards

Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
I wrote:
 I do see growth in the per-query context as well.  I'm not sure
 where that's coming from, but we probably should try to find out.
 A couple hundred bytes per iteration is going to add up, even if it's
 not as fast as 8K per iteration.  I'm not sure it's dblink's fault,
 because I don't see anything in dblink.c that is allocating anything in
 the per-query context, except for the returned tuplestores, which
 somebody else should clean up.

I poked at this example some more, and found that the additional memory
leak is occurring during evaluation of the arguments to be passed to
dblink().  There's been a comment there for a very long time suggesting
that we might need to do something about that ...

With the attached patch on top of yours, I see no leak anymore.

regards, tom lane

diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index f162e92..cf8cbb1 100644
*** a/src/backend/executor/execQual.c
--- b/src/backend/executor/execQual.c
*** ExecMakeTableFunctionResult(ExprState *f
*** 2014,2019 ****
--- 2014,2020 
  	PgStat_FunctionCallUsage fcusage;
  	ReturnSetInfo rsinfo;
  	HeapTupleData tmptup;
+ 	MemoryContext argContext = NULL;
  	MemoryContext callerContext;
  	MemoryContext oldcontext;
  	bool		direct_function_call;
*** ExecMakeTableFunctionResult(ExprState *f
*** 2083,2094 ****
  		/*
  		 * Evaluate the function's argument list.
  		 *
! 		 * Note: ideally, we'd do this in the per-tuple context, but then the
! 		 * argument values would disappear when we reset the context in the
! 		 * inner loop.  So do it in caller context.  Perhaps we should make a
! 		 * separate context just to hold the evaluated arguments?
  		 */
  		argDone = ExecEvalFuncArgs(fcinfo, fcache->args, econtext);
  		/* We don't allow sets in the arguments of the table function */
  		if (argDone != ExprSingleResult)
  			ereport(ERROR,
--- 2084,2104 
  		/*
  		 * Evaluate the function's argument list.
  		 *
! 		 * We can't do this in the per-tuple context: the argument values
! 		 * would disappear when we reset that context in the inner loop.  And
! 		 * the caller's context is typically a query-lifespan context, so we
! 		 * don't want to leak memory there.  So make a separate context just
! 		 * to hold the evaluated arguments.
  		 */
+ 		argContext = AllocSetContextCreate(callerContext,
+ 		   "Table function arguments",
+ 		   ALLOCSET_DEFAULT_MINSIZE,
+ 		   ALLOCSET_DEFAULT_INITSIZE,
+ 		   ALLOCSET_DEFAULT_MAXSIZE);
+ 		oldcontext = MemoryContextSwitchTo(argContext);
  		argDone = ExecEvalFuncArgs(fcinfo, fcache->args, econtext);
+ 		MemoryContextSwitchTo(oldcontext);
+ 
  		/* We don't allow sets in the arguments of the table function */
  		if (argDone != ExprSingleResult)
  			ereport(ERROR,
*** no_function_result:
*** 2321,2328 ****
--- 2331,2342 
  			FreeTupleDesc(rsinfo.setDesc);
  	}
  
+ 	/* Clean up */
  	MemoryContextSwitchTo(callerContext);
  
+ 	if (argContext)
+ 		MemoryContextDelete(argContext);
+ 
  	/* All done, pass back the tuplestore */
  	return rsinfo.setResult;
  }



Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread David Fetter
On Wed, Jun 18, 2014 at 03:30:34PM -0700, Kevin Grittner wrote:
 David Fetter da...@fetter.org wrote:
  Robert Haas wrote:
  Kevin Grittner kgri...@ymail.com wrote:
 
  The good:
  - Generating the tuplestores.  Yay!
 
 Thanks for that.  ;-)

Sorry, I just can't resist references to Spaghetti Westerns.
https://en.wikipedia.org/wiki/The_Good,_the_Bad_and_the_Ugly

  The bad:
  - Generating them exactly and only for AFTER triggers
 
 The standard only allows them for AFTER triggers, and I'm not sure
 what the semantics would be for any others.

Ah, so here's where we differ.  You're looking at deltas, a very nice
capability to have.  I'm looking at the before and after tuplestores
as components of which deltas, among many other things, could be
composed.

  - Requiring that the tuplestores both be generated or not at
    all.  There are real use cases described below where only
    one would be relevant.
 
 Yeah.
 
  - Generating the tuplestores unconditionally.
 
 Well, there are conditions.  Only when the reloption allows and
 only if there is an AFTER trigger for the type of operation in
 progress.

For deltas, this is just the thing.

I'm vaguely picturing the following as infrastructure:

- Instead of modifying Rel, we modify Query to contain two more bools
  default false: hasBeforeTuplestore and hasAfterTuplestore
- Each use case we implement would set 0 or more of these to true.
  For the delta use case, appropriate trigger definitions would set
  both.

This is vague because I haven't really gotten hacking on it, just
exploring what I hope are the relevant parts of the code.

  The ugly:
  - Attaching tuplestore generation to tables rather than
     callers (triggers, DML, etc.)
 
 I'm not sure what you're getting at here.  This patch is
 specifically only concerned with generating delta relations for DML
 AFTER triggers, although my hope is that it will be a basis for
 delta relations used for other purposes.  This seems to me like the
 right place to initially capture the data for incremental
 maintenance of materialized views, and might be of value for other
 purposes, too.

Hrm.  I don't really see this stuff as table properties.  The
materialized view case is an obvious example where the matview, not
the relations underneath, wants this information.  The relations
underneath may have their own concerns, but it's the matview whose
existence should ensure that the tuplestores are being generated.

Once the last thing depending on one of the tuplestores is gone - and
this could simply be the end of a RETURNING query - the tuplestores go
away.

  [formal definition of standard CREATE TRIGGER statement]
 
  Sorry that was a little verbose, but what it does do is give us
  what we need at trigger definition time.  I'd say it's pilot
  error if a trigger definition says "make these tuplestores" and
  the trigger body then does nothing with them, which goes to
  Robert's point below re: unconditional overhead.
 
 Yeah, the more I think about it (and discuss it) the more I'm
 inclined to suffer the additional complexity of the standard syntax
 for specifying transition relations in order to avoid unnecessary
 overhead creating them when not needed.  I'm also leaning toward
 just storing TIDs in the tuplestores, even though it requires mixed
 snapshots in executing queries in the triggers.

So in this case one tuplestore with two TIDs, either of which might be
NULL?

 just seems like there will otherwise be too much overhead in copying
 around big, unreferenced columns for some situations.

Yeah, it'd be nice to have the minimal part be as slim as possible.

  Along that same line, we don't always need to capture both the
  before tuplestores and the after ones.  Two examples of this come
  to mind:
 
  - BEFORE STATEMENT triggers accessing rows, where there is no
  after part to use,
 
 Are you talking about an UPDATE for which the AFTER trigger(s) only
 reference the before transition table, and don't look at AFTER?  If
 so, using the standard syntax would cover that just fine.  If not,
 can you elaborate?

Sorry I was unclear.  I was looking at one of the many things having
these tuplestores around could enable.  As things stand now, there is
no access of any kind to rows with any per-statement trigger, modulo
user-space hacks like this one:

http://people.planetpostgresql.org/dfetter/index.php?/archives/71-Querying-Rows-in-Statement-Triggers.html

Having the before tuplestore available to a BEFORE STATEMENT trigger
would make it possible to do things with the before transition table
that are fragile and hacky now.

  and
  - DML (RETURNING BEFORE, e.g.) which only touches one of them. 
  This applies both to extant use cases of RETURNING and to planned
  ones.
 
 I think that can be sorted out by a patch which implements that, if
 these deltas even turn out to be the appropriate way to get that
 data (which is not clear to me at this time).

Again, I see the tuplestores as 

Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Peter Geoghegan
On Wed, Jun 18, 2014 at 6:54 PM, Ian Barwick i...@2ndquadrant.com wrote:
 I've just run into an index issue on 9.5 HEAD on a slave (master and slave
 both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
 below (I have only found one index on the slave where the issue occurs so 
 far).

Would you mind running my btreecheck tool on both systems? That might
shed some light on this. You can get it from:
http://www.postgresql.org/message-id/cam3swzrtv+xmrwlwq6c-x7czvwavfdwfi4st1zz4ddgfh4y...@mail.gmail.com
.

I suggest running bt_parent_index_verify() and bt_leftright_verify()
on all indexes on both systems. It shouldn't take too long.
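
Assuming the functions take the index as a regclass argument (the patch
on the linked thread defines the actual signatures), that would be along
the lines of:

  SELECT bt_parent_index_verify('tgg_user_login_key'::regclass);
  SELECT bt_leftright_verify('tgg_user_login_key'::regclass);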

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
On 06/18/2014 07:29 PM, Tom Lane wrote:
 I wrote:
 I do see growth in the per-query context as well.  I'm not sure 
 where that's coming from, but we probably should try to find
 out. A couple hundred bytes per iteration is going to add up,
 even if it's not as fast as 8K per iteration.  I'm not sure it's
 dblink's fault, because I don't see anything in dblink.c that is
 allocating anything in the per-query context, except for the
 returned tuplestores, which somebody else should clean up.
 
 I poked at this example some more, and found that the additional
 memory leak is occurring during evaluation of the arguments to be
 passed to dblink().  There's been a comment there for a very long
 time suggesting that we might need to do something about that ...
 
 With the attached patch on top of yours, I see no leak anymore.

I can confirm that -- rock solid with 1 million iterations. I assume
that should not be back-patched though?

On a side note, while perusing this section of code:

8<-- at dblink.c:1176 --
 /* make sure we have a persistent copy of the tupdesc */
 tupdesc = CreateTupleDescCopy(tupdesc);

 /* check result and tuple descriptor have the same number of columns */
 if (nfields != tupdesc->natts)
     ereport(ERROR,
             (errcode(ERRCODE_DATATYPE_MISMATCH),
              errmsg("remote query result rowtype does not match "
                     "the specified FROM clause rowtype")));

 /* Prepare attinmeta for later data conversions */
 sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);

 /* Create a new, empty tuplestore */
 oldcontext =
     MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
 sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
 rsinfo->setResult = sinfo->tuplestore;
 rsinfo->setDesc = tupdesc;
 MemoryContextSwitchTo(oldcontext);
8<-- at dblink.c:1194 --

Shouldn't that CreateTupleDescCopy() happen in ecxt_per_query_memory?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Ian Barwick

On 19/06/14 11:58, Peter Geoghegan wrote:

On Wed, Jun 18, 2014 at 6:54 PM, Ian Barwick i...@2ndquadrant.com wrote:

I've just run into an index issue on 9.5 HEAD on a slave (master and slave
both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
below (I have only found one index on the slave where the issue occurs so far).


Would you mind running my btreecheck tool on both systems? That might
shed some light on this. You can get it from:
http://www.postgresql.org/message-id/cam3swzrtv+xmrwlwq6c-x7czvwavfdwfi4st1zz4ddgfh4y...@mail.gmail.com
.

I suggest running bt_parent_index_verify() and bt_leftright_verify()
on all indexes on both systems. It shouldn't take too long.


Interesting, I'll take a look later.


Thanks

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway m...@joeconway.com writes:
 On a side note, while perusing this section of code:

 8<-- at dblink.c:1176 --
  /* make sure we have a persistent copy of the tupdesc */
  tupdesc = CreateTupleDescCopy(tupdesc);

 Shouldn't that CreateTupleDescCopy() happen in ecxt_per_query_memory?

Not necessary (we'd have seen crashes long since if it was).
ExecMakeTableFunctionResult doesn't need the tupdesc to persist past
return.

Actually, I was wondering whether we couldn't remove that
CreateTupleDescCopy call entirely.  The risk would be if
get_call_result_type returned a pointer into relcache or some other cached
tuple descriptor, which might be subject to a cache flush --- but AFAICS
it always returns a freshly created or copied tupdesc.  (This might not
have been true originally, which could explain why dblink thinks it needs
to copy.)

regards, tom lane




Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Peter Geoghegan
On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick i...@2ndquadrant.com wrote:
 Interesting, I'll take a look later.

I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems. Still, it should be possible to
determine if that's the problem using btreecheck.

Do you get perfectly consistent answers between the two when you ORDER BY login?
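
That is, whether master and slave produce identical output for:

  SELECT login FROM tgg_user ORDER BY login;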

-- 
Peter Geoghegan




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
On 06/18/2014 08:19 PM, Tom Lane wrote:
 Actually, I was wondering whether we couldn't remove that 
 CreateTupleDescCopy call entirely.

Apparently not, at least without some additional surgery.
ExecMakeTableFunctionResult() tries to free the tupledesc and segfaults.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Ian Barwick

On 19/06/14 12:30, Peter Geoghegan wrote:

On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick i...@2ndquadrant.com wrote:

Interesting, I'll take a look later.


I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems. Still, it should be possible to
determine if that's the problem using btreecheck.

Do you get perfectly consistent answers between the two when you ORDER BY login?


Hmm, nope, different sort order.


Regards

Ian Barwick

--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick i...@2ndquadrant.com wrote:
 Interesting, I'll take a look later.

 I'm pretty suspicious of incompatibilities that may exist between the
 two sets of OS collations involved here. We aren't very clear on the
 extent to which what you're doing is supported, but it's certainly the
 case that bttextcmp()/varstr_cmp()/strcoll() return values must be
 immutable between the two systems.

Oooh, I'll bet that's exactly it.  Is the database using UTF8 encoding and
a non-C locale?  It's well known that OS X's UTF8 locales sort nothing at
all like the supposedly equivalent locales on other systems.
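
Both are easy to confirm from either node:

  SELECT datname, pg_encoding_to_char(encoding) AS encoding, datcollate
  FROM pg_database
  WHERE datname = current_database();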

 Still, it should be possible to
 determine if that's the problem using btreecheck.

Does btreecheck attempt to verify that the sort ordering of the index
matches the comparison behavior of the datatype?  That would (in general)
require calling user-defined code, which seems like probably a pretty
bad idea for the purposes btreecheck is being advertised for.

regards, tom lane




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway m...@joeconway.com writes:
 On 06/18/2014 08:19 PM, Tom Lane wrote:
 Actually, I was wondering whether we couldn't remove that 
 CreateTupleDescCopy call entirely.

 Apparently not, at least without some additional surgery.
 ExecMakeTableFunctionResult() tries to free the tupledesc and segfaults.

Hmm ... oh, I missed this bit:

/* We must get the tupledesc from call context */
if (rsinfo && IsA(rsinfo, ReturnSetInfo) &&
    rsinfo->expectedDesc != NULL)
{
    result = TYPEFUNC_COMPOSITE;
    if (resultTupleDesc)
        *resultTupleDesc = rsinfo->expectedDesc;
    /* Assume no polymorphic columns here, either */
}

So it's passing back the same tupdesc passed in by the caller of
ExecMakeTableFunctionResult.  We can free that all right, but the caller
won't be happy :-(

regards, tom lane




Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Peter Geoghegan
On Wed, Jun 18, 2014 at 8:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Still, it should be possible to
 determine if that's the problem using btreecheck.

 Does btreecheck attempt to verify that the sort ordering of the index
 matches the comparison behavior of the datatype?  That would (in general)
 require calling user-defined code, which seems like probably a pretty
 bad idea for the purposes btreecheck is being advertised for.


Yes, it does, but I see no alternative for a general-purpose tool, and
the fact that it is general purpose is of considerable value. I have
more or less invented my own weird index scans.

I assume you're referring to the field-verification of indexes use
case, which is not an immediate goal of btreecheck, even though it's
an important goal that there has already been some discussion of.

-- 
Peter Geoghegan




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Tom Lane
Joe Conway m...@joeconway.com writes:
 On 06/18/2014 07:29 PM, Tom Lane wrote:
 With the attached patch on top of yours, I see no leak anymore.

 I can confirm that -- rock solid with 1 million iterations. I assume
 that should not be back-patched though?

Well, we usually think memory leaks are back-patchable bugs.  I'm
a bit worried about the potential performance impact of an extra
memory context creation/deletion though.  It's probably not noticeable in
this test case, but that's just because dblink() is such a spectacularly
expensive function.
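
For reference, the sort of test case involved here is just a tight loop
over a trivial dblink() call - sketched with a hypothetical local
connection string:

  DO $$
  BEGIN
    FOR i IN 1..1000000 LOOP
      PERFORM * FROM dblink('dbname=postgres', 'SELECT 1') AS t(x int);
    END LOOP;
  END;
  $$;

At that iteration count a per-call leak shows up clearly, while one extra
context creation and deletion per call stays hidden behind the connection
overhead.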

regards, tom lane




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway
On 06/18/2014 08:45 PM, Tom Lane wrote:
 Joe Conway m...@joeconway.com writes:
 On 06/18/2014 07:29 PM, Tom Lane wrote:
 With the attached patch on top of yours, I see no leak
 anymore.
 
 I can confirm that -- rock solid with 1 million iterations. I
 assume that should not be back-patched though?
 
 Well, we usually think memory leaks are back-patchable bugs.  I'm a
 bit worried about the potential performance impact of an extra 
 memory context creation/deletion though.  It's probably not
 noticeable in this test case, but that's just because dblink() is
 such a spectacularly expensive function.

Probably so. I'll try to scrounge up some time to test the performance
impact of your patch.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: [HACKERS] Possible index issue on 9.5 slave

2014-06-18 Thread Ian Barwick



On 19/06/14 12:35, Tom Lane wrote:

Peter Geoghegan p...@heroku.com writes:

On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick i...@2ndquadrant.com wrote:

Interesting, I'll take a look later.



I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems.


Oooh, I'll bet that's exactly it.  Is the database using UTF8 encoding and
a non-C locale?


Yup, that is indeed the case.

 It's well known that OS X's UTF8 locales sort nothing at
 all like the supposedly equivalent locales on other systems.


True, that. A different sort order wouldn't have surprised me,
but the failure to return an extant row had me thinking there
was something awry with the laptop causing file corruption (it's
getting on in years and has been bashed about a bit).


Regards

Ian Barwick


--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] delta relations in AFTER triggers

2014-06-18 Thread Kevin Grittner
David Fetter da...@fetter.org wrote:
 On Wed, Jun 18, 2014 at 03:30:34PM -0700, Kevin Grittner wrote:

 the more I think about it (and discuss it) the more I'm
 inclined to suffer the additional complexity of the standard
 syntax for specifying transition relations in order to avoid
 unnecessary overhead creating them when not needed.  I'm also
 leaning toward just storing TIDs in the tuplestores, even though
 it requires mixed snapshots in executing queries in the
 triggers.

 So in this case one tuplestore with two TIDs, either of which
 might be NULL?

No, one or two tuplestores, depending on need, each with TIDs of
either the before set or the after set of all tuples affected
by the DML statement, however many that may be.  More or less like
this first draft patch, except with TIDs instead of copies of the
rows, and with better selectivity about when the tuplestores are
generated.
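
For reference, the standard syntax I keep referring to would look
something like this (a sketch only; all identifiers here are invented):

    CREATE TRIGGER transaction_balance_check
        AFTER INSERT ON transaction_detail
        REFERENCING NEW TABLE AS new_rows
        FOR EACH STATEMENT
        EXECUTE PROCEDURE check_balance();

With that spelling, the tuplestore behind new_rows would only need to
be built when some trigger definition actually asks for it.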

 Having the before tuplestore available to a BEFORE STATEMENT
 trigger would make it possible to do things with the before
 transition table that are fragile and hacky now.

How do you propose to have an accurate before tuplestore of
affected rows before the scan runs and before the BEFORE ... FOR
EACH ROW triggers fire?  That would be particularly interesting to
try to generate if the scan involves evaluating any VOLATILE
functions.
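
Consider a contrived sketch:

    UPDATE accounts
       SET balance = balance * 1.01
     WHERE random() < 0.5;

Until the scan has actually run, there is no way to know which rows
will end up in the before set.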

 Again, I see the tuplestores as infrastructure for both deltas and
 many other things, so long as they're attached to the right
 objects.  In my opinion, the right objects would include
 materialized views, triggers, and certain very specific kinds of
 DML of which all the RETURNING ones are one example.  They would
 not include the underlying tables.

Right now I've presented something for capturing the information
and allowing it to be accessed from triggers.  I don't think the
means of capturing it precludes passing it along to other
consumers.  I would like to get this part working before trying to
wire it up to anything other than triggers.  The best way to kill
an effort like this is to allow scope creep.  Now, if you think
that something fundamentally belongs at another level, that's
something to address -- but the point where we capture data to pass
to triggers seems like a natural and sound place to capture it for
other purposes.  And since almost all the code for this patch is in
trigger.c, this seems like I'm in the right place for a trigger
feature.

 standard syntax, the first thing would be for the statement to
 somehow communicate to the trigger layer the need to capture a
 tuplestore it might otherwise not generate, and there would need
 to be a way for the statement to access the needed
 tuplestore(s).

 Right.  Hence my proposal to make the existence of the
 tuplestores part of Query, writeable by the types of triggers
 which specify that they'll be needed.

 I just don't think that Rel is the place to connect them.

I don't know what you mean by that.  I've already said that I now
think we should use the standard CREATE TRIGGER syntax to specify
the transition tables, and that if we do that we don't need the
reloption that's in the patch.  If you ignore the 19 lines of new
code to add that reloption, absolutely 100% of the code changes in
the patch so far are in trigger.c and trigger.h.  That reloption
was never anything I would consider as *connecting* the tuplestores
to the Rel anyway -- it was simply an attempt to minimize
unnecessary work.  No matter how I try, I'm not seeing what you
mean by references to connecting to Rel.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [bug fix] Memory leak in dblink

2014-06-18 Thread Joe Conway

On 06/18/2014 12:14 PM, Joe Conway wrote:
 On 06/18/2014 12:09 PM, Tom Lane wrote:
 Joe Conway m...@joeconway.com writes:
 I think the context deletion was missed in the first place 
 because storeRow() is the wrong place to create the context. 
 Rather than creating the context in storeRow() and deleting it 
 two levels up in materializeQueryResult(), I think it should
 be created and deleted in the interim layer,
 storeQueryResult(). Patch along those lines attached.
 
 Since the storeInfo struct is longer-lived than 
 storeQueryResult(), it'd probably be better if the cleanup
 looked like
 
 + 	if (sinfo->tmpcontext != NULL)
 + 		MemoryContextDelete(sinfo->tmpcontext);
 + 	sinfo->tmpcontext = NULL;
 
 
 good point

If there is no further discussion, I will commit this version of the
patch back to 9.2 where this leak was introduced.
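
(For anyone wanting to reproduce it, the leak accumulated across the
per-row data conversions when materializing a large remote result --
e.g. a sketch like:

    SELECT count(*)
      FROM dblink('dbname=postgres',
                  'SELECT g, repeat(''x'', 100) FROM generate_series(1, 1000000) g')
           AS t(g int, x text);

would exercise storeRow() once per row.)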

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index a81853f..2e43c8f 100644
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1076,1081 
--- 1076,1089 
  	if (!PQsetSingleRowMode(conn))		/* shouldn't fail */
  		elog(ERROR, "failed to set single-row mode for dblink query");
  
+ 	/* Create short-lived memory context for data conversions */
+ 	if (!sinfo->tmpcontext)
+ 		sinfo->tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+   "dblink temporary context",
+   ALLOCSET_DEFAULT_MINSIZE,
+   ALLOCSET_DEFAULT_INITSIZE,
+   ALLOCSET_DEFAULT_MAXSIZE);
+ 
  	for (;;)
  	{
  		CHECK_FOR_INTERRUPTS();
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1119,1124 
--- 1127,1137 
  	/* clean up GUC settings, if we changed any */
  	restoreLocalGucs(nestlevel);
  
+ 	/* clean up data conversion short-lived memory context */
+ 	if (sinfo->tmpcontext != NULL)
+ 		MemoryContextDelete(sinfo->tmpcontext);
+ 	sinfo->tmpcontext = NULL;
+ 
  	/* return last_res */
  	res = sinfo->last_res;
  	sinfo->last_res = NULL;
*** storeRow(storeInfo *sinfo, PGresult *res
*** 1204,1218 
  		if (sinfo->cstrs)
  			pfree(sinfo->cstrs);
  		sinfo->cstrs = (char **) palloc(nfields * sizeof(char *));
- 
- 		/* Create short-lived memory context for data conversions */
- 		if (!sinfo->tmpcontext)
- 			sinfo->tmpcontext =
- AllocSetContextCreate(CurrentMemoryContext,
- 	  "dblink temporary context",
- 	  ALLOCSET_DEFAULT_MINSIZE,
- 	  ALLOCSET_DEFAULT_INITSIZE,
- 	  ALLOCSET_DEFAULT_MAXSIZE);
  	}
  
  	/* Should have a single-row result if we get here */
--- 1217,1222 



Re: GIN pending list pages not recycled promptly (was Re: [HACKERS] GIN improvements part 1: additional information)

2014-06-18 Thread Amit Langote
On Wed, Jan 22, 2014 at 9:12 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 01/22/2014 03:39 AM, Tomas Vondra wrote:

 What annoys me a bit is the huge size difference between the index
 updated incrementally (by a sequence of INSERT commands), and the index
 rebuilt from scratch using VACUUM FULL. It's a bit better with the patch
 (2288 vs. 2035 MB), but is there a chance to improve this?


 Hmm. What seems to be happening is that pending item list pages that the
 fast update mechanism uses are not getting recycled. When enough list pages
 are filled up, they are flushed into the main index and the list pages are
 marked as deleted. But they are not recorded in the FSM, so they won't be
 recycled until the index is vacuumed. Almost all of the difference can be
 attributed to deleted pages left behind like that.

 So this isn't actually related to the packed postinglists patch at all. It
 just makes the bloat more obvious, because it makes the actual size of the
 index, excluding deleted pages, smaller. But it can be observed on git
 master as well:

 I created a simple test table and index like this:

 create table foo (intarr int[]);
 create index i_foo on foo using gin(intarr) with (fastupdate=on);

 I filled the table like this:

 insert into foo select array[-1] from generate_series(1, 10000000) g;

 postgres=# \dt+
                     List of relations
  Schema | Name | Type  | Owner  |  Size  | Description
 --------+------+-------+--------+--------+-------------
  public | foo  | table | heikki | 575 MB |
 (1 row)

 postgres=# \di+
                        List of relations
  Schema | Name  | Type  | Owner  | Table |  Size  | Description
 --------+-------+-------+--------+-------+--------+-------------
  public | i_foo | index | heikki | foo   | 251 MB |
 (1 row)

 I wrote a little utility that scans all pages in a gin index, and prints out
 the flags indicating what kind of a page it is. The distribution looks like
 this:

     19 DATA
   7420 DATA LEAF
  24701 DELETED
      1 LEAF
      1 META

 I think we need to add the deleted pages to the FSM more aggressively.

 I tried simply adding calls to RecordFreeIndexPage, after the list pages
 have been marked as deleted, but unfortunately that didn't help. The problem
 is that the FSM is organized into a three-level tree, and
 RecordFreeIndexPage only updates the bottom level. The upper levels are not
 updated until the FSM is vacuumed, so the pages are still not visible to
 GetFreeIndexPage calls until next vacuum. The simplest fix would be to add a
 call to IndexFreeSpaceMapVacuum after flushing the pending list, per
 attached patch. I'm slightly worried about the performance impact of the
 IndexFreeSpaceMapVacuum() call. It scans the whole FSM of the index, which
 isn't exactly free. So perhaps we should teach RecordFreeIndexPage to update
 the upper levels of the FSM in a retail fashion instead.
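
(For what it's worth, a similar page-flag breakdown can be obtained
from SQL with pageinspect's GIN support instead of a standalone
scanner -- a sketch, assuming a pageinspect build where
gin_page_opaque_info() is available, and the default 8kB block size:

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    SELECT opaque.flags, count(*)
      FROM generate_series(1, (pg_relation_size('i_foo') / 8192)::int - 1) AS blkno,
           LATERAL gin_page_opaque_info(get_raw_page('i_foo', blkno)) AS opaque
     GROUP BY opaque.flags
     ORDER BY count(*) DESC;

Block 0, the metapage, is deliberately skipped here.)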


I wonder if you pursued this further?

You recently added a number of TODO items related to GIN indexes; is it
worth adding this to the list?

--
Amit



