Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Etsuro Fujita

(2014/11/17 19:36), Ashutosh Bapat wrote:

Here are my comments about the patch fscan_reltargetlist.patch


Thanks for the review!


1. This isn't your change, but we might be able to get rid of the assignment
2071 /* Are any system columns requested from rel? */
2072 scan_plan->fsSystemCol = false;

since make_foreignscan() already does that. But I will leave it up to you
whether to remove this assignment or not.


As you pointed out, the assignment is redundant, but I think it improves
the clarity and readability.  So, I'd vote for leaving it as is.



2. Instead of using rel->reltargetlist, we should use the tlist passed
by the caller. This is the tlist expected from the Plan node. For foreign
scans it will be the same as rel->reltargetlist. Using tlist would make the
function consistent with the create_*scan_plan functions.


I disagree with that for the reasons mentioned below:

* For a foreign scan, tlist is *not* necessarily the same as
rel->reltargetlist (i.e., there is a case where tlist contains all user
attributes while rel->reltargetlist contains only the attributes actually
needed by the query).  In such a case it'd be inefficient to use tlist
rather than rel->reltargetlist.


* I think that it'd improve the readability to match the code with other 
places that execute similar processing, such as check_index_only() and 
remove_unused_subquery_outputs().


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Etsuro Fujita

(2014/11/17 19:54), Ashutosh Bapat wrote:

Here are comments for postgres_fdw-syscol patch.


Thanks for the review!


Code
---
1. Instead of a single-line comment "System columns can't be sent to
remote.", it might be better to explain why system columns can't be sent
to the remote.


Done.


2. The comment in deparseVar is a single-line comment, so it should start
and end on the same line, i.e. /* and */ should be on the same line.


Done.


3. Since there is already a testcase which triggered this particular
change, it would be good if we add that to the postgres_fdw regression tests.


Done.

Please find attached an updated version of the patch.

Thanks,

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 253,258  foreign_expr_walker(Node *node,
--- 253,268 
  if (var->varno == glob_cxt->foreignrel->relid &&
  	var->varlevelsup == 0)
  {
+ 	/*
+ 	 * System columns can't be sent to remote since the values
+ 	 * for system columns might be different between local and
+ 	 * remote.
+ 	 *
+ 	 * XXX: we should probably send ctid to remote.
+ 	 */
+ 	if (var->varattno < 0)
+ 		return false;
+ 
  	/* Var belongs to foreign table */
  	collation = var->varcollid;
  	state = OidIsValid(collation) ? FDW_COLLATE_SAFE : FDW_COLLATE_NONE;
***
*** 1262,1267  deparseVar(Var *node, deparse_expr_cxt *context)
--- 1272,1280 
  	if (node->varno == context->foreignrel->relid &&
  		node->varlevelsup == 0)
  	{
+ 		/* Var shouldn't be a system column */
+ 		Assert(node->varattno >= 0);
+ 
  		/* Var belongs to foreign table */
  		deparseColumnRef(buf, node->varno, node->varattno, context->root);
  	}
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 353,358  EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
--- 353,368 
 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = c2))
  (3 rows)
  
+ -- where with system column conditions
+ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.tableoid != 0;
+                             QUERY PLAN
+ ------------------------------------------------------------------------
+  Foreign Scan on public.ft1 t1
+    Output: c1, c2, c3, c4, c5, c6, c7, c8
+    Filter: (t1.tableoid <> 0::oid)
+    Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
+ (4 rows)
+ 
  -- ===
  -- WHERE with remotely-executable conditions
  -- ===
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***
*** 180,185  EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = postgres_fdw_a
--- 180,187 
  EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 === t1.c2;
  EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = abs(t1.c2);
  EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.c1 = t1.c2;
+ -- where with system column conditions
+ EXPLAIN (VERBOSE, COSTS false) SELECT * FROM ft1 t1 WHERE t1.tableoid != 0;
  
  -- ===
  -- WHERE with remotely-executable conditions

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format and API changes (9.5)

2014-11-18 Thread Michael Paquier
On Tue, Nov 18, 2014 at 4:31 AM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:
 WAL insertion performance
 =
 To measure the performance of generating WAL, I ran the
 wal-update-testsuite.sh that Amit also ran earlier. The cluster was
 initialized with:

 shared_buffers=512MB
 checkpoint_segments=30
 fsync=off
 autovacuum=off
 full_page_writes=off

 [results]
 Summary: No meaningful difference in runtime.
If I am seeing that correctly, the WAL generated is reduced for all the tests,
except for the 'hundred tiny fields' case, where more WAL is generated.
The duration seems to be generally reduced as well, with some noise (?)
sometimes making it higher.

 WAL replay performance
 ==

 To test WAL replay performance, I ran pgbench with WAL archiving enabled,
 and timed the replay of the generated WAL. I used the attached script,
 replay-perf-test.sh for that. full_page_writes were disabled, because
 replaying full page images is quite different from replaying other
records.
 (Performance of full-page images is interesting too, but it's not expected
 that these WAL format changes make much difference to that).

 In summary, there is no significant difference in replay performance. The
 amount of WAL generated is much smaller with the patch.

 This concludes my performance testing, until someone wants to see some
other
 scenario being tested. I'm happy with the results.
I think you can; that's a great study, and it proves to be a gain in many
areas.

If this goes in, it is going to be one of the largest patches ever
committed.
$ git diff --stat | tail -n1
91 files changed, 3895 insertions(+), 4305 deletions(-)

There are still some XXX blocks here and there in the code, but nothing
really huge, like here:
-* checks could go wrong too.
+* checks could go wrong too. XXX does this comment still make sense?
 */
-   Assert(xldata->blkno != xldata->blknoNew);
+   Assert(blkno != blknoNew);

Btw, did you do a run with the buffer capture facility and check for page
differences?

Apart from that, after going through the code once again, ISTM that the patch
is in a nice state. It may be better to wait for some input from Andres; he
may catch some issues I haven't spotted.
Regards,
-- 
Michael


Re: [HACKERS] Incorrect initialization of sentPtr in walsender.c

2014-11-18 Thread Michael Paquier
On Thu, Oct 16, 2014 at 9:01 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Thu, Oct 16, 2014 at 7:09 PM, Simon Riggs si...@2ndquadrant.com
 wrote:

 I find it confusing that the Lowest pointer value is also Invalid.
 Valid != Invalid

 To complement that, note that I mentioned Invalid should be UINT_MAX
 for clarity.

Worth noting that this patch has been marked as returned with feedback.
-- 
Michael


Re: [HACKERS] WAL format and API changes (9.5)

2014-11-18 Thread Heikki Linnakangas

On 11/18/2014 10:28 AM, Michael Paquier wrote:

Btw, did you do a run with the buffer capture facility and check for page
differences?


Yes. That's how I bumped into the bug fixed in c73669c0. That tool has 
been tremendously helpful.



Apart from that, after going through the code once again, ISTM that the patch
is in a nice state. It may be better to wait for some input from Andres; he
may catch some issues I haven't spotted.


Yeah, I'm sure there are tons of small things. I'll have to take a small 
break and read through the patch myself after a few days. But I'd like 
to get this committed pretty soon. I believe everyone agrees with the 
big picture, even if there's some little tweaking left to do. It's a 
pain to maintain as a patch, and I believe the FPW compression patch is 
waiting for this to land.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-18 Thread Alban Hertroys
On 17 November 2014 22:27, Tom Lane t...@sss.pgh.pa.us wrote:
 Another idea that occurred to me is to run a planning cycle in which the
 actual parameter values are made available to the planner, but as
 estimates not hard constants (this facility already exists, it's just not
 being used by plancache.c).  This would yield cost estimates that are more
 safely comparable to the custom plan.  But I'm not sure that we'd want to
 expend yet another planning cycle to do this, nor am I sure that we'd want
 to use such a plan as The Generic Plan.

 regards, tom lane

Perhaps a somewhat naive idea (I only have the broad picture of how
the query planner works), but:
What if prepared statements did not store an entirely pinned-down
version of the query plan, but instead stored a smashed-down version
of the query plan that still leaves room for choosing some different
paths based on key decision criteria?

For example, if an input parameter value matches the most common
values, choose the sequential scan path (as in the OP's case, IIRC),
and if it doesn't, attempt an index scan.
Of course, one implication of doing this is likely a decrease in
planning performance (which matters for simple queries), but if it
results in better plan choices for complex queries, that may be a net
gain.
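
For illustration only, a hand-wavy C sketch of such a runtime decision
point. Every name in it is invented; none of this is existing planner API.
The parameter is compared against the column's most-common values and one
of two pre-built subplans is chosen:

#include "postgres.h"
#include "fmgr.h"
#include "nodes/plannodes.h"

/* invented node carrying two alternative subplans */
typedef struct SwitchPlan
{
	Plan	   *seqscan_plan;	/* used when the param hits an MCV */
	Plan	   *indexscan_plan; /* used for rare values */
	Datum	   *mcv_values;		/* most-common values from statistics */
	int			n_mcvs;
	Oid			eq_opfuncid;	/* equality function for the column type */
} SwitchPlan;

/* pick a subplan once the actual parameter value is known */
static Plan *
choose_subplan(SwitchPlan *sp, Datum param)
{
	int			i;

	for (i = 0; i < sp->n_mcvs; i++)
	{
		if (DatumGetBool(OidFunctionCall2(sp->eq_opfuncid,
										  param, sp->mcv_values[i])))
			return sp->seqscan_plan;	/* frequent value: scan it all */
	}
	return sp->indexscan_plan;	/* rare value: use the index */
}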

I recently followed an introductory class about neural networks and
the decision logic seems to look like the neuron/perceptron pattern.

I'm just throwing this out here in case it's a viable option and
nobody else in the world thought of this, however unlikely ;)

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] inherit support for foreign tables

2014-11-18 Thread Ashutosh Bapat
On Mon, Nov 17, 2014 at 1:25 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
wrote:

 (2014/11/12 20:04), Ashutosh Bapat wrote:

 I reviewed fdw-chk-3 patch. Here are my comments


 Thanks for the review!

  Tests
 ---
 1. The tests added in file_fdw module look good. We should add tests for
 CREATE TABLE with CHECK CONSTRAINT also.


 Agreed.  I added the tests, and also changed the proposed tests a bit.

 2.  For postgres_fdw we need tests to check the behaviour when the
 constraints mismatch between the remote table and its local foreign
 table declaration, for INSERT, UPDATE and SELECT.


 Done.

  3. In the testcases for postgres_fdw it seems that you have forgotten to
 add statement after SET constraint_exclusion to 'partition'


 I added the statement.

 4. In the foreign_data test there are changes to fix the diffs caused by
 these changes, like below:
   ALTER FOREIGN TABLE ft1 DROP CONSTRAINT no_const;   -- ERROR
-ERROR:  "ft1" is not a table
+ERROR:  constraint "no_const" of relation "ft1" does not exist
  ALTER FOREIGN TABLE ft1 DROP CONSTRAINT IF EXISTS no_const;
-ERROR:  "ft1" is not a table
+NOTICE:  constraint "no_const" of relation "ft1" does not exist, skipping
  ALTER FOREIGN TABLE ft1 DROP CONSTRAINT ft1_c1_check;
-ERROR:  "ft1" is not a table
+ERROR:  constraint "ft1_c1_check" of relation "ft1" does not exist


 Earlier, when constraints were not supported for FOREIGN TABLE, these
 tests verified that fact, even though the corresponding constraints were
 never created on the table (in fact creation wasn't allowed either).
 Now that constraints are allowed, I think the tests for no_const
 (without IF EXISTS) and ft1_c1_check duplicate the same testcase.
 Maybe we should review this set of statements in light of the new
 functionality.


 Agreed.  I removed the DROP CONSTRAINT ft1_c1_check test, and added a
 new test for ALTER CONSTRAINT.

  Code and implementation
 --
 The usage of NO INHERIT and NOT VALID with CONSTRAINT on a foreign table
 is blocked, but the corresponding documentation entry doesn't mention so.
 Since foreign tables cannot be inherited, the NO INHERIT option isn't
 applicable to foreign tables, and since the constraints on foreign tables
 are declarative, the NOT VALID option is not applicable either. So, I
 agree with what the code is doing, but that should be reflected in the
 documentation with this explanation.
 The rest of the code modifies the condition checks for CHECK CONSTRAINTs on
 foreign tables, and it looks good to me.


 Done.

 Other changes:
 * Modified one error message that I added in AddRelationNewConstraints, to
 match the other message there.
 * Revised other docs a little bit.

 Attached is an updated version of the patch.


I looked at the patch. It looks good now. Once we have the inh patch ready,
we can mark this item as ready for committer.


 Thanks,

 Best regards,
 Etsuro Fujita




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Simon Riggs
On 18 November 2014 01:00, Jim Nasby jim.na...@bluetreble.com wrote:
 On 11/17/14, 4:58 PM, Simon Riggs wrote:

 Great, looks good to me, marking as ready for committer.


 What is wrong with using IF ?


 It's a hell of a lot wordier. I've previously created a more sophisticated
 assert framework to allow more control over things, but ended up also
 using it just for simple sanity checking because it was much nicer than
 typing IF THEN RAISE ERROR END IF.

Why is that not a requirement for a less wordy form of IF?

IF (something) THEN action

Why is this problem specific to RAISE?


-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2014-11-18 Thread Simon Riggs
On 31 October 2014 15:18, Petr Jelinek p...@2ndquadrant.com wrote:

 Attached is the v2 of the patch with the review comments addressed (see
 below).
...
 Done, there is now action_at_recovery_target which can be set to either
 pause, continue or shutdown, defaulting to pause (which is same as old
 behavior of pause_at_recovery_target defaulting to true).

One comment only: I think the actions should be called 'pause', 'promote'
and 'shutdown', since 'continue' leads immediately to promotion of the
server.

I'm good with this patch otherwise. Barring objections I will commit tomorrow.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver

2014-11-18 Thread Simon Riggs
On 18 November 2014 06:20, Michael Paquier michael.paqu...@gmail.com wrote:

 the DBA may want to know how long is the queue of WAL files
 waiting to be archived.

Agreed

 That's IMO something we simply forgot in the
 first implementation of pg_stat_archiver

That's not how it appears to me. ISTM that the information requested
is already available, it just needs some minor calculations to work
out how many files are required.

 the most direct way to
 know that is to count the .ready files in archive_status.

...my earlier point was...

 pg_stat_archiver already has a column for last_archived_wal and
 last_failed_wal, so you can already work out how many files there must
 be between then and now. Perhaps that can be added directly to the
 view, to assist the user in calculating it. Reading the directory
 itself to count the files is unnecessary, except as a diagnostic.

As soon as we have sent the first file, we will know the queue length
at any point afterwards.
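
For illustration, a self-contained sketch of that calculation using only
the two file names, assuming the default 16MB segment size (256 segments
per 4GB "log" unit) and that both files are on the same timeline; the
sample names are hypothetical:

#include <stdio.h>
#include <stdint.h>

/* WAL file names are TTTTTTTTXXXXXXXXYYYYYYYY: timeline, log, seg in hex */
static uint64_t
wal_seqno(const char *fname)
{
	unsigned int tli,
				xlogid,
				seg;

	sscanf(fname, "%8X%8X%8X", &tli, &xlogid, &seg);
	return (uint64_t) xlogid * 256 + seg;	/* 256 16MB segments per log unit */
}

int
main(void)
{
	/* hypothetical values, as they would come from pg_stat_archiver */
	const char *last_archived = "000000010000000200000034";
	const char *current_wal = "000000010000000200000039";

	printf("files waiting to be archived: %llu\n",
		   (unsigned long long) (wal_seqno(current_wal) -
								 wal_seqno(last_archived)));
	return 0;
}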

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Pavel Stehule
2014-11-18 10:23 GMT+01:00 Simon Riggs si...@2ndquadrant.com:

 On 18 November 2014 01:00, Jim Nasby jim.na...@bluetreble.com wrote:
  On 11/17/14, 4:58 PM, Simon Riggs wrote:
 
  Great, looks good to me, marking as ready for committer.
 
 
  What is wrong with using IF ?
 
 
  It's a hell of a lot wordier. I've previously created a more
 sophisticated
  assert framework to allow more control over things, but ended up also
  using it just for simple sanity checking because it was much nicer than
  typing IF THEN RAISE ERROR END IF.

 Why is that not a requirement for a less wordy form of IF?

 IF (something) THEN action


The IF statement is a control statement, and the syntax pattern for
control statements in plpgsql is consistent. I don't want to break it
(moreover, it would probably be hard to implement due to problems in
bison). PL/pgSQL, PL/SQL and Ada are well designed (in my opinion). The
conditional statement has precedent in PL/pgSQL now: we support EXIT WHEN
and CONTINUE WHEN, so we don't propose a new pattern, only reuse an
existing one.

Regards

Pavel





 Why is this problem specific to RAISE?


 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] alternative model for handling locking in parallel groups

2014-11-18 Thread Amit Kapila
On Fri, Nov 14, 2014 at 2:29 AM, Robert Haas robertmh...@gmail.com wrote:

 Discussion of my incomplete group locking patch seems to have
 converged around two points: (1) Everybody agrees that undetected
 deadlocks are unacceptable.  (2) Nobody agrees with my proposal to
 treat locks held by group members as mutually non-conflicting.  As was
 probably evident from the emails on the other thread, it was not
 initially clear to me why you'd EVER want heavyweight locks held by
 different group members to mutually conflict, but after thinking it
 over for a while, I started to think of cases where you would
 definitely want that:

 1. Suppose two or more parallel workers are doing a parallel COPY.
 Each time the relation needs to be extended, one backend or the other
 will need to take the relation extension lock in Exclusive mode.
 Clearly, taking this lock needs to exclude both workers in the same
 group and also unrelated processes.

 2. Suppose two or more parallel workers are doing a parallel
 sequential scan, with a filter condition of myfunc(a.somecol), and
 that myfunc(a.somecol) updates a tuple in some other table.  Access to
 update that tuple will be serialized using tuple locks, and it's no
 safer for two parallel workers to do this at the same time than it
 would be for two unrelated processes to do it at the same time.


Won't this be addressed because both updates issued from myfunc()
are considered separate commands, so w.r.t. locking it should behave
as two different updates in the same transaction?  I think there may be more
things needed to make updates possible via parallel workers, apart from
tuple locks.

 On the other hand, I think there are also some cases where you pretty
 clearly DO want the locks among group members to be mutually
 non-conflicting, such as:

 3. Parallel VACUUM.  VACUUM takes ShareUpdateExclusiveLock, so that
 only one process can be vacuuming a relation at the same time.  Now,
 if you've got several processes in a group that are collaborating to
 vacuum that relation, they clearly need to avoid excluding each other,
 but they still need to exclude other people.  And in particular,
 nobody else should get to start vacuuming that relation until the last
 member of the group exits.  So what you want is a
 ShareUpdateExclusiveLock that is, in effect, shared across the whole
 group, so that it's only released when the last process exits.

 4. Parallel query on a locked relation.  Parallel query should work on
 a table created in the current transaction, or one explicitly locked
 by user action.  It's not acceptable for that to just randomly
 deadlock, and skipping parallelism altogether, while it'd probably be
 acceptable for a first version, is not going a good long-term
 solution.  It also sounds buggy and fragile for the query planner to
 try to guess whether the lock requests in the parallel workers will
 succeed or fail when issued.  Figuring such details out is the job of
 the lock manager or the parallelism infrastructure, not the query
 planner.

 After thinking about these cases for a bit, I came up with a new
 possible approach to this problem.  Suppose that, at the beginning of
 parallelism, when we decide to start up workers, we grant all of the
 locks already held by the master to each worker (ignoring the normal
 rules for lock conflicts).  Thereafter, we do everything the same as
 now, with no changes to the deadlock detector.  That allows the lock
 conflicts to happen normally in the first two cases above, while
 preventing the unwanted lock conflicts in the second two cases.


Here I think we have to consider how to pass the information about
all the locks held by the master to the worker backends.  Also, even
assuming we have such information available, it will still be considerable
work to grant the locks, considering the number of locks we acquire [1]
(based on Simon's analysis) and the additional memory they require.  Finally,
I think the deadlock detector's work might also increase, as there will
now be more procs to visit.
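
To make the hand-off concrete, a hypothetical sketch; LockSummary,
LockSummaryEntry and LockAcquireSkipConflicts are invented names, not
existing APIs. The idea is that the master serializes its held locks into
shared memory and each worker adopts them at startup:

#include "postgres.h"
#include "storage/lock.h"

typedef struct LockSummaryEntry
{
	LOCKTAG		tag;			/* which object is locked */
	LOCKMODE	mode;			/* strongest mode the master holds */
} LockSummaryEntry;

typedef struct LockSummary
{
	int			nentries;
	LockSummaryEntry entries[FLEXIBLE_ARRAY_MEMBER];
} LockSummary;

/*
 * At worker startup: re-acquire everything the master already holds,
 * through a hypothetical entry point that skips the conflict check.
 * This loop is where the work and memory costs mentioned above are paid.
 */
static void
AdoptMasterLocks(const LockSummary *summary)
{
	int			i;

	for (i = 0; i < summary->nentries; i++)
		LockAcquireSkipConflicts(&summary->entries[i].tag,
								 summary->entries[i].mode);
}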

In general, I think this scheme will work, but I am not sure it is worth
it at this stage (considering the initial goal is to use parallel workers
for read operations).

[1] :
http://www.postgresql.org/message-id/ca+u5nmjlubgduwjqikt6umqrfmrmrqdhpndb6z5xzdtb0ph...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-18 Thread Amit Kapila
On Tue, Nov 18, 2014 at 3:10 PM, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2014-11-17 at 14:32 -0500, Robert Haas wrote:

  To reiterate the basic problem here, if we do nothing at all about the
  lock manager, a parallel backend can stall trying to grab an
  AccessExclusiveLock that the user backend already holds, either because
  the user backend holds an AccessExclusiveLock as well, or because some
  other process is waiting for one, we'll deadlock and not notice.

 My feeling is that we should keep the concept of a group and group
 leader from your patch, and improve the deadlock detector to make use of
 that information (looking at the code, it looks doable but not trivial).
 But unless I am missing something, we should separate out the lock
 sharing, and defer that until later.


+1 to initially have something like you describe, and maybe an additional
mechanism to grant the non-conflicting locks already held by the
master backend to child backends.  I understand that allowing additional
non-conflicting locks could lead to problems if write operations are
allowed via child backends, as described in point 1 of Robert's other
mail [1].  However, as we initially want to allow only read-only operations
via child backends, this should be okay; later on, if we want to
support writes via child backends, we might want to consider having an
additional property on the lock which will allow the lock manager or
deadlock detector to consider them separately, so that those locks won't
be granted to child backends if they conflict with the parent's.


[1]:
http://www.postgresql.org/message-id/ca+tgmoygplojo+lg1bebos8gdjwjtq0qdmxsyj4ihfyj11t...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-18 Thread Alvaro Herrera
Amit Kapila wrote:
 On Sun, Nov 16, 2014 at 6:15 AM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:

  I think the fact that we use symlinks is an implementation detail;
  aren't they actually junction points, not symlinks, in some Windows
  cases?
 
 Right, but they provide the same functionality as symlinks, and now we
 are even planning to provide this feature for both Linux and Windows, as
 both Tom and Robert seem to feel it's better that way.  Anyhow,
 I think naming any entity generally differs based on an individual's
 perspective, so we can go with the name which appeals to more people.
 In case nobody else has any preference, I will change it to what both
 of us can agree upon (either 'tablespace catalog', 'tablespace_info' ...).

Well, I have made my argument.  Since you're the submitter, feel free to
select what you think is the best name.


One use case mentioned upthread is having the clone be created on the
same machine as the source server.  Does your proposal help with it?

 That use case got addressed with the -T option, with which the user can
 relocate a tablespace directory (commit fb05f3c: pg_basebackup: Add
 support for relocating tablespaces).

Okay.  As far as I know, -T only works for plain mode, right?  I wonder
if we should make -T modify the tablespace catalog, so that the
resulting file in tar output outputs names mangled per the map; that
would make -T work in tar mode too.  Does that make sense?  (Maybe it
already works that way?  I didn't research.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BRIN and PageIndexDeleteNoCompact

2014-11-18 Thread Heikki Linnakangas
BRIN added a new function, PageIndexDeleteNoCompact(), which is like 
PageIndexMultiDelete but does not remove unused line pointers. However, 
all the callers pass it just a single offset. So the callers would 
really be more interested in efficiently squeezing out a single tuple 
from a page, like PageIndexTupleDelete() does, than a bulk operation.


PageIndexDeleteNoCompact() is not optimal for the single item case. Note
that PageIndexMultiDelete() first checks whether the number of removed
items is small (<= 2), and just calls PageIndexTupleDelete in a loop in
that case.


How about replacing PageIndexDeleteNoCompact() with something more like 
PageIndexTupleDelete()? It ought to be both faster and simpler.
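
For illustration, a rough (untested) sketch of what such a single-tuple
variant might look like, assuming it must keep the line pointer array
intact so that other offset numbers stay stable, while squeezing the
tuple's data out of the hole; this is a sketch, not a proposed patch:

#include "postgres.h"
#include "storage/bufpage.h"

static void
PageIndexTupleDeleteNoCompact(Page page, OffsetNumber offnum)
{
	PageHeader	phdr = (PageHeader) page;
	ItemId		itemId = PageGetItemId(page, offnum);
	unsigned	offset = ItemIdGetOffset(itemId);
	Size		size = MAXALIGN(ItemIdGetLength(itemId));
	OffsetNumber i;

	/* close the hole: shift everything stored below the tuple up by 'size' */
	memmove((char *) page + phdr->pd_upper + size,
			(char *) page + phdr->pd_upper,
			offset - phdr->pd_upper);
	phdr->pd_upper += size;

	/* adjust the stored offsets of the tuples that just moved */
	for (i = FirstOffsetNumber; i <= PageGetMaxOffsetNumber(page); i++)
	{
		ItemId		ii = PageGetItemId(page, i);

		if (ItemIdHasStorage(ii) && ItemIdGetOffset(ii) < offset)
			ii->lp_off += size;
	}

	/* keep the slot in the line pointer array, just mark it unused */
	ItemIdSetUnused(itemId);
}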


PS. The comment above PageIndexDeleteNoCompact says that unused items at 
the end of the array are removed. But looking at the code, I don't see 
it doing that.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Andrew Dunstan


On 11/18/2014 04:23 AM, Simon Riggs wrote:

On 18 November 2014 01:00, Jim Nasby jim.na...@bluetreble.com wrote:

On 11/17/14, 4:58 PM, Simon Riggs wrote:

Great, looks good to me, marking as ready for committer.


What is wrong with using IF ?


It's a hell of a lot wordier. I've previously created a more sophisticated
assert framework to allow more control over things, but ended up also
using it just for simple sanity checking because it was much nicer than
typing IF THEN RAISE ERROR END IF.

Why is that not a requirement for a less wordy form of IF?

IF (something) THEN action

Why is this problem specific to RAISE?





Please, no. The use of closed-form rather than open-form IF statements
is one of the things Ada (and by inheritance PLPGSQL) got right.


Frankly, I find this whole proposal, and all the suggested alternatives, 
somewhat ill-conceived. PLPGSQL is a wordy language. If you want 
something more terse, use something else. Adding these sorts of 
syntactic sugar warts onto the language doesn't seem like a terribly 
good way to proceed.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-11-18 Thread Simon Riggs
On 23 October 2014 00:21, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Attached is the latest version of the patch, modified as above.
 Also, I noticed I had forgotten to add the patch regarding the
 documentation of reindexdb.

 Please don't use pg_catalog in the regression test.  That way we will
 need to update the expected file whenever a new catalog is added, which
 seems pointless.  Maybe create a schema with a couple of tables
 specifically for this, instead.

These patches look fine to me. Don't see anybody objecting either.

Are you looking to commit them, or shall I?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Pavel Stehule
2014-11-18 17:08 GMT+01:00 Simon Riggs si...@2ndquadrant.com:

 On 18 November 2014 12:29, Pavel Stehule pavel.steh...@gmail.com wrote:

  Why is that not a requirement for a less wordy form of IF?
 
  IF (something) THEN action
 
 
  The IF statement is a control statement, and the syntax pattern for
  control statements in plpgsql is consistent. I don't want to break it
  (moreover, it would probably be hard to implement due to problems in
  bison). PL/pgSQL, PL/SQL and Ada are well designed (in my opinion).
  The conditional statement has precedent in PL/pgSQL now: we support
  EXIT WHEN and CONTINUE WHEN, so we don't propose a new pattern, only
  reuse an existing one.

 I commend your wish to improve PL/pgSQL, but I'm sorry to say that I just
 don't see how this moves us forwards.


It is not a big step, but it opens some doors.


 What this does is introduce a fairly restricted new feature that
 removes backwards compatibility and takes us further away from Oracle
 compatibility.


That is not a valid argument for this use case. The RAISE statement has
not been compatible with Oracle for a long time; the WHEN clause changes
nothing.



 If I want to write an Assert style test that fits on a single line, just
 write
   PERFORM raise_error_when(boolean expression);


That is a possibility too. But a) it is a little limited, b) we didn't find
an agreement on how to design it for upstream, and c) I think there is
space for enhancing the RAISE statement for other use cases: tracing,
global condition assertions, etc.



 which requires a very short function like this
 CREATE OR REPLACE FUNCTION raise_error_when(test boolean) returns void
 language plpgsql
 AS $$
   DECLARE
   BEGIN
  IF (test) THEN
   RAISE EXCEPTION 'assertion failure';
  END IF;
   END;
 $$;

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] PageRepairFragmentation performance

2014-11-18 Thread Heikki Linnakangas
When profiling replay of the WAL generated by pgbench, I noticed that
PageRepairFragmentation consumes a large fraction of the CPU time:


Per perf report:

+   33.44%  6.79%  postmaster  postgres      [.] PageRepairFragmentation


The 33.44% figure includes all the functions called by 
PageRepairFragmentation. Looking at the profile closer, most of that 
time seems to be spent in sorting the item ids to physical order, so 
that they can be memmoved in place:


+   83.86%  0.00%  postmaster  libc-2.19.so  [.] __libc_start_main
+   83.86%  0.00%  postmaster  postgres      [.] main
+   83.86%  0.00%  postmaster  postgres      [.] PostmasterMain
+   83.86%  0.00%  postmaster  postgres      [.] 0x0023208d
+   83.86%  0.00%  postmaster  postgres      [.] AuxiliaryProcessMain
+   83.86%  0.00%  postmaster  postgres      [.] StartupProcessMain
+   83.63%  1.86%  postmaster  postgres      [.] StartupXLOG
+   45.85%  0.10%  postmaster  postgres      [.] heap2_redo
+   33.44%  6.79%  postmaster  postgres      [.] PageRepairFragmentation
+   24.60% 16.63%  postmaster  postgres      [.] pg_qsort
+   18.04%  0.23%  postmaster  postgres      [.] heap_redo
+   17.07%  1.53%  postmaster  postgres      [.] XLogReadBufferExtended
+   16.20%  0.30%  postmaster  postgres      [.] XLogReadBufferForRedoExtended
+   14.38%  0.31%  postmaster  postgres      [.] ReadRecord
+   13.90%  1.29%  postmaster  postgres      [.] XLogReadRecord
+   12.40%  1.54%  postmaster  postgres      [.] heap_xlog_update
+   12.08% 12.06%  postmaster  postgres      [.] ValidXLogRecord
+   11.73%  0.10%  postmaster  postgres      [.] XLogReadBufferForRedo
+   10.89%  0.27%  postmaster  postgres      [.] ReadBufferWithoutRelcache
+    8.49%  1.07%  postmaster  libc-2.19.so  [.] __GI___libc_read
+    7.61%  0.71%  postmaster  postgres      [.] ReadBuffer_common
+    5.64%  0.48%  postmaster  postgres      [.] smgropen
+    5.48%  5.47%  postmaster  postgres      [.] itemoffcompare
+    5.40%  5.38%  postmaster  postgres      [.] hash_search_with_hash_value
+    4.70%  4.69%  postmaster  libc-2.19.so  [.] __memmove_ssse3_back
+    4.30%  0.77%  postmaster  libc-2.19.so  [.] __GI___libc_lseek64
+    4.29%  0.20%  postmaster  postgres      [.] heap_xlog_insert
+    3.88%  3.87%  postmaster  postgres      [.] swapfunc
+    2.81%  0.09%  postmaster  postgres      [.] XLogRecordPageWithFreeSpace
+    2.76%  0.00%  cp          libc-2.19.so  [.] __GI___libc_write
+    2.68%  0.07%  postmaster  postgres      [.] BufTableLookup
+    2.58%  2.58%  postmaster  postgres      [.] LWLockAcquire
+    2.17%  0.14%  postmaster  postgres      [.] tag_hash
So there's clearly some room for improvement here. A couple of ideas:

1. Replace the qsort with something cheaper. The itemid arrays being
sorted are small, a few hundred items at most, usually even smaller. In
this pgbench test case I used, the typical size is about 60. With a
small array a plain insertion sort is cheaper than the generic qsort(),
because it can avoid the function overhead etc. involved with generic
qsort. Or we could use something smarter, like a radix sort, knowing
that we're sorting small integers. Or we could implement an inlineable
version of qsort and use that.
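
To make the insertion-sort option concrete, a sketch over sort entries
laid out roughly like bufpage.c's itemIdSortData (the field layout is
approximated here), sorting in descending itemoff order as itemoffcompare
produces for pg_qsort today:

typedef struct itemIdSortData
{
	uint16		offsetindex;	/* linp array index */
	int16		itemoff;		/* page offset of item data */
	uint16		alignedlen;		/* MAXALIGN(item length) */
} itemIdSortData;

/* plain insertion sort, descending by itemoff; no comparator call overhead */
static void
sort_itemIds(itemIdSortData *a, int n)
{
	int			i,
				j;

	for (i = 1; i < n; i++)
	{
		itemIdSortData tmp = a[i];

		for (j = i; j > 0 && a[j - 1].itemoff < tmp.itemoff; j--)
			a[j] = a[j - 1];
		a[j] = tmp;
	}
}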


2. Instead of sorting the array and using memmove in-place, we could 
copy all the tuples to a temporary buffer in arbitrary order, and 
finally copy the temporary copy back to the buffer. That requires two 
memory copies per tuple, instead of one memmove, but memcpy() is pretty 
darn fast. It would be a loss when there are only a few large tuples on 
the page, so that avoiding the sort doesn't help, or when the tuples are 
mostly already in the correct places, so that most of the memmove()s are 
no-ops. But with a lot of small tuples, it would be a win, and it would 
be simple.
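
And a sketch of idea 2, under the same assumptions as the sketch above:
pack the live tuples' data into a scratch buffer from the top down, in
whatever order the line pointers are visited, then copy the packed region
back; no sort required:

static void
compactify_by_copying(Page page, itemIdSortData *items, int nitems)
{
	PageHeader	phdr = (PageHeader) page;
	char		scratch[BLCKSZ];
	unsigned	upper = phdr->pd_special;
	int			i;

	/* pack each live tuple's data into the scratch buffer, top down */
	for (i = 0; i < nitems; i++)
	{
		ItemId		lp = PageGetItemId(page, items[i].offsetindex + 1);

		upper -= items[i].alignedlen;
		memcpy(scratch + upper,
			   (char *) page + items[i].itemoff,
			   items[i].alignedlen);
		lp->lp_off = upper;		/* point the line pointer at the new spot */
	}

	/* one big copy back of the packed data region */
	memcpy((char *) page + upper, scratch + upper, phdr->pd_special - upper);
	phdr->pd_upper = upper;
}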


The second option would change behaviour slightly, as the tuples would
be placed on the page in a different physical order than before. It
wouldn't be visible to users, though.


I spent some time hacking approach 1, and replaced the qsort() call with 
a bucket sort. I'm not sure if a bucket sort is optimal, or better than 
a specialized quicksort implementation, but it seemed simple.


With the testcase I've been using - replaying about 2GB of WAL generated 
by pgbench - this reduces the replay time from about 55 s to 45 s.


Thoughts? Attached is the patch I put together. It's actually two
patches: the first is just refactoring, putting the common code between
PageRepairFragmentation, PageIndexMultiDelete, and
PageIndexDeleteNoCompact into a common function. The second replaces the qsort().


- Heikki

Re: [HACKERS] PageRepairFragmentation performance

2014-11-18 Thread José Luis Tallón

On 11/18/2014 07:03 PM, Heikki Linnakangas wrote:
When profiling replay the WAL generated by pgbench, I noticed the 
PageRepairFragmentation consumes a large fraction of the CPU time:

[snip]

1. Replace the qsort with something cheaper. The itemid arrays being 
sorted are small, a few hundred items at most, usually even smaller. In 
this pgbench test case I used, the typical size is about 60. With a 
small array a plain insertion sort is cheaper than the generic 
qsort(), because it can avoid the function overhead etc. involved with 
generic qsort. Or we could use something smarter, like a radix sort, 
knowing that we're sorting small integers. Or we could implement an 
inlineable version of qsort and use that.
IIRC, the theoretical complexity of quicksort and radix sort should be
approximately the same for 256-1024 items... O(n*log(n)) vs. O(d*n),
where d is ~log2(n), or just 16 in this case. However, lexicographical
(bitstring-wise) ordering might not be what we are aiming for here.


AFAIK, an inlined quicksort should be about the best performing sort 
available (most of the enhancement coming from staying within the I-cache)


2. Instead of sorting the array and using memmove in-place, we could 
copy all the tuples to a temporary buffer in arbitrary order, and 
finally copy the temporary copy back to the buffer. That requires two 
memory copies per tuple, instead of one memmove, but memcpy() is 
pretty darn fast. It would be a loss when there are only a few large 
tuples on the page, so that avoiding the sort doesn't help, or when 
the tuples are mostly already in the correct places, so that most of 
the memmove()s are no-ops. But with a lot of small tuples, it would be 
a win, and it would be simple.


Memmove *should* be no slower than memcpy if both are actually
translated by the compiler to use intrinsics as opposed to calling the
functions, as seems to be done here (cf. __memmove_ssse3_back).
A simple if to eliminate the no-op memmoves might as well do it, too.


Just my two (euro) cents, though

The second option would change behaviour slightly, as the tuples would 
be placed on the page in different physical order than before. It 
wouldn't be visible to to users, though.


I spent some time hacking approach 1, and replaced the qsort() call 
with a bucket sort. I'm not sure if a bucket sort is optimal, or 
better than a specialized quicksort implementation, but it seemed simple.


With the testcase I've been using - replaying about 2GB of WAL 
generated by pgbench - this reduces the replay time from about 55 s to 
45 s.


Not bad at all... though I suspect most of it might come from staying 
within the I-cache as opposed to regular qsort.

The smaller itemIdSortData structure surely helps a bit, too :)

Thoughts? Attached is the patch I put together. It's actually two 
patches: the first is just refactoring, putting the common code 
between PageRepairFragmentation, PageIndexMultiDelete, and 
PageIndexDeleteNoCompact to function. The second replaces the qsort(). 


Definitely worthwhile, even if just for the refactor. The speed-up 
sounds very good, too.




Thanks,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_locks doesn't check for interrupts?

2014-11-18 Thread Josh Berkus
Hackers,

Since querying pg_locks can be intrusive due to needing to lock the lock
partitions, when I'm collecting data about locks I generally put a
statement_timeout on it.  However, I'm noticing that this
statement_timeout appears to be completely ignored; I've seen this query
run for up to 10 minutes* when the database is heavily loaded.  Thus it
seems likely to me that the functions under pg_locks aren't checking for
interrupts.  Has anybody checked this already?

(yes, when a 64,000 item lock table is mostly full of locks, queries
against pg_locks *can* take 10 minutes)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_locks doesn't check for interrupts?

2014-11-18 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Since querying pg_locks can be intrusive due to needing to lock the lock
 partitions, when I'm collecting data about locks I generally put a
 statement_timeout on it.  However, I'm noticing that this
 statement_timeout appears to be completely ignored; I've seen this query
 run for up to 10 minutes* when the database is heavily loaded.  Thus it
 seems likely to me that the functions under pg_locks aren't checking for
 interrupts.  Has anybody checked this already?

Whether they do or not, I don't think we allow CHECK_FOR_INTERRUPTS to
trigger while holding an LWLock.  So this would not be a trivial thing
to fix.
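
To spell out why (simplified from miscadmin.h and postgres.c of this era):
LWLockAcquire does HOLD_INTERRUPTS(), and ProcessInterrupts() refuses to
act while the holdoff count is nonzero, so a CHECK_FOR_INTERRUPTS()
sprinkled under pg_locks would be a no-op exactly while the lock
partitions are held:

/* simplified; the real definitions are in miscadmin.h and postgres.c */
#define CHECK_FOR_INTERRUPTS() \
	do { \
		if (InterruptPending) \
			ProcessInterrupts(); \
	} while (0)

void
ProcessInterrupts(void)
{
	/* LWLockAcquire did HOLD_INTERRUPTS(), so this bails out immediately */
	if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
		return;
	/* ... otherwise service the pending cancel/die request ... */
}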

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_locks doesn't check for interrupts?

2014-11-18 Thread Josh Berkus
On 11/18/2014 10:47 AM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 Since querying pg_locks can be intrusive due to needing to lock the lock
 partitions, when I'm collecting data about locks I generally put a
 statement_timeout on it.  However, I'm noticing that this
 statement_timeout appears to be completely ignored; I've seen this query
 run for up to 10 minutes* when the database is heavily loaded.  Thus it
 seems likely to me that the functions under pg_locks aren't checking for
 interrupts.  Has anybody checked this already?
 
 Whether they do or not, I don't think we allow CHECK_FOR_INTERRUPTS to
 trigger while holding an LWLock.  So this would not be a trivial thing
 to fix.

Hmm.  So the basic problem is that querying pg_locks itself can make an
already bad locking situation worse (I've seen it contribute to a total
lockup, which didn't resolve until I terminated the query against
pg_locks).  I don't see a clear way to make it less dangerous, so I was
hoping that at least making it time out made it safer to use.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN and PageIndexDeleteNoCompact

2014-11-18 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 BRIN added a new function, PageIndexDeleteNoCompact(), which is like
 PageIndexMultiDelete but does not remove unused line pointers. However, all
 the callers pass it just a single offset. So the callers would really be
 more interested in efficiently squeezing out a single tuple from a page,
 like PageIndexTupleDelete() does, than a bulk operation.

Yeah, I noticed this, and yes, I agree there's no harm in changing it to a
single-tuple mode rather than a bulk operation.

 How about replacing PageIndexDeleteNoCompact() with something more like
 PageIndexTupleDelete()? It ought to be both faster and simpler.

No objection.  Are you working on this, or do you intend me to?

How relevant is this given your current PageRepairFragmentation work?
I think it will cause hard merge conflicts if done right away; should it
be attempted prior to the PRF patches, or after it's done?  I assume
that if you do it, you can do both things simultaneously ...

 PS. The comment above PageIndexDeleteNoCompact says that unused items at the
 end of the array are removed. But looking at the code, I don't see it doing
 that.

Hmm, it probably lost the capability during the development :-(

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Jim Nasby

On 11/18/14, 9:31 AM, Andrew Dunstan wrote:


Frankly, I find this whole proposal, and all the suggested alternatives, 
somewhat ill-conceived. PLPGSQL is a wordy language. If you want something more 
terse, use something else. Adding these sorts of syntactic sugar warts onto the 
language doesn't seem like a terribly good way to proceed.


Such as?

The enormous advantage of plpgsql is how easy it is to run SQL. Every other PL 
I've looked at makes that WAY harder. And that's assuming you're in an 
environment where you can install another PL.

And honestly, I've never really found plpgsql to be terribly wordy except in a few cases 
(assert being one of them). My general experience has been that when I'm 
doing an IF (other than assert), I'm doing multiple things in the IF block, so it's 
really not that big a deal.

As for why not do this in a separate function; yes, you can do that. But then 
you've needlessly added to your context stack, it's going to be a lot slower, 
and you can only really replace RAISE's functionality if you're in a version 
that has format().

If someone has another brain-flash on how to make this better I'm all ears.
But I don't think arguments like "use another PL" or "it's just syntax
sugar" improve things for our users.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN and PageIndexDeleteNoCompact

2014-11-18 Thread Heikki Linnakangas

On 11/18/2014 09:46 PM, Alvaro Herrera wrote:

Heikki Linnakangas wrote:

How about replacing PageIndexDeleteNoCompact() with something more like
PageIndexTupleDelete()? It ought to be both faster and simpler.


No objection.  Are you working on this, or do you intend me to?


You, please ;-).


How relevant is this given your current PageRepairFragmentation work?
I think it will cause hard merge conflicts if done right away; should it
be attempted prior to the PRF patches, or after it's done?  I assume
that if you do it, you can do both things simultaneously ...


In my PageRepairFragmentation patch, I did refactor the common part of 
PageIndexDeleteNoCompact to use the shared function. It'll look 
completely different after it's rewritten to look more like 
PageIndexTupleDelete, and won't have anything in common with 
PageRepairFragmentation anymore, and won't really be any easier for me 
to do as part of the PRF work. So don't wait for that, just go ahead and 
make the change whenever it suits you.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Andrew Dunstan


On 11/18/2014 02:53 PM, Jim Nasby wrote:

On 11/18/14, 9:31 AM, Andrew Dunstan wrote:


Frankly, I find this whole proposal, and all the suggested 
alternatives, somewhat ill-conceived. PLPGSQL is a wordy language. If 
you want something more terse, use something else. Adding these sorts 
of syntactic sugar warts onto the language doesn't seem like a 
terribly good way to proceed.


Such as?

The enormous advantage of plpgsql is how easy it is to run SQL. Every 
other PL I've looked at makes that WAY harder. And that's assuming 
you're in an environment where you can install another PL.


And honestly, I've never really found plpgsql to be terribly wordy 
except in a few cases (assert being one of them). My general 
experience has been that when I'm doing an IF (other than assert), I'm 
doing multiple things in the IF block, so it's really not that big a 
deal.





I frequently write one-statement bodies of IF statements. To me that's 
not a big deal either :-)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2014-11-18 Thread Adam Brightwell
All,


 2. Catalog Representation:

 Condense all attributes in pg_authid to single int64 column and create
 bitmasks accordingly.


I have been working on these changes and I was hoping for some
assistance/recommendations.

Currently, I am using int32 simply because int64 is causing some issues.
The issue is that genbki.pl is not able to associate it with the int8 type
as defined in pg_type.h.  Therefore Schema_pg_authid in schemapg.h isn't
defined correctly.  I've been digging and scratching my head on this one
but I have reached a point where I think it would be better just to ask.

Any thoughts or recommendations on how I should approach this?
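
Not an answer to the genbki.pl issue, but for reference, a minimal sketch
of the bitmask representation itself; the flag names here are illustrative,
not the proposed catalog definitions:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t RoleAttr;

#define ROLE_ATTR_SUPERUSER		(UINT64_C(1) << 0)
#define ROLE_ATTR_INHERIT		(UINT64_C(1) << 1)
#define ROLE_ATTR_CREATEROLE	(UINT64_C(1) << 2)
#define ROLE_ATTR_CREATEDB		(UINT64_C(1) << 3)
#define ROLE_ATTR_CANLOGIN		(UINT64_C(1) << 4)
#define ROLE_ATTR_REPLICATION	(UINT64_C(1) << 5)

static inline bool
role_has_attr(RoleAttr attrs, RoleAttr attr)
{
	return (attrs & attr) != 0;
}

static inline RoleAttr
role_set_attr(RoleAttr attrs, RoleAttr attr, bool value)
{
	return value ? (attrs | attr) : (attrs & ~attr);
}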

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-11-18 Thread Jim Nasby

On 11/17/14, 10:50 AM, Tomas Vondra wrote:

Either of these restrictions would prevent a situation where a context
has to update accounting for two parent contexts. That'd allow updating
a single place (because each context has a single parent context with
tracking requested).


Another option might be to be lazy about updating parent contexts. I'm
thinking something like keeping a boolean (or making a size of 0 magic)
that indicates whether a context has up-to-date info from its children or
not. That would allow you to only compute the complete size when you need
it; if your children haven't been touched since the last time you
calculated, then you don't need to recalc.
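
A small sketch of that lazy scheme, with invented field names rather than
the real MemoryContextData layout: dirtying walks up only until it hits an
already-dirty ancestor, and the full recalculation happens only when the
total is actually requested:

#include <stdbool.h>
#include <stddef.h>

typedef struct MemCtx
{
	struct MemCtx *parent;
	struct MemCtx *firstchild;
	struct MemCtx *nextsibling;
	size_t		self_total;		/* bytes allocated in this context alone */
	size_t		cached_total;	/* self + descendants; valid only if !dirty */
	bool		dirty;			/* a descendant changed since last recalc */
} MemCtx;

/* called on every alloc/free; cheap because it stops at a dirty node */
static void
mark_dirty(MemCtx *ctx)
{
	for (; ctx != NULL && !ctx->dirty; ctx = ctx->parent)
		ctx->dirty = true;
}

/* called only when the accounting is actually needed */
static size_t
total_space(MemCtx *ctx)
{
	if (ctx->dirty)
	{
		MemCtx	   *c;
		size_t		sum = ctx->self_total;

		for (c = ctx->firstchild; c != NULL; c = c->nextsibling)
			sum += total_space(c);
		ctx->cached_total = sum;
		ctx->dirty = false;
	}
	return ctx->cached_total;
}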
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_test_fsync file descriptor leak

2014-11-18 Thread Jeff Janes
The open_datasync code opens the output file as a test to make sure the
flags are accepted by the OS, and if it succeeds it then immediately opens
the file again with the same flags, overwriting and so leaking the
descriptor from the previous open.


On Windows MinGW, this prevents the final unlink from working, as the file
is still open.

Trivial fix attached.
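
The actual patch is attached as binary data, so as a guess at its shape
only: the bug pattern and the one-line style of fix look roughly like this
(using O_DSYNC as the example flag):

#include <fcntl.h>
#include <unistd.h>

static int
open_datasync_probe_then_open(const char *filename)
{
	int			fd;

	/* probe: does the OS accept these flags at all? */
	if ((fd = open(filename, O_RDWR | O_DSYNC, 0)) < 0)
		return -1;				/* not supported on this platform */

	/*
	 * The fix: close the probe descriptor before reopening.  Without
	 * this, the fd leaks, and on Windows/MinGW the still-open handle
	 * makes the final unlink() of the test file fail.
	 */
	close(fd);

	return open(filename, O_RDWR | O_DSYNC, 0);
}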

Thanks,

Jeff


pg_test_fsync_leak.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Pavel Stehule
2014-11-18 21:27 GMT+01:00 Andrew Dunstan and...@dunslane.net:


 On 11/18/2014 02:53 PM, Jim Nasby wrote:

 On 11/18/14, 9:31 AM, Andrew Dunstan wrote:


 Frankly, I find this whole proposal, and all the suggested alternatives,
 somewhat ill-conceived. PLPGSQL is a wordy language. If you want something
 more terse, use something else. Adding these sorts of syntactic sugar warts
 onto the language doesn't seem like a terribly good way to proceed.


 Such as?

 The enormous advantage of plpgsql is how easy it is to run SQL. Every
 other PL I've looked at makes that WAY harder. And that's assuming you're
 in an environment where you can install another PL.

 And honestly, I've never really found plpgsql to be terribly wordy except
 in a few cases (assert being one of them). My general experience has been
 that when I'm doing an IF (other than assert), I'm doing multiple things in
 the IF block, so it's really not that big a deal.



 I frequently write one-statement bodies of IF statements. To me that's not
 a big deal either :-)


Anybody does it, but that doesn't mean it is perfect :) I understand
Jim's feeling well.

I am looking at the Ada 2005 language ... the design of RAISE WITH shows
that the RAISE statement is extensible in Ada too. Sure, we can live
without it, but I don't think we do anything wrong by introducing RAISE
WHEN, and I am sure that life with this feature can be more fun for
someone who uses this pattern intensively.



 cheers

 andrew



Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Petr Jelinek

On 18/11/14 22:11, Pavel Stehule wrote:


2014-11-18 21:27 GMT+01:00 Andrew Dunstan and...@dunslane.net
mailto:and...@dunslane.net:


I frequently write one-statement bodies of IF statements. To me
that's not a big deal either :-)


Anybody does it, but that doesn't mean it is perfect :) I understand
Jim's feeling well.

I am looking at the Ada 2005 language ... the design of RAISE WITH shows
that the RAISE statement is extensible in Ada too. Sure, we can live
without it, but I don't think we do anything wrong by introducing RAISE
WHEN, and I am sure that life with this feature can be more fun for
someone who uses this pattern intensively.



Personally, I see this as a natural extension of the conditional block 
control which we already have for loops with CONTINUE WHEN and EXIT 
WHEN. This basically extends it to any block, and it seems quite natural 
to me to have it...


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-18 Thread Michael Paquier
On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Can we just wait on this patch until we have the whole feature?

Well, this may take some time even to define, and even if the goals are
clearly defined it may take even more time to produce a prototype to discuss.


 We quite often break larger patches down into smaller ones, but if
 they arrive in lots of small pieces its more difficult to understand
 and correct issues that are happening on the larger scale. Churning
 the same piece of code multiple times is rather mind numbing.

Hm. I think that we are losing ourselves in this thread. The primary goal
is to remove the code duplication between syncrep.c and walsender.c that
has existed since 9.1. Would it be possible to focus only on that for now?
That's still the topic of this CF.
-- 
Michael


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Andrew Dunstan


On 11/18/2014 04:11 PM, Pavel Stehule wrote:



2014-11-18 21:27 GMT+01:00 Andrew Dunstan and...@dunslane.net 
mailto:and...@dunslane.net:



On 11/18/2014 02:53 PM, Jim Nasby wrote:

On 11/18/14, 9:31 AM, Andrew Dunstan wrote:


Frankly, I find this whole proposal, and all the suggested
alternatives, somewhat ill-conceived. PLPGSQL is a wordy
language. If you want something more terse, use something
else. Adding these sorts of syntactic sugar warts onto the
language doesn't seem like a terribly good way to proceed.


Such as?

The enormous advantage of plpgsql is how easy it is to run
SQL. Every other PL I've looked at makes that WAY harder. And
that's assuming you're in an environment where you can install
another PL.

And honestly, I've never really found plpgsql to be terribly
wordy except in a few cases (assert being one of them). My
general experience has been that when I'm doing an IF (other
than assert), I'm doing multiple things in the IF block, so
it's really not that big a deal.



I frequently write one-statement bodies of IF statements. To me
that's not a big deal either :-)


Anybody does it, but that doesn't mean it is perfect :) I understand 
Jim's feeling well.


I am looking at the Ada 2005 language ... the design of RAISE WITH shows 
that the RAISE statement is extensible in Ada too. Sure, we can live 
without it, but I don't think we do anything wrong by introducing RAISE 
WHEN, and I am sure that life with this feature can be more fun for 
someone who uses this pattern intensively.






(drags out recently purchased copy of Barnes Ada 2012)

Ada's

RAISE exception_name WITH string;

is more or less the equivalent of our

RAISE level 'format_string';

So I don't think there's much analogy there.


I'm not going to die in a ditch over this, but it does seem to me very 
largely unnecessary.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-11-18 Thread Pavel Stehule
2014-11-18 22:28 GMT+01:00 Andrew Dunstan and...@dunslane.net:


 On 11/18/2014 04:11 PM, Pavel Stehule wrote:



 2014-11-18 21:27 GMT+01:00 Andrew Dunstan and...@dunslane.net:



 On 11/18/2014 02:53 PM, Jim Nasby wrote:

 On 11/18/14, 9:31 AM, Andrew Dunstan wrote:


 Frankly, I find this whole proposal, and all the suggested
 alternatives, somewhat ill-conceived. PLPGSQL is a wordy
 language. If you want something more terse, use something
 else. Adding these sorts of syntactic sugar warts onto the
 language doesn't seem like a terribly good way to proceed.


 Such as?

 The enormous advantage of plpgsql is how easy it is to run
 SQL. Every other PL I've looked at makes that WAY harder. And
 that's assuming you're in an environment where you can install
 another PL.

 And honestly, I've never really found plpgsql to be terribly
 wordy except in a few cases (assert being one of them). My
 general experience has been that when I'm doing an IF (other
 than assert), I'm doing multiple things in the IF block, so
 it's really not that big a deal.



 I frequently write one-statement bodies of IF statements. To me
 that's not a big deal either :-)


 Anybody does it, but that doesn't mean it is perfect :) I understand
 Jim's feeling well.

 I am looking at the Ada 2005 language ... the design of RAISE WITH shows
 that the RAISE statement is extensible in Ada too. Sure - we can live without
 it, but I don't think we do anything wrong by introducing RAISE WHEN, and I
 am sure that living with this feature will be more fun for someone who uses
 this pattern intensively.




 (drags out recently purchased copy of Barnes Ada 2012)

 Ada's

 RAISE exception_name WITH string;

 is more or less the equivalent of our

 RAISE level 'format_string';

 So I don't think there's much analogy there.


I used it as an analogy for the immutability of this statement in Ada.



 I'm not going to die in a ditch over this, but it does seem to me very
 largely unnecessary.

 cheers

 andrew




[HACKERS] Use of recent Russian TZ changes in regression tests

2014-11-18 Thread Tom Lane
[ moving discussion from -packagers to -hackers ]

Summary for onlookers:
Bjorn Munch complained that the timestamptz regression tests added in
commit b2cbced9e failed on his Solaris machine; it emerged that he uses
--with-system-tzdata and the tzdata files on his machine predate 2014h
when the recent Russian Federation changes appeared.  The new tests
intentionally rely on the Oct 2014 Russian zone change, so the failure
is unsurprising; the question is whether we want to keep that behavior.

I wrote:
 Christoph Berg c...@df7cb.de writes:
 Re: Tom Lane 2014-11-18 1327.1416329...@sss.pgh.pa.us
 Bjorn Munch bjorn.mu...@oracle.com writes:
 I now discovered that the regression test timestamptz is failing. It
 has always failed on Solaris 10 due to a 2038 bug that has been fixed
 in Solaris 11. But now I also get failures on Solaris 11.

 Hmm ... do you build with --with-system-tzdata, and if so, is the system
 timezone database up to date?

 I ran into the same problem. A current tzdata version fixes the
 problem, but it's not ideal that this is now a dependency.

 Last time I complained about that problem, the America/Metlaka (sp?)
 reference was changed to something older that is also included in
 older tzdata versions. Possibly the regression tests could use some
 older dates and not depend on Europe/Moscow from Oct 2014.

 Perhaps.  I'm not aware offhand of other cases where a zone abbreviation
 changed meaning twice; the point of the tests is to make sure the
 abbreviation decoding code can deal with such cases, so we'd need to
 find an older similar instance before I'd be happy about changing the
 tests.

I poked around in the IANA files and found some other cases where
a zone went forward and back without any DST involved, for instance
Kosrae:

Zone Pacific/Kosrae	10:51:56 -	LMT	1901
			11:00	-	KOST	1969 Oct	# Kosrae Time
			12:00	-	KOST	1999
			11:00	-	KOST

That entry has been unchanged for quite some time, but so has this comment
about it:

# From Paul Eggert (1999-10-29):
# The Federated States of Micronesia Visitors Board writes in
# The Federated States of Micronesia - Visitor Information (1999-01-26)
# http://www.fsmgov.org/info/clocks.html
# that Truk and Yap are UTC+10, and Ponape and Kosrae are UTC+11.
# We don't know when Kosrae switched from UTC+12; assume January 1 for now.

So the problem with depending on this entry for testing is that if
somebody pops up and tells the IANA maintainers just when the time
switch occurred, they'll change the entry and break our test.

The one or two other cases where this happened are about as
underdocumented as Kosrae; America/Guyana is an example.

The good thing about testing with the MSK changes is that those are
quite well-documented and so we don't have to fear getting blindsided
by future updates to the IANA database.  So basically we are trading off
known short term pain (for people on machines with old TZ files) against
a risk of unexpected future pain.

My inclination is to leave the test as-is, but I'm willing to make the
changes if people would rather bet the other way.
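
For reference, the behavior the new tests exercise is easy to see from
SQL. A sketch, assuming tzdata 2014h or later, where MSK moved from UTC+4
back to UTC+3 on 2014-10-26:

SET timezone = 'UTC';
-- the same abbreviation decodes to different offsets on either side of the change
SELECT '2014-10-25 12:00:00 MSK'::timestamptz;  -- taken as UTC+4, i.e. 08:00 UTC
SELECT '2014-10-27 12:00:00 MSK'::timestamptz;  -- taken as UTC+3, i.e. 09:00 UTC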

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2014-11-18 Thread Adam Brightwell
All,


 Currently, I am using int32 simply because int64 is causing some issues.
 The issue is that genbki.pl is not able to associate it with the int8
 type as defined in pg_type.h.  Therefore Schema_pg_authid in schemapg.h
 isn't defined correctly.  I've been digging and scratching my head on this
 one but I have reached a point where I think it would be better just to ask.


Attached is a quite trivial patch that addresses the int64 (C) to int8
(SQL) mapping issue.

Further digging revealed that Catalog.pm wasn't accounting for int64
(thanks Stephen).  Would it be better to include this change as a separate
patch (as attached) or would it be preferable to include with a larger role
attribute bitmask patch?

Thanks,
Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
new file mode 100644
index eb91c53..523b379
*** a/src/backend/catalog/Catalog.pm
--- b/src/backend/catalog/Catalog.pm
*** sub Catalogs
*** 33,38 ****
--- 33,39 ----
  	my %RENAME_ATTTYPE = (
  		'int16' => 'int2',
  		'int32' => 'int4',
+ 		'int64' => 'int8',
  		'Oid'   => 'oid',
  		'NameData'  => 'name',
  		'TransactionId' => 'xid');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN pageinspect functions

2014-11-18 Thread Peter Geoghegan
On Tue, Nov 4, 2014 at 7:26 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I think these functions will be quite useful for debugging purposes
 and we already have similar functions for other indexes (btree).

This patch has bitrotted. I attach rebased revision, for the
convenience of others - V1.3 of pageinspect will now incorporate both
GIN stuff, and BRIN stuff. Seems like this patch was affected by the
recent problems with header includes - that's fixed.

Do you intend to fix this up?

+ /* TODO: array of decoded item pointers */
+ nulls[2] = true;

-- 
Peter Geoghegan
From 5d10bfe4f6db5e37dcf087d62f42cbc6c9423c26 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan p...@heroku.com
Date: Tue, 18 Nov 2014 13:38:12 -0800
Subject: [PATCH] Add pageinspect functions for inspecting GIN indexes.

---
 contrib/pageinspect/Makefile  |   2 +-
 contrib/pageinspect/ginfuncs.c| 265 ++
 contrib/pageinspect/pageinspect--1.2--1.3.sql |  37 
 contrib/pageinspect/pageinspect--1.3.sql  |  42 
 4 files changed, 345 insertions(+), 1 deletion(-)
 create mode 100644 contrib/pageinspect/ginfuncs.c

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index a59de8a..e651543 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -1,7 +1,7 @@
 # contrib/pageinspect/Makefile
 
 MODULE_big	= pageinspect
-OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o brinfuncs.o $(WIN32RES)
+OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o brinfuncs.o ginfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
 DATA = pageinspect--1.3.sql pageinspect--1.0--1.1.sql \
diff --git a/contrib/pageinspect/ginfuncs.c b/contrib/pageinspect/ginfuncs.c
new file mode 100644
index 000..0602873
--- /dev/null
+++ b/contrib/pageinspect/ginfuncs.c
@@ -0,0 +1,265 @@
+/*
+ * contrib/pageinspect/ginfuncs.c
+ */
+
+#include postgres.h
+
+#include access/gin.h
+#include access/gin_private.h
+#include access/htup_details.h
+#include catalog/namespace.h
+#include catalog/pg_type.h
+#include funcapi.h
+#include miscadmin.h
+#include utils/array.h
+#include utils/builtins.h
+#include utils/rel.h
+
+#define DatumGetItemPointer(X)	 ((ItemPointer) DatumGetPointer(X))
+#define ItemPointerGetDatum(X)	 PointerGetDatum(X)
+
+PG_FUNCTION_INFO_V1(gin_metapage);
+PG_FUNCTION_INFO_V1(gin_pageopaq);
+PG_FUNCTION_INFO_V1(gin_dataleafpage);
+
+Datum
+gin_metapage(PG_FUNCTION_ARGS)
+{
+	bytea	   *raw_page = PG_GETARG_BYTEA_P(0);
+	int			raw_page_size;
+	TupleDesc	tupdesc;
+	Page		page;
+	GinPageOpaque opaq;
+	GinMetaPageData *metadata;
+	HeapTuple	resultTuple;
+	Datum		values[10];
+	bool		nulls[10];
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to use raw page functions"))));
+
+	raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+	if (raw_page_size < BLCKSZ)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("input page too small (%d bytes)", raw_page_size)));
+	page = VARDATA(raw_page);
+
+	opaq = (GinPageOpaque) PageGetSpecialPointer(page);
+	if (opaq->flags != GIN_META)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("input page is not a GIN metapage"),
+				 errdetail("Flags %04X, expected %04X",
+						   opaq->flags, GIN_META)));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	metadata = GinPageGetMeta(page);
+
+	memset(nulls, 0, sizeof(nulls));
+
+	values[0] = Int64GetDatum(metadata->head);
+	values[1] = Int64GetDatum(metadata->tail);
+	values[2] = Int32GetDatum(metadata->tailFreeSize);
+	values[3] = Int64GetDatum(metadata->nPendingPages);
+	values[4] = Int64GetDatum(metadata->nPendingHeapTuples);
+
+	/* statistics, updated by VACUUM */
+	values[5] = Int64GetDatum(metadata->nTotalPages);
+	values[6] = Int64GetDatum(metadata->nEntryPages);
+	values[7] = Int64GetDatum(metadata->nDataPages);
+	values[8] = Int64GetDatum(metadata->nEntries);
+
+	values[9] = Int32GetDatum(metadata->ginVersion);
+
+	/* Build and return the result tuple. */
+	resultTuple = heap_form_tuple(tupdesc, values, nulls);
+
+	return HeapTupleGetDatum(resultTuple);
+}
+
+
+Datum
+gin_pageopaq(PG_FUNCTION_ARGS)
+{
+	bytea	   *raw_page = PG_GETARG_BYTEA_P(0);
+	int			raw_page_size;
+	TupleDesc	tupdesc;
+	Page		page;
+	GinPageOpaque opaq;
+	HeapTuple	resultTuple;
+	Datum		values[3];
+	bool		nulls[10];
+	Datum		flags[16];
+	int			nflags = 0;
+	uint16		flagbits;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to use raw page functions"))));
+
+	raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+	if (raw_page_size < BLCKSZ)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("input page too small (%d bytes)", raw_page_size)));
+	page = VARDATA(raw_page);
+
+	opaq = (GinPageOpaque) 
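
For orientation, a sketch of how the new functions would be invoked once
the extension is installed. gin_metapage() and gin_pageopaq() are the
names defined in the patch above, get_raw_page() is the existing
pageinspect function, and the index name is a hypothetical example:

CREATE EXTENSION pageinspect;
-- block 0 of a GIN index is its metapage
SELECT * FROM gin_metapage(get_raw_page('my_gin_idx', 0));
-- opaque data of a later page
SELECT * FROM gin_pageopaq(get_raw_page('my_gin_idx', 1));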

Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf

2014-11-18 Thread Petr Jelinek

On 18/11/14 12:57, Simon Riggs wrote:

On 31 October 2014 15:18, Petr Jelinek p...@2ndquadrant.com wrote:


Attached is the v2 of the patch with the review comments addressed (see
below).

...

Done, there is now action_at_recovery_target which can be set to either
pause, continue or shutdown, defaulting to pause (which is the same as the
old behavior of pause_at_recovery_target defaulting to true).


One comment only: I think the actions should be called: pause, promote
and shutdown, since continue leads immediately to promotion of the
server.

I'm good with this patch otherwise. Barring objections I will commit tomorrow.



OK, promote works for me as well. I attached a patch that changes continue 
to promote so you don't have to find and replace everything yourself. 
The changed doc wording probably needs to be checked.
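
As an aside for readers, the pause workflow that the patched documentation
describes can be sketched with functions that already exist; whether to
resume or shut down afterwards is the operator's call:

-- on the standby, after the target is reached with the default
-- action_at_recovery_target = 'pause':
SELECT pg_is_xlog_replay_paused();   -- returns true once replay has paused
-- run read-only checks against the database, then either end recovery ...
SELECT pg_xlog_replay_resume();
-- ... or shut the server down and pick a different target in recovery.conf.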


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0f1ff34..fe42394 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -289,12 +289,39 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </term>
       <listitem>
        <para>
-        Specifies whether recovery should pause when the recovery target
-        is reached. The default is true.
-        This is intended to allow queries to be executed against the
-        database to check if this recovery target is the most desirable
-        point for recovery. The paused state can be resumed by using
-        <function>pg_xlog_replay_resume()</> (See
+        Alias for action_at_recovery_target; <literal>true</> is the same as
+        action_at_recovery_target = <literal>pause</> and <literal>false</>
+        is the same as action_at_recovery_target = <literal>promote</>.
+       </para>
+       <para>
+        This setting has no effect if <xref linkend="guc-hot-standby"> is not
+        enabled, or if no recovery target is set.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     </variablelist>
+
+     <varlistentry id="action-at-recovery-target"
+                   xreflabel="action_at_recovery_target">
+      <term><varname>action_at_recovery_target</varname> (<type>enum</type>)
+      <indexterm>
+        <primary><varname>action_at_recovery_target</> recovery parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specifies what action PostgreSQL should take once the recovery target
+        is reached. The default is <literal>pause</>, which means recovery
+        will be paused. <literal>promote</> means the recovery process will
+        finish and the server will start to accept connections.
+        Finally, <literal>shutdown</> will stop the PostgreSQL instance at
+        the recovery target.
+       </para>
+        The intended use of the <literal>pause</> setting is to allow queries
+        to be executed against the database to check if this recovery target
+        is the most desirable point for recovery. The paused state can be
+        resumed by using <function>pg_xlog_replay_resume()</> (See
         <xref linkend="functions-recovery-control-table">), which then
         causes recovery to end. If this recovery target is not the
         desired stopping point, then shutdown the server, change the
@@ -302,6 +329,20 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         continue recovery.
        </para>
        <para>
+        The <literal>shutdown</> setting is useful to have the instance
+        ready at the exact replay point desired.
+        The instance will still be able to replay more WAL records (and in
+        fact will have to replay WAL records since the last checkpoint the
+        next time it is started).
+       </para>
+       <para>
+        Note that because <filename>recovery.conf</> will not be renamed when
+        <varname>action_at_recovery_target</> is set to <literal>shutdown</>,
+        any subsequent start will end with immediate shutdown unless the
+        configuration is changed or <filename>recovery.conf</> is removed
+        manually.
+       </para>
+       <para>
         This setting has no effect if <xref linkend="guc-hot-standby"> is not
         enabled, or if no recovery target is set.
        </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..974e6c1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -227,7 +227,7 @@ static char *recoveryEndCommand = NULL;
 static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
 static bool recoveryTargetInclusive = true;
-static bool recoveryPauseAtTarget = true;
+static RecoveryTargetAction actionAtRecoveryTarget = RECOVERY_TARGET_ACTION_PAUSE;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
@@ -5068,6 +5068,9 @@ readRecoveryCommandFile(void)
 	ConfigVariable *item,
 			   *head = NULL,
 			   *tail 

Re: [HACKERS] New Event Trigger: table_rewrite

2014-11-18 Thread Dimitri Fontaine
Hi,

Michael Paquier michael.paqu...@gmail.com writes:
 1) This patch is authorizing VACUUM and CLUSTER to use the event
 triggers ddl_command_start and ddl_command_end, but aren't those
 commands actually not DDLs but control commands?

Reverted in the attached version 3 of the patch.

 6) in_table_rewrite seems unnecessary.

Removed in the attached version 3 of the patch.

On Sun, Nov 16, 2014 at 5:51 AM, Simon Riggs si...@2ndquadrant.com wrote:
 4) pg_event_trigger_table_rewrite_oid is able to return only one OID,
 which is the one of the table being rewritten, and it is limited to
 one OID because VACUUM, CLUSTER and ALTER TABLE can only run on one
 object at the same time in a single transaction. What about thinking
 that we may have in the future multiple objects rewritten in a single
 transaction, hence multiple OIDs could be fetched?

 Why would this API support something which the normal trigger API
 doesn't, just in case we support a feature that hadn't ever been
 proposed or discussed? Why can't such a change wait until that feature
 arrives?

Agreed, unchanged in the attached.

Robert Haas robertmh...@gmail.com writes:
 It seems pretty weird, also, that the event trigger will fire after
 we've taken AccessExclusiveLock when you cluster a particular
 relation, and before we've taken AccessExclusiveLock when you cluster
 database-wide.  That's more or less an implementation artifact of the
 current code that we're exposing to the user for, really, no good
 reason.

In the CLUSTER implementation we have only one call site for invoking
the Event Trigger, in cluster_rel(). While it's true that in the single
relation case, the relation is opened in cluster() then cluster_rel() is
called, the opening is done with NoLock in cluster():

rel = heap_open(tableOid, NoLock);

My understanding is that the relation locking only happens in
cluster_rel() at this line:

OldHeap = try_relation_open(tableOid, AccessExclusiveLock);

Please help me through the cluster locking strategy here. I feel like
I'm missing something obvious, as my conclusion from re-reading the code
in light of your comment is that your comment is not accurate with
respect to the current state of the code.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
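
For concreteness, a hedged sketch of how the proposed event would be
used, going by the event and function names in the patch below; the
trigger body is a hypothetical example:

CREATE OR REPLACE FUNCTION log_rewrite() RETURNS event_trigger AS $$
BEGIN
    RAISE NOTICE 'table % is about to be rewritten',
        pg_event_trigger_table_rewrite_oid()::regclass;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER notice_rewrite ON table_rewrite
    EXECUTE PROCEDURE log_rewrite();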

diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml
index 6f71a27..704a377 100644
--- a/doc/src/sgml/event-trigger.sgml
+++ b/doc/src/sgml/event-trigger.sgml
@@ -65,6 +65,12 @@
    </para>
 
    <para>
+    The <literal>table_rewrite</> event occurs just before a table is going to
+    get rewritten by the commands <literal>ALTER TABLE</literal>,
+    <literal>CLUSTER</literal> or <literal>VACUUM</literal>.
+   </para>
+
+   <para>
     Event triggers (like other functions) cannot be executed in an aborted
     transaction.  Thus, if a DDL command fails with an error, any associated
     <literal>ddl_command_end</> triggers will not be executed.  Conversely,
@@ -120,518 +126,625 @@
     <entry><literal>ddl_command_start</literal></entry>
     <entry><literal>ddl_command_end</literal></entry>
     <entry><literal>sql_drop</literal></entry>
+    <entry><literal>table_rewrite</literal></entry>
    </row>
   </thead>
   <tbody>
    <row>
+    <entry align="left"><literal>ANALYZE</literal></entry>
+    <entry align="center"><literal>X</literal></entry>
+    <entry align="center"><literal>X</literal></entry>
+    <entry align="center"><literal>-</literal></entry>
+    <entry align="center"><literal>-</literal></entry>
+   </row>
+   <row>
     <entry align="left"><literal>ALTER AGGREGATE</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>-</literal></entry>
+    <entry align="center"><literal>-</literal></entry>
    </row>
    <row>
     <entry align="left"><literal>ALTER COLLATION</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>-</literal></entry>
+    <entry align="center"><literal>-</literal></entry>
    </row>
    <row>
     <entry align="left"><literal>ALTER CONVERSION</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>-</literal></entry>
+    <entry align="center"><literal>-</literal></entry>
    </row>
    <row>
     <entry align="left"><literal>ALTER DOMAIN</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>-</literal></entry>
+    <entry align="center"><literal>-</literal></entry>
    </row>
    <row>
     <entry align="left"><literal>ALTER EXTENSION</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>X</literal></entry>
     <entry align="center"><literal>-</literal></entry>
+    <entry align="center"><literal>-</literal></entry>
    </row>
    <row>

Re: [HACKERS] Additional role attributes superuser review

2014-11-18 Thread Tom Lane
Adam Brightwell adam.brightw...@crunchydatasolutions.com writes:
 Currently, I am using int32 simply because int64 is causing some issues.
 The issue is that genbki.pl is not able to associate it with the int8 type
 as defined in pg_type.h.  Therefore Schema_pg_authid in schemapg.h isn't
 defined correctly.  I've been digging and scratching my head on this one
 but I have reached a point where I think it would be better just to ask.

 Any thoughts or recommendations on how I should approach this?

Teach src/backend/catalog/Catalog.pm about it.

There once was a policy against using int64 in the system catalogs, which
is why it wasn't there already; but the reason for that policy is long
gone.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of recent Russian TZ changes in regression tests

2014-11-18 Thread David G Johnston
Tom Lane-2 wrote
 The good thing about testing with the MSK changes is that those are
 quite well-documented and so we don't have to fear getting blindsided
 by future updates to the IANA database.  So basically we are trading off
 known short term pain (for people on machines with old TZ files) against
 a risk of unexpected future pain.
 
 My inclination is to leave the test as-is, but I'm willing to make the
 changes if people would rather bet the other way.

The tests are intended to exercise behavior against a known set of input -
in this case known TZ files.  Ideally I would simply prefer to skip over (or
give people a configure option) running these tests if --with-system-tzdata
has been specified.

The reason for the configure option is that packagers enabling this option
are basically giving up some level of control over, or caring about, whether
the TZ files play well with the files deployed on their users' machines.
However, someone compiling and then deploying from source will likely want
to see whether their current environment passes all of the tests, even if
they are going to be using system timezone files instead of PostgreSQL
supplied ones.

I am tending to like the argument for saying date/time handling is integral
to a database and thus tests should be allowed to exercise any data
contained in the TZ files provided by PostgreSQL itself.  I would even
consider simply testing if the system timezone file is older than the
PostgreSQL file and failing a test if that is the case.

If that seems too harsh, then testing against the earliest time we should
have seen and accommodated the behavior seems like the most even-handed
approach.  Basically, pretend we caught these nuances at the moment they
came to be and write our test just the way we would have had to write it
back then.  If it changes in the future we would have had to adapt anyway,
so no harm done.

In the end, having compilation or testing fail if older TZ files are present
on the system AND --with-system-tzdata has been configured seems like a
reasonable policy, but that can be separated from the issue here, which is
that we are fixing code to handle older data while choosing to use newer
data to test said code.  So I'd say we time-travel to fix the back branches,
decide whether to change our policy going forward, and optionally add a
configure option to control the level of validation when PostgreSQL is asked
to use system timezone information instead of its own.

David J.








-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2014-11-18 Thread Andrew Dunstan


On 11/18/2014 04:58 PM, Adam Brightwell wrote:

All,

Currently, I am using int32 simply because int64 is causing some
issues.  The issue is that genbki.pl is not able to associate it
with the int8 type as defined in pg_type.h.  Therefore
Schema_pg_authid in schemapg.h isn't defined correctly.  I've been
digging and scratching my head on this one but I have reached a
point where I think it would be better just to ask.


Attached is a quite trivial patch that addresses the int64 (C) to int8 
(SQL) mapping issue.


Further digging revealed that Catalog.pm wasn't accounting for int64 
(thanks Stephen). Would it be better to include this change as a 
separate patch (as attached) or would it be preferable to include with 
a larger role attribute bitmask patch?





I think we should just apply this now. As Tom said, the reason for not 
doing it is long gone.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New Event Trigger: table_rewrite

2014-11-18 Thread Alvaro Herrera
Dimitri Fontaine wrote:

 In the CLUSTER implementation we have only one call site for invoking
 the Event Trigger, in cluster_rel(). While it's true that in the single
 relation case, the relation is opened in cluster() then cluster_rel() is
 called, the opening is done with NoLock in cluster():
 
   rel = heap_open(tableOid, NoLock);
 
 My understanding is that the relation locking only happens in
 cluster_rel() at this line:
 
   OldHeap = try_relation_open(tableOid, AccessExclusiveLock);
 
 Please help me through the cluster locking strategy here, I feel like
 I'm missing something obvious, as my conclusion from re-reading the code
  in light of your comment is that your comment is not accurate with
 respect to the current state of the code.

Almost the whole of that function consists of conditions for bailing out
of clustering the relation if things have changed since the relation list
was collected.  It seems wrong to invoke the event trigger in all those
cases; it's going to fire spuriously.  I think you should move the
invocation of the event trigger to the end, just before rebuild_relation
is called.  Not sure where relative to the predicate lock stuff therein;
probably before, so that we avoid doing that dance if the event trigger
function decides to jump ship.

In ATRewriteTables, it seems wrong to call it after make_new_heap.  If
the event trigger function aborts, we end up with useless work done
there; so I think it should be called before that.  Also, why do you
have the evt_table_rewrite_fired stuff?  I think you should fire one
event per table, no?

The second ATRewriteTable call in ATRewriteTables does not actually
rewrite the table; it only scans it to verify constraints.  So I'm
thinking you shouldn't call this event trigger there.  Or, if we decide
we want this, we probably also need something for the table scans in
ALTER DOMAIN too.

You still have the ANALYZE thing in docs, which now should be removed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of recent Russian TZ changes in regression tests

2014-11-18 Thread Tom Lane
David G Johnston david.g.johns...@gmail.com writes:
 Tom Lane-2 wrote
 The good thing about testing with the MSK changes is that those are
 quite well-documented and so we don't have to fear getting blindsided
 by future updates to the IANA database.  So basically we are trading off
 known short term pain (for people on machines with old TZ files) against
 a risk of unexpected future pain.
 
 My inclination is to leave the test as-is, but I'm willing to make the
 changes if people would rather bet the other way.

 I am tending to like the argument for saying date/time handling is integral
 to a database and thus tests should be allowed to exercise any data
 contained in the TZ files provided by PostgreSQL itself.  I would even
 consider simply testing if the system timezone file is older than the
 PostgreSQL file and failing a test if that is the case.

That doesn't make any sense as it stands, but I take your point, and
indeed that's more or less the reasoning that led me to write the test
as it is: we *should* complain if you've got TZ data that's more than
6 months old.  However, the contrary opinion is that whether the user's
TZ data is too old for his purposes is not ours to decide; and that's
surely a tenable position as well.

I'm not particularly excited about allowing the regression tests to
pass if the test cases fail; that would obscure actual failures in
this area.  I'd sooner just remove the problematic cases altogether.

Another thought that just occurred to me is that we need to test
both advance and retreat of a zone's notion of standard time, but
that doesn't mean that both cases have to be tested in the same
zone.  The 2011 Russian advance is probably reasonable to depend
on by now, but maybe we could find some other well-documented case
where a zone's standard time offset decreased relative to UTC.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use of recent Russian TZ changes in regression tests

2014-11-18 Thread Tom Lane
I wrote:
 Another thought that just occurred to me is that we need to test
 both advance and retreat of a zone's notion of standard time, but
 that doesn't mean that both cases have to be tested in the same
 zone.  The 2011 Russian advance is probably reasonable to depend
 on by now, but maybe we could find some other well-documented case
 where a zone's standard time offset decreased relative to UTC.

Ah, here we go:

# Venezuela
#
# From John Stainforth (2007-11-28):
# ... the change for Venezuela originally expected for 2007-12-31 has
# been brought forward to 2007-12-09.  The official announcement was
# published today in the Gaceta Oficial de la República Bolivariana
# de Venezuela, número 38.819 (official document for all laws or
# resolution publication)
# http://www.globovision.com/news.php?nid=72208

# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
Zone	America/Caracas	-4:27:44 -	LMT	1890
	-4:27:40 -	CMT	1912 Feb 12	# Caracas Mean Time?
	-4:30	-	VET	1965		# Venezuela Time
	-4:00	-	VET	2007 Dec  9  3:00
	-4:30	-	VET

That 2007 change has the right sign (becoming more negative relative
to UTC), and it seems pretty solidly documented so it's unlikely to
change on us in future.  Being in the other direction from Greenwich
shouldn't be an issue, maybe it's even better for coverage purposes.

Hence, proposal: leave the MSK 2011 cases as-is but replace the 2014
cases with comparable testing around the VET 2007 change.
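
Concretely, the replacement tests could probe VET the same way the
current ones probe MSK. A sketch, assuming the zone history quoted above
(VET moved from UTC-4 to UTC-4:30 at 2007-12-09 03:00):

SET timezone = 'UTC';
SELECT '2007-12-08 12:00:00 VET'::timestamptz;  -- decoded as UTC-4, i.e. 16:00 UTC
SELECT '2007-12-10 12:00:00 VET'::timestamptz;  -- decoded as UTC-4:30, i.e. 16:30 UTC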

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN pageinspect functions

2014-11-18 Thread Peter Geoghegan
On Tue, Nov 18, 2014 at 2:01 PM, Peter Geoghegan p...@heroku.com wrote:
 Do you intend to fix this up?

BTW, how do you feel about the B-Tree check extension [1]? It's very
much related to pageinspect -- it's more or less a derivative. I don't
think I'm going to have time (or that there is sufficient review
bandwidth available) to get it into 9.5, but I should post a revision
soon, so it's at least something that's available for use by an
expert. I did some clean-up work on it that is unpublished. It'll
become a more generic extension - amcheck, per Robert's suggestion.

One unpublished additional feature (that I have to fix a bug in) that
isn't included in [1] is the idea of checking invariants across B-Tree
pages. So, a scankey should indicate that the greatest (non-highkey)
item on a non-rightmost page comports with the page that it has a
right link to. Without race conditions.

I don't have that swapped into my head at the moment, and so I don't
have a good sense of how hard it'll be to fix the problem I found...

[1] 
http://www.postgresql.org/message-id/cam3swzrtv+xmrwlwq6c-x7czvwavfdwfi4st1zz4ddgfh4y...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-11-18 Thread Robert Haas
On Mon, Nov 17, 2014 at 3:04 PM, Peter Geoghegan p...@heroku.com wrote:
 postgres=# select qty from orderlines ;
 ERROR:  42703: column qty does not exist
 LINE 1: select qty from orderlines ;
^
 HINT:  Perhaps you meant to reference the column orderlines.quantity.
 

I don't buy this example, because it would give you the same hint if
you told it you wanted to access a column called ant, or uay, or tit.
And that's clearly ridiculous.  The reason why quantity looks like a
reasonable suggestion for qty is because it's a conventional
abbreviation, but an extremely high percentage of comparable cases
won't be.

 +/*
 + * Charge extra (for inexact matches only) when an alias was
 + * specified that differs from what might have been used to
 + * correctly qualify this RTE's closest column
 + */
 +if (wrongalias)
 +rtestate.distance += 3;

 I don't understand what situation this is catering to.  Can you
 explain?  It seems to account for a good deal of complexity.

 Two cases:

 1. Distinguishing between the case where there was an exact match to a
 column that isn't visible (i.e. the existing reason for
 errorMissingColumn() to call here), and the case where there is a
 visible column, but our alias was the wrong one. I guess that could
 live in errorMissingColumn(), but overall it's more convenient to do
 it here, so that errorMissingColumn() handles things almost uniformly
 and doesn't really have to care.

 2. For non-exact (fuzzy) matches, it seems more useful to give one
 match rather than two when the user gave an alias that matches one
 particular RTE. Consider this:

 
 postgres=# select ordersid from orders o join orderlines ol on
 o.orderid = ol.orderid;
 ERROR:  42703: column ordersid does not exist
 LINE 1: select ordersid from orders o join orderlines ol on o.orderi...
^
 HINT:  Perhaps you meant to reference the column o.orderid or the
 column ol.orderid.
 LOCATION:  errorMissingColumn, parse_relation.c:3166

 postgres=# select ol.ordersid from orders o join orderlines ol on
 o.orderid = ol.orderid;
 ERROR:  42703: column ol.ordersid does not exist
 LINE 1: select ol.ordersid from orders o join orderlines ol on o.ord...
^
 HINT:  Perhaps you meant to reference the column ol.orderid.
 LOCATION:  errorMissingColumn, parse_relation.c:3147
 

I guess I'm confused at a broader level.  If the alias is wrong, why
are we considering names in this RTE *at all*?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-11-18 Thread Peter Geoghegan
On Tue, Nov 18, 2014 at 3:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 17, 2014 at 3:04 PM, Peter Geoghegan p...@heroku.com wrote:
 postgres=# select qty from orderlines ;
 ERROR:  42703: column qty does not exist
 LINE 1: select qty from orderlines ;
^
 HINT:  Perhaps you meant to reference the column orderlines.quantity.
 

 I don't buy this example, because it would give you the same hint if
 you told it you wanted to access a column called ant, or uay, or tit.
 And that's clearly ridiculous.  The reason why quantity looks like a
 reasonable suggestion for qty is because it's a conventional
 abbreviation, but an extremely high percentage of comparable cases
 won't be.

Is that so terrible? Yes, if those *exact* strings are tried, that'll
happen. But the vast majority of 3 letter strings will not do that
(including many 3 letter strings that include one of the letters 'q',
't' and 'y', such as qqq, ttt, and yyy). Why, in practice, would
someone even attempt those strings? I'm worried about Murphy, not
Machiavelli. That seems like a pretty important distinction here.

I maintain that omission of part of the correct spelling should be
weighed less. I am optimizing for the case where the user has a rough
idea of the structure and spelling of things - if they're typing in
random strings, or totally distinct synonyms, there is little we can
do about that. As I said, there will always be the most marginal case
that still gets a suggestion. I see no reason to hurt the common case
where we help in order to save the user from seeing a ridiculous
suggestion. I have a final test for the absolute quality of a
suggestion, but I think we could easily be too conservative about
that. At worst, our ridiculous suggestion makes apparent that the
user's incorrect spelling was itself ridiculous. With larger strings,
we can afford to be more conservative, and we are, because we have
more information to go on. Terse column names are not uncommon,
though.

 +/*
 + * Charge extra (for inexact matches only) when an alias was
 + * specified that differs from what might have been used to
 + * correctly qualify this RTE's closest column
 + */
 +if (wrongalias)
 +rtestate.distance += 3;

 I don't understand what situation this is catering to.  Can you
 explain?  It seems to account for a good deal of complexity.

 I guess I'm confused at a broader level.  If the alias is wrong, why
 are we considering names in this RTE *at all*?

Because it's a common mistake when writing ad-hoc queries. People may
forget which exact table their column comes from. We certainly want to
weigh the fact that an alias was specified, but it shouldn't totally
limit our guess to that RTE. If nothing else, the fact that there was
a much closer match from another RTE ought to result in forcing there
to be no suggestion (due to there being too many equally good
suggestions). That's because, as I said, an *absolute* test for the
quality of a match is problematic (which, again, is why I err on the
side of letting the final, absolute quality test not limit
suggestions, particularly with short strings).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-11-18 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Tue, Nov 18, 2014 at 3:29 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 17, 2014 at 3:04 PM, Peter Geoghegan p...@heroku.com wrote:
 postgres=# select qty from orderlines ;
 ERROR:  42703: column qty does not exist
 HINT:  Perhaps you meant to reference the column orderlines.quantity.

 I don't buy this example, because it would give you the same hint if
 you told it you wanted to access a column called ant, or uay, or tit.
 And that's clearly ridiculous.  The reason why quantity looks like a
 reasonable suggestion for qty is because it's a conventional
 abbreviation, but an extremely high percentage of comparable cases
 won't be.

 I maintain that omission of part of the correct spelling should be
 weighed less.

I would say that omission of the first letter should completely disqualify
suggestions based on this heuristic; but it might make sense to weight
omissions less after the first letter.
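
For a feel of the raw distances involved, the fuzzystrmatch extension
exposes a plain, unweighted Levenshtein distance; the patch uses a
weighted variant in core, so these numbers are only illustrative:

CREATE EXTENSION fuzzystrmatch;
SELECT levenshtein('qty', 'quantity');  -- 5: five letters of the full name omitted
SELECT levenshtein('ant', 'quantity');  -- also 5, which is Robert's objection in a nutshell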

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-11-18 Thread Michael Paquier
On Tue, Nov 18, 2014 at 2:36 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Thu, Nov 13, 2014 at 12:38 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
  Thanks for reviewing the patch!
 
  On Thu, Nov 13, 2014 at 4:05 AM, Alvaro Herrera
  alvhe...@2ndquadrant.com wrote:
  Fujii Masao wrote:
 
  --- 127,152 ----
     When this option is used, <application>pg_receivexlog</> will report
     a flush position to the server, indicating when each segment has been
     synchronized to disk so that the server can remove that segment if it
  !  is not otherwise needed. <literal>--synchronous</literal> option must
  !  be specified when making <application>pg_receivexlog</> run as
  !  synchronous standby by using replication slot. Otherwise WAL data
  !  cannot be flushed frequently enough for this to work correctly.
    </para>
   </listitem>
  </varlistentry>
 
  Whitespace damage here.
 
  Fixed.
 
  + printf(_("  --synchronous          flush transaction log in real time\n"));
 
  "in real time" sounds odd.  How about "flush transaction log
  immediately after writing", or maybe "have transaction log writes be
  synchronous".
 
  The former sounds better to me. So I chose it.
 
  --- 781,791 ----
        now = feGetCurrentTimestamp();

        /*
  !      * Issue sync command as soon as there are WAL data which
  !      * has not been flushed yet if synchronous option is true.
         */
        if (lastFlushPosition < blockpos &&
  !         walfile != -1 && synchronous)
 
  I'd put the synchronous condition first in the if(), and start the
  comment with it rather than putting it at the end.  Both seem weird.
 
  Fixed, i.e., moved the synchronous condition first in the if()'s test
  and also moved the comment If synchronous option is true also first
  in the comment.
 
                        progname, current_walfile_name, strerror(errno));
                goto error;
            }
            lastFlushPosition = blockpos;
  !
  !         /*
  !          * Send feedback so that the server sees the latest WAL locations
  !          * immediately if synchronous option is true.
  !          */
  !         if (!sendFeedback(conn, blockpos, now, false))
  !             goto error;
  !         last_status = now;
 
  I'm not clear about this comment .. why does it say if synchronous
  option is true when it's not checking the condition?
 
  I added that comment because the code exists with the if() block
  checking synchronous condition. But it seems confusing. Just removed
  that part from the comment.
 
  Attached is the updated version of the patch.

 I've just pushed this.

Marked this patch as committed on the commit fest app.
-- 
Michael


Re: [HACKERS] tracking commit timestamps

2014-11-18 Thread Petr Jelinek

On 15/11/14 13:36, Simon Riggs wrote:

On 15 November 2014 04:32, Steve Singer st...@ssinger.info wrote:


The use cases I'm talking about aren't really replication related. Often I
have come across systems that want to do something such as 'select * from
orders where X > the_last_row_I_saw order by X' and then do further
processing on the order.


Yes, existing facilities provide mechanisms for different types of
application change queues.

If you want to write a processing queue in SQL, that isn't the best
way. You'll need some way to keep track of whether or not its been
successfully processed. That's either a column in the table, or a
column in a queue table maintained by triggers, with the row write
locked on read. You can then have multiple readers from this queue
using the new SKIP LOCKED feature, which was specifically designed to
facilitate that.

Logical decoding was intended for much more than just replication. It
provides commit-order access to changed data in a form that is both
usable and efficient for high-volume application needs.

I don't see any reason to add LSN into a SLRU updated at commit to
support those application needs.
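
As a concrete illustration of the queue pattern described above, a
minimal sketch assuming the SKIP LOCKED feature mentioned (new in 9.5 at
the time of this thread); the table, columns, and psql variable are
hypothetical:

BEGIN;
-- each worker claims the next unprocessed order; rows locked by other
-- workers are skipped instead of blocked on
SELECT orderid
FROM   orders
WHERE  NOT processed
ORDER  BY orderid
LIMIT  1
FOR UPDATE SKIP LOCKED;
-- ... process it, then mark it done and commit
UPDATE orders SET processed = true WHERE orderid = :claimed_id;
COMMIT;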



I am still on the fence about the LSN issue. I don't mind it from a code 
perspective - it's already written anyway - but I am not sure if we really 
want it in the SLRU, as Simon says.


Mainly because of three things:
One, this patch is not really a feature patch - you can do most of what 
it does via tables already - but rather a performance improvement, so we 
should try to make it perform as well as possible; adding more things 
does not really help performance (according to my benchmarks the 
difference with/without LSN is under 1%, so it's not terrible, but it's 
there), not to mention the additional disk space.


Two, the LSN use cases still seem only theoretical to me, while the 
timestamp use case has been a production problem for at least a decade.


Three, even if we add LSN, I am still not convinced that the use cases 
presented here wouldn't be better served by putting that info into an 
actual table instead of the SLRU - people want to use it as a filter in 
a WHERE clause, somebody mentioned exporting to a different db, etc.


Maybe we need a better explanation of the LSN use case(s) to understand 
why it should be stored here and why the other solutions are 
significantly worse.


--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-11-18 Thread Amit Kapila
On Tue, Nov 18, 2014 at 7:49 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Amit Kapila wrote:
  On Sun, Nov 16, 2014 at 6:15 AM, Alvaro Herrera 
alvhe...@2ndquadrant.com
  wrote:


  One use case mentioned upthread is having the clone be created in the
  same machine as the source server.  Does your proposal help with it?

  That use case got addressed with the -T option, with which the user can
  relocate the tablespace directory (Commit: fb05f3c; pg_basebackup: Add
  support for relocating tablespaces)

 Okay.  As far as I know, -T only works for plain mode, right?

Yes.

 I wonder
 if we should make -T modify the tablespace catalog, so that the
 resulting file in tar output outputs names mangled per the map; that
 would make -T work in tar mode too.  Does that make sense?

tablespace catalog (I assume it is the new file you are talking about) is
formed on the server, whereas the handling for -T is entirely in
pg_basebackup. We might be able to make it work, but I am not sure it is
worth it, because the main use case for -T is the plain format. Even if
there is some use case for -T with the tar format, I think it is a
separate project.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] PostgreSQL doesn't stop propley when --slot option is specified with pg_receivexlog.

2014-11-18 Thread Fujii Masao
On Mon, Nov 17, 2014 at 12:30 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Nov 17, 2014 at 10:02 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Nov 15, 2014 at 9:10 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 Yep, sounds a good thing to do if master requested answer from the
 client in the keepalive message. Something like the patch attached
 would make the deal.

 Isn't it better to do this only when replication slot is used?
 Makes sense. What about a check using reportFlushPosition then?

Sounds reasonable. Thanks for updating the patch!
But the patch could no longer be applied cleanly to master
because c4f99d2 heavily changed the code that the patch also touches...
I rewrote the patch and pushed it to both master and REL9_4_STABLE.
Anyway, thanks!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-11-18 Thread Fujii Masao
On Wed, Nov 19, 2014 at 10:20 AM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Tue, Nov 18, 2014 at 2:36 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Thu, Nov 13, 2014 at 12:38 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
  Thanks for reviewing the patch!
 
  On Thu, Nov 13, 2014 at 4:05 AM, Alvaro Herrera
  alvhe...@2ndquadrant.com wrote:
  Fujii Masao wrote:
 
  --- 127,152 ----
     When this option is used, <application>pg_receivexlog</> will report
     a flush position to the server, indicating when each segment has been
     synchronized to disk so that the server can remove that segment if it
  !  is not otherwise needed. <literal>--synchronous</literal> option must
  !  be specified when making <application>pg_receivexlog</> run as
  !  synchronous standby by using replication slot. Otherwise WAL data
  !  cannot be flushed frequently enough for this to work correctly.
    </para>
   </listitem>
  </varlistentry>
 
  Whitespace damage here.
 
  Fixed.
 
  + printf(_("  --synchronous          flush transaction log in real time\n"));
 
  "in real time" sounds odd.  How about "flush transaction log
  immediately after writing", or maybe "have transaction log writes be
  synchronous".
 
  The former sounds better to me. So I chose it.
 
  --- 781,791 ----
        now = feGetCurrentTimestamp();

        /*
  !      * Issue sync command as soon as there are WAL data which
  !      * has not been flushed yet if synchronous option is true.
         */
        if (lastFlushPosition < blockpos &&
  !         walfile != -1 && synchronous)
 
  I'd put the synchronous condition first in the if(), and start the
  comment with it rather than putting it at the end.  Both seem weird.
 
  Fixed, i.e., moved the synchronous condition first in the if()'s test
  and also moved the comment If synchronous option is true also first
  in the comment.
 
                        progname, current_walfile_name, strerror(errno));
                goto error;
            }
            lastFlushPosition = blockpos;
  !
  !         /*
  !          * Send feedback so that the server sees the latest WAL locations
  !          * immediately if synchronous option is true.
  !          */
  !         if (!sendFeedback(conn, blockpos, now, false))
  !             goto error;
  !         last_status = now;
 
  I'm not clear about this comment .. why does it say if synchronous
  option is true when it's not checking the condition?
 
  I added that comment because the code exists with the if() block
  checking synchronous condition. But it seems confusing. Just removed
  that part from the comment.
 
  Attached is the updated version of the patch.

 I've just pushed this.

 Marked this patch as committed on the commit fest app.

Thanks!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql \watch always ignores \pset null

2014-11-18 Thread Fujii Masao
Hi,

/*
 * Set up rendering options, in particular, disable the pager, because
 * nobody wants to be prompted while watching the output of 'watch'.
 */
myopt.nullPrint = NULL;
myopt.topt.pager = 0;

I found that psql's \watch command always ignores the \pset null setting.
The above source code comment explains why it does, but I'd like to see
the specified string for a null value even in \watch's output, in order
to distinguish null from an empty value. Thoughts?
Is there any reason why \watch must ignore the \pset null setting?
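
A minimal reproduction, for anyone who wants to see it (psql 9.3 or
later, where \watch was added):

\pset null '(null)'
SELECT NULL AS col;           -- prints (null), as configured
SELECT NULL AS col \watch 1   -- prints an empty cell, ignoring \pset null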

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Multiple call of GetTransactionSnapshot in single flow

2014-11-18 Thread Rajeev rastogi
I have observed that in some places, like exec_bind_message and
exec_simple_query, even though these two functions have already taken a
snapshot, the current snapshot gets popped off before they call
PortalStart, and they then pass InvalidSnapshot as a parameter, because
of which PortalStart takes a snapshot again.

In cases where many transactions are running, taking a snapshot multiple
times may be very costly.

What is the reason for taking a snapshot multiple times?

1.   Is this implementation meant to make sure the snapshot is at a more
granular level?

2.   Is it something to do with the current command id of the snapshot?

3.   Or is there any other specific reason for this which I am not able
to visualize?

4.   Or am I missing something else?

If it is just reason 1, then maybe we can try to pass the same snapshot
to PortalStart as taken in the caller; that could enhance performance in
many cases. With this change, I did one small performance test on pgbench
with prepared queries for pgbench select with 16 users and observed a
performance benefit of 10%.

Please provide your opinion.

Thanks and Regards,
Kumar Rajeev Rastogi



Re: [HACKERS] inherit support for foreign tables

2014-11-18 Thread Etsuro Fujita

(2014/11/18 18:09), Ashutosh Bapat wrote:

I looked at the patch. It looks good now. Once we have the inh patch
ready, we can mark this item as ready for commiter.


Thanks for the review!

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Etsuro Fujita

(2014/11/18 18:27), Ashutosh Bapat wrote:

On Tue, Nov 18, 2014 at 1:50 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
(2014/11/17 19:36), Ashutosh Bapat wrote:



Here are my comments about the patch fscan_reltargetlist.patch



2. Instead of using rel->reltargetlist, we should use the tlist passed by
the caller. This is the tlist expected from the Plan node. For foreign
scans it will be the same as rel->reltargetlist. Using tlist would make
the function consistent with the create_*scan_plan functions.



I disagree with that for the reasons mentioned below:

* For a foreign scan, tlist is *not* necessarily the same as
rel->reltargetlist (ie, there is a case where tlist contains all
user attributes while rel->reltargetlist contains only attributes
actually needed by the query).  In such a case it'd be inefficient
to use tlist rather than rel->reltargetlist.



create_foreignscan_plan() is called from create_scan_plan(), which
passes the tlist. The latter uses the function use_physical_tlist() to
decide which targetlist should be used (physical or actual), as per the
code below in this function:

  485         /*
  486          * We can do this for real relation scans, subquery scans, function scans,
  487          * values scans, and CTE scans (but not for, eg, joins).
  488          */
  489         if (rel->rtekind != RTE_RELATION &&
  490             rel->rtekind != RTE_SUBQUERY &&
  491             rel->rtekind != RTE_FUNCTION &&
  492             rel->rtekind != RTE_VALUES &&
  493             rel->rtekind != RTE_CTE)
  494             return false;
  495 
  496         /*
  497          * Can't do it with inheritance cases either (mainly because Append
  498          * doesn't project).
  499          */
  500         if (rel->reloptkind != RELOPT_BASEREL)
  501             return false;

For foreign tables, as well as tables under an inheritance hierarchy, it
uses the actual targetlist, which contains only the required columns, IOW
rel->reltargetlist (see build_path_tlist()) with nested-loop parameters
substituted. So it never includes unnecessary columns in the
targetlist.


Maybe I'm missing something, but I think you are overlooking the case 
for foreign tables that are *not* under an inheritance hierarchy.  (Note 
that the rtekind for foreign tables is RTE_RELATION.)
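
To make the distinction concrete: the narrowing done via
rel->reltargetlist is visible from SQL. A rough sketch against a
postgres_fdw foreign table like the regression tests' ft1 (the plan text
is abridged and may differ in detail):

EXPLAIN (VERBOSE, COSTS false) SELECT c1 FROM ft1;
--  Foreign Scan on public.ft1
--    Output: c1
--    Remote SQL: SELECT c1 FROM ...   (only the referenced column is fetched)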


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multiple call of GetTransactionSnapshot in single flow

2014-11-18 Thread Tom Lane
Rajeev rastogi rajeev.rast...@huawei.com writes:
 What is the reason for taking the snapshot multiple times:

To get the right answer.  I believe Robert Haas has been down this rabbit
hole most recently before you: see commits d573e239f and 532994299.
The execution snapshot has to be (at minimum) later than acquisition of
all table locks for the query, and that means it'd better be different
from the snapshot used for parse/plan activities.
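
For illustration, here is a minimal, self-contained toy model of that
ordering constraint; the helper names are invented stand-ins, not the
actual PostgreSQL call sites:

    #include <stdio.h>

    static int snapshot_counter = 0;

    /* Newer snapshots get higher ids in this toy model. */
    static int take_snapshot(void)
    {
        return ++snapshot_counter;
    }

    /* Concurrent DDL on the query's tables is blocked from here on. */
    static void lock_tables(void)
    {
        puts("table locks acquired");
    }

    int main(void)
    {
        int plan_snap = take_snapshot();  /* parse/plan activities */
        lock_tables();                    /* schema may have changed above */
        int exec_snap = take_snapshot();  /* must postdate lock acquisition */
        printf("plan snapshot %d, exec snapshot %d\n", plan_snap, exec_snap);
        return 0;
    }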

The right solution here is not so much to reduce the number of snapshots
as to make them cheaper.  Heikki was working on something for that,
but I'm not sure where he is with that patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-11-18 Thread Michael Paquier
On Thu, Nov 13, 2014 at 10:25 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Andres Freund wrote:
 On 2014-11-12 18:23:38 -0500, Robert Haas wrote:

   The problem is that it's very hard to avoid the wrong index's
   relfilenode being used when swapping the relfilenodes between two
   indexes.
 
  How about storing both the old and new relfilenodes in the same pg_class 
  entry?

 That's quite a cool idea

 [think a bit]

 But I think it won't work realistically. We have a *lot* of
 infrastructure that refers to indexes using its primary key.

 Hmm, can we make the relmapper do this job instead of having another
 pg_class column?  Essentially the same sketch Robert proposed, except that
 we would initially set relfilenode=0 and have all onlookers use the
 relmapper to obtain the correct relfilenode; switching to the new
 relfilenode can be done atomically, and we would un-relmap the index once
 the process is complete.
 The difference from what Robert proposes is that the transient state is
 known to cause failures for anyone not prepared to deal with it, so it
 should be easy to spot what places need adjustment.

How would the failure handling actually work? Would we need some extra
process to remove the extra relfilenodes? Note that in the current
patch the temporary concurrent entry is kept as INVALID all the time,
giving the user a path to remove it with DROP INDEX even in the case
of invalid toast indexes in the catalog pg_toast.

Note that I am on the side of using the exclusive lock when swapping
relfilenodes for now in any case, that's what pg_repack does by
renaming the indexes, and people use it.
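
For illustration, a minimal, self-contained toy model of the
relmapper-based swap sketched above; all names here are invented
stand-ins, not the actual relmapper API:

    #include <stdatomic.h>
    #include <stdio.h>

    /* While pg_class.relfilenode is 0, readers resolve the file through
     * a map entry that can be switched atomically from old to new. */
    static _Atomic unsigned relmap_entry = 1001;   /* old index's file */

    static unsigned resolve_relfilenode(unsigned pg_class_relfilenode)
    {
        if (pg_class_relfilenode != 0)
            return pg_class_relfilenode;       /* normal, unmapped case */
        return atomic_load(&relmap_entry);     /* transient, mapped case */
    }

    int main(void)
    {
        printf("before swap: %u\n", resolve_relfilenode(0));
        atomic_store(&relmap_entry, 2002);     /* the atomic switch */
        printf("after swap:  %u\n", resolve_relfilenode(0));
        return 0;
    }
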
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \watch always ignores \pset null

2014-11-18 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 Hi,
 /*
  * Set up rendering options, in particular, disable the pager, because
  * nobody wants to be prompted while watching the output of 'watch'.
  */
 myopt.nullPrint = NULL;
 myopt.topt.pager = 0;

 I found psql's \watch command always ignores the \pset null setting.
 The above source code comment explains why it does,
 but I'd like to see the specified string for null values even in \watch's
 output, in order to distinguish null from an empty value. Thoughts?
 Is there any reason why \watch must ignore the \pset null setting?

Hmmm ... the comment offers a reasonable argument for forcing pager = 0,
but I agree the nullPrint change is not adequately explained.
Will, do you remember why you did that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Etsuro Fujita

(2014/11/18 18:37), Ashutosh Bapat wrote:

On Tue, Nov 18, 2014 at 1:55 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
(2014/11/17 19:54), Ashutosh Bapat wrote:



Here are comments for postgres_fdw-syscol patch.



Code
---
1. Instead of a single-line comment "System columns can't be sent to
remote.", it might be better to explain why system columns can't be sent
to the remote.



Done.



I would add "and foreign values do not make sense locally (except maybe
ctid clubbed with the foreign server_id)" to make it clearer. But I will
leave that for the committer to decide.


OK.


3. Since there is already a testcase which triggered this particular
change, it would be good if we add that to the regression tests in
postgres_fdw.



Done.



I think a better case would be SELECT * FROM ft1 t1, pg_class t2 WHERE
t1.tableoid = t2.oid. The condition makes sure that the tableoid in the
row is the same as the OID of the foreign table recorded in pg_class
locally. And the EXPLAIN of the query clearly shows that the
tableoid column is not fetched from the foreign server.


I thought about that test, but I didn't use it because I think we can't 
see (at least from the EXPLAIN) why the qual is not pushed down: the qual 
isn't pushed down possibly because the qual is considered a *join* 
qual, not because the qual just contains tableoid.  (Having said that, I 
think we can see if the qual isn't pushed down as a join qual for a 
parameterized plan, but ISTM it's worth complicating regression tests.)



Once we resolve the other patch on this thread, I think this item can
be marked as ready for committer from my side.


OK.  Thank you for the review.

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Ashutosh Bapat
On Wed, Nov 19, 2014 at 11:18 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
 wrote:

 (2014/11/18 18:27), Ashutosh Bapat wrote:

 On Tue, Nov 18, 2014 at 1:50 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 (2014/11/17 19:36), Ashutosh Bapat wrote:


  Here are my comments about the patch fscan_reltargetlist.patch


  2. Instead of using rel->reltargetlist, we should use the tlist passed
 by the caller. This is the tlist expected from the Plan node. For foreign
 scans it will be the same as rel->reltargetlist. Using tlist would make
 the function consistent with create_*scan_plan functions.


  I disagree with that for the reasons mentioned below:

  * For a foreign scan, tlist is *not* necessarily the same as
  rel->reltargetlist (ie, there is a case where tlist contains all
  user attributes while rel->reltargetlist contains only attributes
  actually needed by the query).  In such a case it'd be inefficient
  to use tlist rather than rel->reltargetlist.


  create_foreignscan_plan() is called from create_scan_plan(), which
 passes the tlist. The latter uses the function use_physical_tlist() to
 decide which targetlist should be used (physical or actual). As per the
 code below in this function

   /*
    * We can do this for real relation scans, subquery scans, function
    * scans, values scans, and CTE scans (but not for, eg, joins).
    */
   if (rel->rtekind != RTE_RELATION &&
       rel->rtekind != RTE_SUBQUERY &&
       rel->rtekind != RTE_FUNCTION &&
       rel->rtekind != RTE_VALUES &&
       rel->rtekind != RTE_CTE)
       return false;

   /*
    * Can't do it with inheritance cases either (mainly because Append
    * doesn't project).
    */
   if (rel->reloptkind != RELOPT_BASEREL)
       return false;

For foreign tables, as well as tables under an inheritance hierarchy, it
uses the actual targetlist, which contains only the required columns,
IOW rel->reltargetlist (see build_path_tlist()) with nested-loop
parameters substituted. So, it never includes unnecessary columns in the
targetlist.


 Maybe I'm missing something, but I think you are overlooking the case for
 foreign tables that are *not* under an inheritance hierarchy.  (Note that
 the rtekind for foreign tables is RTE_RELATION.)



Ah! You are right. I confused rtekind with relkind. Sorry for that.
Maybe we should modify use_physical_tlist() to return false in the case of
RELKIND_FOREIGN_TABLE, so that we can use tlist in
create_foreignscan_plan(). I do not see any create_*_plan() function using
reltargetlist directly.
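
For illustration, a minimal, self-contained toy model of that
suggestion; the types and enum values below are simplified stand-ins,
not the real planner structures:

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { RTE_RELATION, RTE_SUBQUERY } RTEKind;
    typedef enum { RELKIND_ORDINARY, RELKIND_FOREIGN_TABLE } RelKind;

    typedef struct { RTEKind rtekind; RelKind relkind; } Rel;

    static bool use_physical_tlist(const Rel *rel)
    {
        if (rel->rtekind != RTE_RELATION)
            return false;
        /* Proposed addition: never use the physical tlist for foreign
         * tables, so create_foreignscan_plan() can rely on the passed
         * tlist containing only the columns the query needs. */
        if (rel->relkind == RELKIND_FOREIGN_TABLE)
            return false;
        return true;
    }

    int main(void)
    {
        Rel ft = { RTE_RELATION, RELKIND_FOREIGN_TABLE };
        printf("foreign table uses physical tlist? %s\n",
               use_physical_tlist(&ft) ? "yes" : "no");
        return 0;
    }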


 Thanks,

 Best regards,
 Etsuro Fujita




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] psql \watch always ignores \pset null

2014-11-18 Thread Will Leinweber
On Tue, Nov 18, 2014 at 9:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Fujii Masao masao.fu...@gmail.com writes:
  Hi,
  /*
   * Set up rendering options, in particular, disable the pager, because
   * nobody wants to be prompted while watching the output of 'watch'.
   */
  myopt.nullPrint = NULL;
  myopt.topt.pager = 0;

  I found psql's \watch command always ignores the \pset null setting.
  The above source code comment explains why it does,
  but I'd like to see the specified string for null values even in \watch's
  output, in order to distinguish null from an empty value. Thoughts?
  Is there any reason why \watch must ignore the \pset null setting?

 Hmmm ... the comment offers a reasonable argument for forcing pager = 0,
 but I agree the nullPrint change is not adequately explained.
 Will, do you remember why you did that?

 regards, tom lane


I tracked down the individual commit[1] from my history where I added
that. What I added there is very similar to sections in
src/bin/psql/describe.c. I can't remember specifically my reasoning
then, but it's likely I copied the patterns there while getting things
working.

I do still think it's important to remove the pager, but the nullPrint
is probably a mistake.

[1]: 
https://github.com/will/postgres/commit/c42d29fece16ec9cb13c159b3307ab9fca892eb2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \watch always ignores \pset null

2014-11-18 Thread Tom Lane
Will Leinweber w...@heroku.com writes:
 On Tue, Nov 18, 2014 at 9:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 Is there any reason why \watch must ignore the \pset null setting?

 Hmmm ... the comment offers a reasonable argument for forcing pager = 0,
 but I agree the nullPrint change is not adequately explained.
 Will, do you remember why you did that?

 I tracked down the individual commit[1] from my history where I added
 that. What I added there is very similar to sections in
 src/bin/psql/describe.c. I can't remember specifically my reasoning
 then, but it's likely I copied the patterns there while getting things
 working.
 I do still think it's important to remove the pager, but the nullPrint
 is probably a mistake.

I took a quick look and noted that the other places where nullPrint is
summarily forced to null are for \d and similar queries.  For those,
the code can reasonably have an opinion about what the presentation should
be like, since it knows what SQL query it's issuing.  That argument surely
doesn't apply to \watch, so I'm in agreement with Fujii that it'd be
better to respect the user's \pset setting.
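
For what it's worth, a minimal toy model of what that could look like;
the struct and fields below mimic psql's print options but are invented
stand-ins, not the actual psql code:

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { const char *nullPrint; bool pager; } PrintOpts;

    static PrintOpts user_popt = { "(null)", true };   /* from \pset */

    /* \watch copies the user's options and overrides only the pager,
     * leaving nullPrint alone instead of forcing it to NULL. */
    static PrintOpts watch_opts(void)
    {
        PrintOpts myopt = user_popt;
        myopt.pager = false;    /* nobody wants a pager under \watch */
        return myopt;
    }

    int main(void)
    {
        PrintOpts o = watch_opts();
        printf("nullPrint=%s pager=%d\n", o.nullPrint, (int) o.pager);
        return 0;
    }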

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Etsuro Fujita

(2014/11/19 14:58), Ashutosh Bapat wrote:

On Wed, Nov 19, 2014 at 11:18 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
(2014/11/18 18:27), Ashutosh Bapat wrote:
On Tue, Nov 18, 2014 at 1:50 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp
 (2014/11/17 19:36), Ashutosh Bapat wrote:



 Here are my comments about the patch
fscan_reltargetlist.patch



 2. Instead of using rel->reltargetlist, we should use the tlist
passed by the caller. This is the tlist expected from the Plan node.
For foreign scans it will be the same as rel->reltargetlist. Using
tlist would make the function consistent with create_*scan_plan
functions.



 I disagree with that for the reasons mentioned below:

 * For a foreign scan, tlist is *not* necessarily the same as
 rel->reltargetlist (ie, there is a case where tlist contains all
 user attributes while rel->reltargetlist contains only attributes
 actually needed by the query).  In such a case it'd be inefficient
 to use tlist rather than rel->reltargetlist.



create_foreignscan_plan() is called from create_scan_plan(), which
passes the tlist. The latter uses the function use_physical_tlist()
to decide which targetlist should be used (physical or actual). As per
the code below in this function

   /*
    * We can do this for real relation scans, subquery scans, function
    * scans, values scans, and CTE scans (but not for, eg, joins).
    */
   if (rel->rtekind != RTE_RELATION &&
       rel->rtekind != RTE_SUBQUERY &&
       rel->rtekind != RTE_FUNCTION &&
       rel->rtekind != RTE_VALUES &&
       rel->rtekind != RTE_CTE)
       return false;

   /*
    * Can't do it with inheritance cases either (mainly because Append
    * doesn't project).
    */
   if (rel->reloptkind != RELOPT_BASEREL)
       return false;

For foreign tables, as well as tables under an inheritance hierarchy, it
uses the actual targetlist, which contains only the required columns,
IOW rel->reltargetlist (see build_path_tlist()) with nested-loop
parameters substituted. So, it never includes unnecessary columns in the
targetlist.



Maybe I'm missing something, but I think you are overlooking the
case for foreign tables that are *not* under an inheritance
hierarchy.  (Note that the rtekind for foreign tables is RTE_RELATION.)



Ah! You are right. I confused rtekind with relkind. Sorry for
that. Maybe we should modify use_physical_tlist() to return false in
the case of RELKIND_FOREIGN_TABLE, so that we can use tlist in
create_foreignscan_plan(). I do not see any create_*_plan() function
using reltargetlist directly.


Yeah, I think we can do that, but I'm not sure that we should use tlist, 
rather than rel->reltargetlist, in create_foreignscan_plan().  How about 
leaving this for the committers to decide?


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Ashutosh Bapat
On Wed, Nov 19, 2014 at 12:14 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
 wrote:

 (2014/11/19 14:58), Ashutosh Bapat wrote:

 On Wed, Nov 19, 2014 at 11:18 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 (2014/11/18 18:27), Ashutosh Bapat wrote:
 On Tue, Nov 18, 2014 at 1:50 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp
  (2014/11/17 19:36), Ashutosh Bapat wrote:


   Here are my comments about the patch
 fscan_reltargetlist.patch


   2. Instead of using rel->reltargetlist, we should use the tlist
  passed by the caller. This is the tlist expected from the Plan node.
  For foreign scans it will be the same as rel->reltargetlist. Using
  tlist would make the function consistent with create_*scan_plan
  functions.


   I disagree with that for the reasons mentioned below:

  * For a foreign scan, tlist is *not* necessarily the same as
  rel->reltargetlist (ie, there is a case where tlist contains all
  user attributes while rel->reltargetlist contains only attributes
  actually needed by the query).  In such a case it'd be inefficient
  to use tlist rather than rel->reltargetlist.


  create_foreignscan_plan() is called from create_scan_plan(), which
 passes the tlist. The latter uses the function use_physical_tlist()
 to decide which targetlist should be used (physical or actual). As per
 the code below in this function

 /*
  * We can do this for real relation scans, subquery scans, function
  * scans, values scans, and CTE scans (but not for, eg, joins).
  */
 if (rel->rtekind != RTE_RELATION &&
     rel->rtekind != RTE_SUBQUERY &&
     rel->rtekind != RTE_FUNCTION &&
     rel->rtekind != RTE_VALUES &&
     rel->rtekind != RTE_CTE)
     return false;

 /*
  * Can't do it with inheritance cases either (mainly because Append
  * doesn't project).
  */
 if (rel->reloptkind != RELOPT_BASEREL)
     return false;

 For foreign tables, as well as tables under an inheritance hierarchy, it
 uses the actual targetlist, which contains only the required columns,
 IOW rel->reltargetlist (see build_path_tlist()) with nested-loop
 parameters substituted. So, it never includes unnecessary columns in the
 targetlist.


  Maybe I'm missing something, but I think you are overlooking the
 case for foreign tables that are *not* under an inheritance
 hierarchy.  (Note that the rtekind for foreign tables is
 RTE_RELATION.)


  Ah! You are right. I confused rtekind with relkind. Sorry for
 that. Maybe we should modify use_physical_tlist() to return false in
 the case of RELKIND_FOREIGN_TABLE, so that we can use tlist in
 create_foreignscan_plan(). I do not see any create_*_plan() function
 using reltargetlist directly.


 Yeah, I think we can do that, but I'm not sure that we should use tlist,
 rather than rel->reltargetlist, in create_foreignscan_plan().  How about
 leaving this for the committers to decide?


I am fine with that. Maybe you want to add an XXX comment there to bring
it to the committer's notice.



 Thanks,

 Best regards,
 Etsuro Fujita




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Etsuro Fujita

(2014/11/19 14:55), Etsuro Fujita wrote:

(2014/11/18 18:37), Ashutosh Bapat wrote:

On Tue, Nov 18, 2014 at 1:55 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
(2014/11/17 19:54), Ashutosh Bapat wrote:



Here are comments for postgres_fdw-syscol patch.



3. Since there is already a testcase which triggered this particular
change, it would be good if we add that to the regression tests in
postgres_fdw.



Done.



I think a better case would be SELECT * FROM ft1 t1, pg_class t2 WHERE
t1.tableoid = t2.oid. The condition makes sure that the tableoid in the
row is the same as the OID of the foreign table recorded in pg_class
locally. And the EXPLAIN of the query clearly shows that the
tableoid column is not fetched from the foreign server.


I thought about that test, but I didn't use it because I think we can't see
(at least from the EXPLAIN) why the qual is not pushed down: the qual
isn't pushed down possibly because the qual is considered a *join*
qual, not because the qual just contains tableoid.  (Having said that, I
think we can see if the qual isn't pushed down as a join qual for a
parameterized plan, but ISTM it's worth complicating regression tests.)


Sorry, I incorrectly wrote the last sentence.  What I tried to write is, 
ISTM it's *not* worth complicating regression tests.


Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw behaves oddly

2014-11-18 Thread Etsuro Fujita

(2014/11/19 15:56), Ashutosh Bapat wrote:

On Wed, Nov 19, 2014 at 12:14 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
(2014/11/19 14:58), Ashutosh Bapat wrote:



Maybe we should modify use_physical_tlist() to return false in
the case of RELKIND_FOREIGN_TABLE, so that we can use tlist in
create_foreignscan_plan(). I do not see any create_*_plan() function
using reltargetlist directly.



Yeah, I think we can do that, but I'm not sure that we should use
tlist, rather than rel->reltargetlist, in create_foreignscan_plan().  How
about leaving this for the committers to decide?



I am fine with that. Maybe you want to add an XXX comment there to
bring it to the committer's notice.


It's OK, but I'm not convinced of your idea, so I think the comment 
would be better described by you than by me.  My proposal is for 
you to add the comment to the CF app.  Could you do that?


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers