Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-21 Thread Craig Ringer
On 21 November 2016 at 00:08, Mithun Cy  wrote:

>> Please avoid adding another round trip by using a GUC_REPORTed variable
>> (ParameterStatus entry).  If you want to support this libpq failover with
>> pre-10 servers, you can switch the method of determining the primary based
>> on the server version.  But I don't think it's worth supporting older
>> servers at the price of libpq code complexity.
>
> Currently there is no consensus around this. For now, making this patch
> address failover to the next primary, similar to JDBC, seems sufficient to me.
> In the next version of the patch I think we can try to extend it as you have proposed.

FWIW, I agree. Working first, then improved.
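For reference, the GUC_REPORT idea quoted above would look roughly like this on
the client side. This is a minimal sketch assuming the server reported such a
setting; "transaction_read_only" is not actually a reported GUC today, so the
parameter name below is purely illustrative:

#include <stdbool.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Illustrative only: if the server reported a suitable GUC via
 * ParameterStatus, the client could read it with PQparameterStatus()
 * instead of issuing "SHOW transaction_read_only" in an extra round trip.
 */
static bool
server_reports_writable(PGconn *conn)
{
    const char *val = PQparameterStatus(conn, "transaction_read_only");

    return val != NULL && strcmp(val, "off") == 0;
}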

>> I haven't tracked the progress of logical replication, but will
>> target_server_type be likely to be usable with it?  How will
>> target_server_type fit logical replication?
>
> I tried to check the logical replication WIP patch, but I am not very sure how
> to accommodate the same.

Logical replication downstreams won't force transactions to be read-only.
A significant part of the appeal of logical replication is that you
can use temp tables and unlogged tables on the downstream, and even
use persistent tables so long as they don't clash with the upstream.
More sophisticated models even allow the downstream to safely write to
replicated tables by doing conflict resolution.

So as it stands, this patch would consider any logical replication
downstream a 'master'.

That's fine for now IMO. Determining whether a server is a logical
replica isn't that simple, nor is it a clear-cut yes/no answer, and I
think it's out of the scope of this patch to deal with it. Figure it
out once logical replication lands.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-21 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> I am very strict about regressing the performance of things that we already
> have, but I try not to make a policy that a new feature must be as fast
> as it could ever be.  That could result in us having very few new features.

I see.  I like your attitude toward new features.  But I don't think now is the 
time to compromise this feature and rush it to commit.


> Also, I am not saying that we should not change this in time for v10.
> I'm saying that I don't think it should be a requirement for the next patch
> to be committed in this area to introduce a whole new mechanism for
> determining whether something is a master or a standby.  Love it or hate
> it, pgsql-jdbc has already implemented something in this area and it does
> something useful -- without requiring a wire protocol change.  Now you and
> Kevin are trying to say that what they did is all wrong, but I don't agree.
> There are very many users for whom the pgsql-jdbc approach will do exactly
> what they need, and no doubt some for whom it won't.  Getting a patch that
> mimics that approach committed is *progress*.  Improving it afterwards,
> whether for v10 or some later release, is also good.

transaction_read_only=on does not mean the server is a standby.  As the manual article on 
hot standby says, they are different.

https://www.postgresql.org/docs/devel/static/hot-standby.html

[Excerpt]
--
In normal operation, “read-only” transactions are allowed to update sequences 
and to use LISTEN, UNLISTEN, and NOTIFY, so Hot Standby sessions operate under 
slightly tighter restrictions than ordinary read-only sessions. It is possible 
that some of these restrictions might be loosened in a future release.
...
Users will be able to tell whether their session is read-only by issuing SHOW 
transaction_read_only. In addition, a set of functions (Table 9.79, “Recovery 
Information Functions”) allow users to access information about the standby 
server. These allow you to write programs that are aware of the current state 
of the database. These can be used to monitor the progress of recovery, or to 
allow you to write complex programs that restore the database to particular 
states.
--


I'm afraid that if the current patch is committed, you will lose interest in 
the ideal solution.  Then if the current patch is out as v10, there would be a 
concern about incompatibility when we pursue the ideal solution in a later 
release.  That is, "should we continue to report that this server is a standby 
even if it's actually a primary with transaction_read_only turned on, to maintain 
compatibility with the older release?"

If you want to connect to a server where the transaction is read-only, then 
shouldn't the connection parameter be something like 
"target_session_attrs=readonly"?  That represents exactly what the code does.


> There is a saying that one should not let the perfect be the enemy of the
> good.  I believe that saying applies here.

True, so I suggested not including the support for older servers for a while.  
Shall we find the real enemy -- what's preventing the ideal solution?  I know 
my knowledge is still far less than yours, so I may be missing something 
difficult.  So, I'd like Mithun to share the difficulty.

Regards
Takayuki Tsunakawa





Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-11-21 Thread Craig Ringer
On 22 November 2016 at 15:14, Haribabu Kommi  wrote:
>
> On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer  wrote:
>>
>> The latest is what's attached upthread and what's in the git repo also
>> referenced upthread.
>>
>> I haven't been able to update in response to more recent review due to
>> other development commitments. At this point I doubt I'll be able to
>> get it updated again in time for v10, so anyone who wants to adopt it
>> is welcome.
>
>
> Currently the patch status is marked as "returned with feedback" in the 11-2016
> commitfest. Anyone who wants to work on it can submit an updated patch
> that takes care of all review comments and change the status of the patch
> at any time.
>
> Thanks for the patch.

Thanks. Sorry I haven't had time to work on it. Priorities.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal : For Auto-Prewarm.

2016-11-21 Thread Haribabu Kommi
Hi Ashutosh,


This is a gentle reminder.

You are assigned as a reviewer for this patch in the 11-2016 commitfest,
but you haven't shared your review yet. Please share your views about
the patch. This will help us run the commitfest more smoothly.

Please ignore this if you have already shared your review.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2016-11-21 Thread Haribabu Kommi
On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer  wrote:

> The latest is what's attached upthread and what's in the git repo also
> referenced upthread.
>
> I haven't been able to update in response to more recent review due to
> other development commitments. At this point I doubt I'll be able to
> get it updated again in time for v10, so anyone who wants to adopt it
> is welcome.
>

Currently the patch status is marked as "returned with feedback" in the 11-2016
commitfest. Anyone who wants to work on it can submit an updated patch
that takes care of all review comments and change the status of the patch
at any time.

Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] PoC: Partial sort

2016-11-21 Thread Haribabu Kommi
Hi Peter,


This is a gentle reminder.

You are assigned as a reviewer for this patch in the 11-2016 commitfest,
but you haven't shared your review of the latest patch posted by the author
in this commitfest yet. If you don't have any comments on the patch,
please move the patch into the "ready for committer" state to get a committer's
attention. This will help us run the commitfest more smoothly.

Please ignore this if you have already shared your review.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-21 Thread Etsuro Fujita

On 2016/11/21 22:02, Ashutosh Bapat wrote:

You wrote:

Instead we should calculate tlist, the
first time we need it and then add it to the fpinfo.


I wrote:

Having said that, I agree on that point.  I'd like to propose (1) adding a
new member to fpinfo, to store a list of output Vars from the subquery, and
(2) creating a tlist from it in deparseRangeTblRef, then, which would allow
us to calculate the tlist only when we need it.  The member added to fpinfo
would be also useful to address the comments on the DML/UPDATE pushdown
patch.  See the attached patch in [1].  I named the member a bit differently
in that patch, though.



Again, the list of Vars will be wasted if we don't choose that path for
final planning. So, I don't see the point of adding the list of Vars
there.



If you think that the member is useful for DML/UPDATE pushdown,
you may want to add it in the other patch.


OK, I'd like to propose referencing foreignrel->reltarget->exprs 
directly in deparseRangeTblRef and get_subselect_alias_id, which 
is the same as what I proposed in the first version of the patch, but 
I'd also like to change deparseRangeTblRef to use add_to_flat_tlist, not 
make_tlist_from_pathtarget, to create a tlist of the subquery, as you 
proposed.



You modified the comments I added to deparseLockingClause into this:

/*
+* Ignore relation if it appears in a lower subquery. Locking clause
+* for such a relation is included in the subquery.
+*/

I don't think the second sentence is 100% correct because a locking clause
isn't always required for such a relation, so I modified the sentence a bit.



I guess, "if required" part was implicit in construct "such a
relation". Your version seems to make it explicit. I am fine with it.


OK, let's leave that for the committer's judgment.

Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 109,114  typedef struct deparse_expr_cxt
--- 109,116 
  /* Handy macro to add relation name qualification */
  #define ADD_REL_QUALIFIER(buf, varno)	\
  		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+ #define SS_TAB_ALIAS_PREFIX	"s"
+ #define SS_COL_ALIAS_PREFIX	"c"
  
  /*
   * Functions to determine whether an expression can be evaluated safely on
***
*** 167,172  static void appendConditions(List *exprs, deparse_expr_cxt *context);
--- 169,182 
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
  	RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
+ static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
+    RelOptInfo *foreignrel, bool make_subquery,
+    List **params_list);
+ static void appendSubselectAlias(StringInfo buf, int tabno, int ncols);
+ static void get_subselect_alias_id(Var *node, RelOptInfo *foreignrel,
+ 	   int *tabno, int *colno);
+ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *tabno,
+ 		int *colno);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
***
*** 990,1001  deparseSelectSql(List *tlist, List **retrieved_attrs, deparse_expr_cxt *context)
  	 */
  	appendStringInfoString(buf, "SELECT ");
  
  	if (foreignrel->reloptkind == RELOPT_JOINREL ||
! 		foreignrel->reloptkind == RELOPT_UPPER_REL)
! 	{
! 		/* For a join relation use the input tlist */
  		deparseExplicitTargetList(tlist, retrieved_attrs, context);
- 	}
  	else
  	{
  		/*
--- 1000,1015 
  	 */
  	appendStringInfoString(buf, "SELECT ");
  
+ 	/*
+ 	 * For a join relation or an upper relation, use deparseExplicitTargetList.
+ 	 * Likewise, for a base relation that is being deparsed as a subquery, in
+ 	 * which case the caller would have passed non-NIL tlist, use that
+ 	 * function. Otherwise, use deparseTargetList.
+ 	 */
  	if (foreignrel->reloptkind == RELOPT_JOINREL ||
! 		foreignrel->reloptkind == RELOPT_UPPER_REL ||
! 		tlist != NIL)
  		deparseExplicitTargetList(tlist, retrieved_attrs, context);
  	else
  	{
  		/*
***
*** 1155,1165  deparseLockingClause(deparse_expr_cxt *context)
--- 1169,1187 
  	StringInfo	buf = context->buf;
  	PlannerInfo *root = context->root;
  	RelOptInfo *rel = context->scanrel;
+ 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
  	int			relid = -1;
  
  	while ((relid = bms_next_member(rel->relids, relid)) >= 0)
  	{
  		/*
+ 		 * Ignore relation if it appears in a lower subquery. Locking clause
+ 		 * for such a relation, if needed, is included in the subquery.
+ 		 */
+ 		if (bms_is_member(relid, fpinfo->subquery_rels))
+ 			continue;
+ 
+ 		/*
  		 * Add FOR UPDATE/SHARE if appropriate.  We apply locking

Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-11-21 Thread Haribabu Kommi
Hi Vinayak,

This is a gentle reminder.

You are assigned as a reviewer for this patch in the 11-2016 commitfest.
If you have any more review comments or performance results regarding the
approach of the patch, please share them. Otherwise, you can mark the patch
as "Ready for committer" for committer feedback. This will help us run the
commitfest more smoothly.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-21 Thread Tsunakawa, Takayuki
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> The point I was trying to make is that I think the forced-removal behavior
> is not desirable, and therefore committing a patch that makes it be graven
> in stone is not desirable either.

I totally agree that we should pursue the direction of escaping the complete 
loss of stats files.  Personally, I would like to combine that with 
the idea of persistent performance diagnosis information for long-term analysis 
(IIRC, someone proposed it.)  However, I don't think my patch will make 
everyone forget about the problem of stats file loss during recovery.  The 
problem exists with or without my patch, and my patch doesn't have the power to 
dilute the importance of the problem.  If you are worried about the problem being 
forgotten, we can add an entry for it to the TODO list that Bruce-san is maintaining.

Or, maybe we can just stop removing the stats files during recovery by keeping 
the files of the previous generation and using them as the current ones.  I haven't 
checked how fresh the previous generation is (500 ms old?).  A bit older might be 
better than nothing.

> The larger picture here is that Takayuki-san wants us to commit a patch
> based on a customer's objection to 9.2's behavior, without any real evidence
> that the 9.4 change isn't a sufficient solution.  I've got absolutely zero
> sympathy for that "the stats collector might be stuck in an unkillable state"
> argument --- where's the evidence that the stats collector is any more prone
> to that than any other postmaster child?

The 9.4 change may be sufficient.  But I don't think I can proudly explain the 
logic to a really demanding customer.  I can't answer the question "Why does 
PostgreSQL write files that will be deleted, even during 'immediate' shutdown?  
Why does PostgreSQL spend 5 seconds for nothing?"

Other children do nothing and exit immediately.  I believe they are behaving 
correctly.

> And for that matter, if we are stuck because of a nonresponding NFS server,
> how is a quicker postmaster exit going to help anything?
> You're not going to be able to start a new postmaster if the data directory
> is on a nonresponsive server.

The NFS server can also be configured for HA, and the new postmaster can start as 
soon as the NFS server completes failover.

> I'd be willing to entertain a proposal to make the 5-second limit adjustable,
> but I don't think we need entirely new behavior here.

Then I'm at a loss as to what to do for the 9.2 user.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Logical decoding on standby

2016-11-21 Thread Craig Ringer
On 22 November 2016 at 10:20, Craig Ringer  wrote:

> I'm currently looking at making detection of replay conflict with a
> slot work by separating the current catalog_xmin into two effective
> parts - the catalog_xmin currently needed by any known slots
> (ProcArray->replication_slot_catalog_xmin, as now), and the oldest
> actually valid catalog_xmin where we know we haven't removed anything
> yet.

OK, more detailed plan.

The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
are already held down by ProcArray's catalog_xmin. But that doesn't
mean we haven't removed newer tuples from specific relations and
logged that in xl_heap_clean, etc, including catalogs or user
catalogs, it only means the clog still exists for those XIDs. We don't
emit a WAL record when we advance oldestXid in
SetTransactionIdLimit(), and doing so is useless because vacuum will
have already removed needed tuples from needed catalogs before calling
SetTransactionIdLimit() from vac_truncate_clog(). We know that if
oldestXid is n, the true valid catalog_xmin where no needed tuples
have been removed must be >= n. But we need to know the lower bound of
valid catalog_xmin, which oldestXid doesn't give us.

So right now a standby has no way to reliably know if the catalog_xmin
requirement for a given replication slot can be satisfied. A standby
can't tell based on a xl_heap_cleanup_info record, xl_heap_clean
record, etc whether the affected table is a catalog or not, and
shouldn't generate conflicts for non-catalogs since otherwise it'll be
constantly clobbering walsenders.

A 2-phase advance of the global catalog_xmin would mean that
GetOldestXmin() would return a value from ShmemVariableCache, not the
oldest catalog xmin from ProcArray like it does now. (auto)vacuum
would then be responsible for:

* Reading the oldest catalog_xmin from procarray
* If it has advanced vs what's present in ShmemVariableCache, writing
a new xlog record type recording an advance of oldest catalog xmin
* advancing ShmemVariableCache's oldest catalog xmin

and would do so before it called GetOldestXmin via
vacuum_set_xid_limits() to determine what it can remove.
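A minimal sketch of that sequence follows.  All the names here
(GetOldestCatalogXminFromProcArray, XLogCatalogXminAdvance and
ShmemVariableCache->oldestCatalogXmin) are placeholders for pieces this
proposal would add; they do not exist today:

static void
AdvanceOldestCatalogXmin(void)
{
    /* placeholder for reading the oldest catalog_xmin from the ProcArray */
    TransactionId newXmin = GetOldestCatalogXminFromProcArray();

    if (TransactionIdPrecedes(ShmemVariableCache->oldestCatalogXmin, newXmin))
    {
        /* WAL-log the advance first, then make it visible in shared memory */
        XLogCatalogXminAdvance(newXmin);
        ShmemVariableCache->oldestCatalogXmin = newXmin;
    }
}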

GetOldestXmin would return the ProcArray's copy of the oldest
catalog_xmin when in recovery, so we report it via hot_standby_feedback
to the upstream, it's recorded on our physical slot, and in turn
causes vacuum to advance the master's effective oldest catalog_xmin
for vacuum.

On the standby we'd replay the catalog_xmin advance record, advance
the standby's ShmemVariableCache's oldest catalog xmin, and check to
see if any replication slots, active or not, have a catalog_xmin lower
than the new threshold. If none do, there's no conflict and we're
fine. If any do, we wait
max_standby_streaming_delay/max_standby_archive_delay as appropriate,
then generate recovery conflicts against all backends that have an
active replication slot based on the replication slot state in shmem.
Those backends - walsender or normal decoding backend - would promptly
die. New decoding sessions will check the ShmemVariableCache and
refuse to start if their catalog_xmin is < the threshold. Since we
advance it before generating recovery conflicts there's no race with
clients trying to reconnect after their backend is killed with a
conflict.
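A hypothetical sketch of that start-up check is below.
ShmemVariableCache->oldestCatalogXmin is the proposed "oldest valid
catalog_xmin" field and does not exist today, and the error wording is made up:

static void
CheckSlotCatalogXmin(ReplicationSlot *slot)
{
    /* Refuse to start decoding if needed catalog rows may already be gone. */
    if (TransactionIdPrecedes(slot->data.catalog_xmin,
                              ShmemVariableCache->oldestCatalogXmin))
        ereport(ERROR,
                (errmsg("replication slot \"%s\" requires catalog rows that have already been removed",
                        NameStr(slot->data.name))));
}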

If we wanted to get fancy we could set the latches of walsender
backends at risk of conflicting and they could check
ShmemVariableCache's oldest valid catalog xmin, so they could send
immediate keepalives with reply_requested set and hopefully get flush
confirmation from the client and advance their catalog_xmin before we
terminate them as conflicting with recovery. But that can IMO be done
later/separately.

Going to prototype this.



Alternate approach:
---

The oldest xid in heap_xlog_cleanup_info is relation-specific, but the
standby has no way to know if it's a catalog relation or not during
redo and know whether to kill slots and decoding sessions based on its
latestRemovedXid. Same for xl_heap_clean and the other records that
can cause snapshot conflicts (xl_xlog_visible, xl_heap_freeze_page,
xl_btree_delete xl_btree_reuse_page, spgxlogVacuumRedirect).

Instead of adding a 2-phase advance of the global catalog_xmin, we can
instead add a rider to each of these records that identifies whether
it's a catalog table or not. This would only be emitted when wal_level
>= logical, but it *would* increase WAL sizes a bit when logical
decoding is enabled even if it's not going to be used on a standby.
The rider would be a simple:

typedef struct xl_rel_catalog_info
{
    bool    rel_accessible_from_logical_decoding;
} xl_rel_catalog_info;

or similar. During redo we call a new
ResolveRecoveryConflictWithLogicalSlot function from each of those
records' redo routines that does what I outlined above.

This way adds more info to more xlog records, and the upstream has to
use RelationIsAccessibleInLogicalDecoding() to set up the records when
writing the xlogs. In exchange, we don't h

Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-11-21 Thread Etsuro Fujita

On 2016/11/22 4:49, Tom Lane wrote:

Etsuro Fujita  writes:

On 2016/11/10 5:17, Tom Lane wrote:

I think there's a very good argument that we should just treat any inval
in pg_foreign_data_wrapper as a reason to blow away the whole plan cache.
People aren't going to alter FDW-level options often enough to make it
worth tracking things more finely.  Personally I'd put pg_foreign_server
changes in the same boat, though maybe those are changed slightly more
often.  If it's worth doing anything more than that, it would be to note
which plans mention foreign tables at all, and not invalidate those that
don't.  We could do that with, say, a hasForeignTables bool in
PlannedStmt.



I agree on that point.



OK, please update the patch to handle those catalogs that way.


Will do.
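For reference, a minimal sketch of the "blow away the whole plan cache"
approach quoted above, assuming a syscache invalidation callback registered
for the two catalogs.  The callback and registration function names are
illustrative, while CacheRegisterSyscacheCallback and ResetPlanCache are
existing facilities:

/* Invalidate all cached plans whenever FDW- or server-level options change. */
static void
ForeignOptionsSysCallback(Datum arg, int cacheid, uint32 hashvalue)
{
    /* These options change rarely, so dropping everything is acceptable. */
    ResetPlanCache();
}

static void
RegisterForeignOptionInvalidation(void)
{
    CacheRegisterSyscacheCallback(FOREIGNDATAWRAPPEROID,
                                  ForeignOptionsSysCallback, (Datum) 0);
    CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
                                  ForeignOptionsSysCallback, (Datum) 0);
}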


That leaves the question of whether it's worth detecting table-level
option changes this way, or whether we should just handle those by forcing
a relcache inval in ATExecGenericOptions, as in Amit's original patch in
<5702298d.4090...@lab.ntt.co.jp>.  I kind of like that approach; that
patch was short and sweet, and it put the cost on the unusual path (ALTER
TABLE) not the common path (every time we create a query plan).



I think it'd be better to detect table-level option changes because that
could allow us to do more; we could avoid invalidating cached plans that
need not be invalidated, by tracking the plan dependencies more
exactly, ie, avoid collecting the dependencies of foreign tables in a
plan that are in dead subqueries or excluded by constraint exclusion.
The proposed patch doesn't do that, though.  I think updates on
pg_foreign_table would be more frequent, so it might be worth
complicating the code.  But the relcache-inval approach couldn't do
that, IIUC.



Why not?  But the bigger picture here is that relcache inval is the
established practice for forcing replanning due to table-level changes,
and I have very little sympathy for inventing a new mechanism for that
just for foreign tables.  If it were worth bypassing replanning for
changes in tables in dead subqueries, for instance, it would surely be
worth making that happen for all table types not just foreign tables.


My point here is that FDW-option changes wouldn't affect replanning by 
core; even if we force a replan, we would have the same foreign tables 
excluded by constraint exclusion, for example.  So, considering updates 
on pg_foreign_table to be rather frequent, it might be better to avoid 
replanning, after a pg_foreign_table change, for foreign tables excluded 
by constraint exclusion.  My concern about the 
relcache-inval approach is that it wouldn't allow us to do that, 
IIUC.


Best regards,
Etsuro Fujita






Re: [HACKERS] [PATCH] pgpassfile connection option

2016-11-21 Thread Mithun Cy
> Independently of your patch, while testing I concluded that the
> multi-host feature and documentation should be improved:
> - If a connection fails, it does not say for which host/port.

Thanks, I will re-test the same.

> In fact they are tried in turn if the network connection fails, but not
> if the connection fails for some other reason such as the auth.
> The documentation is not precise enough.

If we fail due to authentication, then I think we should notify the client
instead of trying the next host. It is expected that a genuine user has the right
credentials for making the connection, so it's better to notify the client
immediately rather than silently ignoring the failure. Otherwise the host
to which the client failed to connect will remain permanently unavailable
until the client corrects itself. But I agree that the documentation and error
message, as you said, need improvement. I will try to correct that.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-21 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> So there are two questions here:
> 
> 1. Should we try to avoid having the stats collector write a stats file
> during an immediate shutdown?  The file will be removed anyway during crash
> recovery, so writing it is pointless.  I think you are right that 9.4's
> solution here is not perfect, because of the 5 second delay, and also because
> if the stats collector is stuck inside the kernel trying to write to the
> OS, it may be in a non-interruptible wait state where even SIGKILL has no
> immediate effect.  Anyway, it's stupid even from a performance point of
> view to waste time writing a file that we're just going to nuke.
> 
> 2. Should we close listen sockets sooner during an immediate shutdown?
>  I agree with Tom and Peter that this isn't a good idea.  People expect
> the sockets not to go away until the end - e.g. they use
> PQping() to test the server status, or they connect just to see what error
> they get - and the fact that a client application could hypothetically
> generate such a relentless stream of connection attempts that the dead-end
> backends thereby created slow down shutdown is not in my mind a sufficient
> reason to change the behavior.
> 
> So I think 001 should proceed and 002 should be rejected.

I'm happy with this conclusion, since I think 1 was the cause of the slow shutdown, 
and 2 was just a hypothesis pursued for completeness.  And I can understand 
the concern about PQping().

Regards
Takayuki Tsunakawa




Re: [HACKERS] Re: BUG #13755: pgwin32_is_service not checking if SECURITY_SERVICE_SID is disabled

2016-11-21 Thread Tsunakawa, Takayuki
From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> You meant CheckTokenMembership().

Yes, my typo in the mail.

> The proposed patch does need to be checked with:

I understood that by "refuse to run" you meant that postgres.exe fails to start as 
shown below.  Yes, I checked it on Win10.  I don't have access to WinXP/2003 - 
Microsoft ended their support.

if (pgwin32_is_admin())
{
    write_stderr("Execution of PostgreSQL by a user with administrative permissions is not\n"
                 "permitted.\n"
                 "The server must be started under an unprivileged user ID to prevent\n"
                 "possible system security compromises.  See the documentation for\n"
                 "more information on how to properly start the server.\n");
    exit(1);
}

Regards
Takayuki Tsunakawa








Re: [HACKERS] patch: function xmltable

2016-11-21 Thread Pavel Stehule
2016-11-21 21:16 GMT+01:00 Tom Lane :

> Alvaro Herrera  writes:
> > Something I just noticed is that transformTableExpr takes a TableExpr
> > node and returns another TableExpr node.  That's unlike what we do in
> > other places, where the node returned is of a different type than the
> > input node.  I'm not real clear what happens if you try to re-transform
> > a node that was already transformed, but it seems worth thinking about.
>
> We're not 100% consistent on that --- there are cases such as RowExpr
> and CaseExpr where the same struct type is used for pre-parse-analysis
> and post-parse-analysis nodes.  I think it's okay as long as the
> information content isn't markedly different, ie the transformation
> just consists of transforming all the sub-nodes.
>
> Being able to behave sanely on a re-transformation used to be an
> issue, but we no longer expect transformExpr to support that.
>

I was not sure in this case - using a new node was clearer to me, as a
safeguard against some uninitialized or untransformed value. There is only
a few bytes more memory overhead.

regards

Pavel


>
> regards, tom lane
>


Re: [HACKERS] condition variables

2016-11-21 Thread Thomas Munro
On Tue, Nov 22, 2016 at 3:05 PM, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Mon, 21 Nov 2016 15:57:47 -0500, Robert Haas  wrote 
> in 
>> On Thu, Aug 11, 2016 at 5:47 PM, Robert Haas  wrote:
>> > So, in my
>> > implementation, a condition variable wait loop looks like this:
>> >
>> > for (;;)
>> > {
>> > ConditionVariablePrepareToSleep(cv);
>> > if (condition for which we are waiting is satisfied)
>> > break;
>> > ConditionVariableSleep();
>> > }
>> > ConditionVariableCancelSleep();
>>
>> I have what I think is a better idea.  Let's get rid of
>> ConditionVariablePrepareToSleep(cv) and instead tell users of this
>> facility to write the loop this way:
>>
>> for (;;)
>> {
>> if (condition for which we are waiting is satisfied)
>> break;
>> ConditionVariableSleep(cv);
>> }
>> ConditionVariableCancelSleep();
>
> It seems a rather common way to wait on a condition variable; in
> short,
>
> | while (condition for which we are waiting is *not* satisfied)
> | ConditionVariableSleep(cv);
> | ConditionVariableCancelSleep();

Ok, let's show it that way.

>> ConditionVariableSleep(cv) will check whether the current process is
>> already on the condition variable's waitlist.  If so, it will sleep;
>> if not, it will add the process and return without sleeping.
>>
>> It may seem odd that ConditionVariableSleep(cv) doesn't necessarily
>> sleep, but this design has a significant advantage: we avoid
>> manipulating the wait-list altogether in the case where the condition
>> is already satisfied when we enter the loop.  That's more like what we
>
> The condition check is done far faster than maintaining the
> wait-list for most cases, I believe.
>
>> already do in lwlock.c: we try to grab the lock first; if we can't, we
>> add ourselves to the wait-list and retry; if we then get the lock
>> after all we have to recheck whether we can get the lock and remove
>> ourselves from the wait-list if so.  Of course, there is some cost: if
>> we do have to wait, we'll end up checking the condition twice before
>> actually going to sleep.  However, it's probably smart to bet that
>> actually needing to sleep is fairly infrequent, just as in lwlock.c.
>>
>> Thoughts?
>
> FWIW, I agree to the assumption.

Here's a version that works that way, though it allows you to call
ConditionVariablePrepareToSleep *optionally* before you enter your
loop, in case you expect to have to wait and would rather avoid the
extra loop.  Maybe there isn't much point in exposing that though,
since your condition test should be fast and waiting is the slow path,
but we don't really really know what your condition test is.  I
thought about that because my use case (barrier.c) does in fact expect
to hit the wait case more often than not.  If that seems pointless
then perhaps ConditionVariablePrepareToSleep should become static and
implicit.  This version does attempt to suppress spurious returns, a
bit, using proclist_contains.  No more cvSleeping.
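To illustrate the "doesn't necessarily sleep" behaviour, here is a
much-simplified sketch of the sleep path -- not the actual patch.  The cv
field names (mutex, wakeup), the cvWaitLink member, and the exact latch and
wait-event bookkeeping are assumptions; the point is only that the first
call queues the caller and returns so the condition gets re-checked, and
only subsequent calls block:

void
ConditionVariableSleep(ConditionVariable *cv)
{
    SpinLockAcquire(&cv->mutex);
    if (!proclist_contains(&cv->wakeup, MyProc->pgprocno, cvWaitLink))
    {
        /*
         * Not queued yet: add ourselves and return without sleeping, so the
         * caller re-checks its condition before actually waiting.
         */
        proclist_push_tail(&cv->wakeup, MyProc->pgprocno, cvWaitLink);
        SpinLockRelease(&cv->mutex);
        return;
    }
    SpinLockRelease(&cv->mutex);

    /* Already queued: block on our latch until signal/broadcast sets it. */
    WaitLatch(MyLatch, WL_LATCH_SET, -1);
    ResetLatch(MyLatch);
}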

It's possible that future users will want a version with a timeout, or
multiplexed with IO, in which case there would be some interesting
questions about how this should interact with WaitEventSet.  It also
seems like someone might eventually want to handle postmaster death.
Perhaps there should eventually be a way to tell WaitEventSet that
you're waiting for a CV so these things can be multiplexed without
exposing the fact that it's done internally with latches.

-- 
Thomas Munro
http://www.enterprisedb.com


condition-variable-v4.patch
Description: Binary data



Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-11-21 Thread Haribabu Kommi
On Tue, Nov 22, 2016 at 5:33 AM, Robert Haas  wrote:

> On Sat, Nov 19, 2016 at 2:54 PM, Tom Lane  wrote:
> > Stephen Frost  writes:
> >> Let's create a new data type for this which supports old and new types,
> >> add what casts make sense, and call it a day.  Changing the data type
> >> name out from under people is not helping anyone.
> >
> > +1.  I do not think changing behavior for the existing type name is
> > going to be a net win.  If we'd been smart enough to make the type
> > varlena from the get-go, maybe we could get away with it, but there
> > is just way too much risk of trouble with a change in a fixed-length
> > type's on-the-wire representation.
>
> I completely agree.


OK. Agreed.

Any suggestions for the name to be used for the new datatype that can
work for both 48 and 64 bit MAC addresses?

It is possible to represent a 48 bit MAC address as a 64 bit MAC address
by adding reserved bytes in the middle, as follows:

01-01-01-01-01-01::macaddr => 01-01-01-FF-FE-01-01-01::newmacaddr

When comparing a 48 bit MAC address with a 64 bit MAC address, ignore
those two bytes if their contents are the reserved bytes.

The new datatype can store whatever the input is, 48 bit or 64 bit,
directly. Is the same data then sent over the wire, whether or not the
reserved bytes are present?
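A small illustrative sketch of that widening, with made-up function names;
the byte layout simply follows the example above:

#include <stdint.h>
#include <string.h>

/* Widen a 48-bit MAC to 64 bits by inserting the reserved FF-FE bytes. */
static void
mac48_to_mac64(const uint8_t in[6], uint8_t out[8])
{
    memcpy(out, in, 3);         /* first three bytes (OUI) */
    out[3] = 0xFF;              /* reserved filler */
    out[4] = 0xFE;
    memcpy(out + 5, in + 3, 3); /* last three bytes */
}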

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

2016-11-21 Thread Craig Ringer
On 22 November 2016 at 11:35, Kyotaro HORIGUCHI
 wrote:
> Hello,
>
> At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas  wrote 
> in 
>> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer  wrote:
>> > I'm still interested in hearing comments from experienced folks about
>> > whether it's sane to do this at all, rather than have similar
>> > duplicate signal handling for the walsender.
>>
>> Well, I mean, if it's reasonable to share code in a given situation,
>> that is generally better than NOT sharing code...
>
> Walsender handles SIGUSR1 completely different way from normal
> backends. The procsignal_sigusr1_handler is designed to work as
> the peer of SendProcSignal (not ProcSendSignal:). Walsender is
> just using a latch to be woken up. It has nothing to do with
> SendProcSignal.

Indeed, at the moment it doesn't. I'm proposing to change that, since
walsenders doing logical decoding on a standby need to be notified of
conflicts with recovery, many of which are the same as for normal user
backends and bgworkers.

The main exception is snapshot conflicts, where logical decoding
walsenders don't care about conflicts with specific tables, they want
to be terminated if the effective catalog_xmin on the upstream
increases past their needed catalog_xmin. They don't care about
non-catalog (or non-heap-catalog) tables. So I expect we'd just want
to ignore PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender
and handle conflict with catalog_xmin increases separately.

> IMHO, I don't think walsender is allowed to just share the
> backends' handler for a practical reason that pss_signalFlags can
> harm.

I'm not sure I see the problem. The ProcSignalReason argument to
RecoveryConflictInterrupt states that:

 * Also, because of race conditions, it's important that all the signals be
 * defined so that no harm is done if a process mistakenly receives one.

> If you need to expand the function of SIGUSR1 of walsender, more
> thought would be needed.

Yeah, I definitely don't think it's as simple as just using
procsignal_sigusr1_handler as-is. I expect we'd likely create a new
global IsWalSender and ignore some RecoveryConflictInterrupt cases
when in a walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and
probably add a new case for catalog_xmin conflicts that's only acted
on when IsWalSender.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Danger of automatic connection reset in psql

2016-11-21 Thread Pavel Stehule
2016-11-22 3:46 GMT+01:00 Robert Haas :

> On Mon, Nov 21, 2016 at 4:55 AM, Oleksandr Shulgin
>  wrote:
> > On Tue, Nov 15, 2016 at 4:10 PM, Jim Nasby 
> wrote:
> >>
> >> On 11/14/16 5:41 AM, Oleksandr Shulgin wrote:
> >>>
> >>> Automatic connection reset is a nice feature for server development,
> >>> IMO.  Is it really useful for anything else is a good question.
> >>
> >>
> >> I use it all the time for application development; my rebuild script
> will
> >> forcibly kick everyone out to re-create the database. I put that in
> because
> >> I invariably end up with a random psql sitting somewhere that I don't
> want
> >> to track down.
> >>
> >> What currently stinks though is if the connection is dead and the next
> >> command I run is a \i, psql just dies instead of re-connecting. It'd be
> nice
> >> if before reading the script it checked connection status and attempted
> a
> >> reconnect.
> >>
> >>> At least an option to control that behavior seems like a good idea,
> >>> maybe even set it to 'no reconnect' by default, so that people who
> >>> really use it can make conscious choice about enabling it in their
> >>> .psqlrc or elsewhere.
> >>
> >>
> >> +1, I don't think it needs to be the default.
> >
> >
> > So if we go in this direction, should the option be specified from
> command
> > line or available via psqlrc (or both?)  I think both make sense.
> >
> > What should be the option and control variable names?  Something like:
> > --reconnect and RECONNECT?  Should we allow reconnect in non-interactive
> > mode?  I have no use case for that, but it might be different for others.
> > If non-interactive is not supported then it could be a simple boolean
> > variable, otherwise we might want something like a tri-state: on / off /
> > interactive (the last one being the default).
>
> I think it should just be another psql special variable, like
> AUTOCOMMIT or VERBOSITY.  If the user wants to set it on the command
> line, they can just use -v.  We don't need a separate, dedicated
> option for this, I think.
>

In this case it depends on the default. For almost all scripts the sensible
value is "without reconnect". It would be unfriendly to have to use this setting
via a -v variable.

Regards

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] WIP: multivariate statistics / proof of concept

2016-11-21 Thread Tomas Vondra

On 11/21/2016 11:10 PM, Robert Haas wrote:

[ reviving an old multivariate statistics thread ]

On Thu, Nov 13, 2014 at 6:31 AM, Simon Riggs  wrote:

On 12 October 2014 23:00, Tomas Vondra  wrote:


It however seems to be working sufficiently well at this point, enough
to get some useful feedback. So here we go.


This looks interesting and useful.

What I'd like to check before a detailed review is that this has
sufficient applicability to be useful.

My understanding is that Q9 and Q18 of TPC-H have poor plans as a
result of multi-column stats errors.

Could you look at those queries and confirm that this patch can
produce better plans for them?


Tomas, did you ever do any testing in this area?  One of my
colleagues, Rafia Sabih, recently did some testing of TPC-H queries @
20 GB.  Q18 actually doesn't complete at all right now because of an
issue with the new simplehash implementation.  I reported it to Andres
and he tracked it down, but hasn't posted the patch yet - see
http://archives.postgresql.org/message-id/20161115192802.jfbec5s6ougxw...@alap3.anarazel.de

Of the remaining queries, the slowest are Q9 and Q20, and both of them
have serious estimation errors.  On Q9, things go wrong here:

 ->  Merge Join
(cost=5225092.04..6595105.57 rows=154 width=47) (actual
time=103592.821..149335.010 rows=6503988 loops=1)
   Merge Cond:
(partsupp.ps_partkey = lineitem.l_partkey)
   Join Filter:
(lineitem.l_suppkey = partsupp.ps_suppkey)
   Rows Removed by Join Filter: 19511964
   ->  Index Scan using

> [snip]


Rows Removed by Filter: 756627

The estimate for the index scan on partsupp is essentially perfect,
and the lineitem-part join is off by about 3x.  However, the merge
join is off by about 4000x, which is real bad.



The patch only deals with statistics on base relations, no joins, at 
this point. It's meant to be extended in that direction, so the syntax 
supports it, but at this point that's all. No joins.


That being said, this estimate should be improved in 9.6, when you 
create a foreign key between the tables. In fact, that patch was exactly 
about Q9.


This is how the join estimate looks on scale 1 without the FK between 
the two tables:


  QUERY PLAN
---
 Merge Join  (cost=19.19..700980.12 rows=2404 width=261)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND
(lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Index Scan using idx_lineitem_part_supp on lineitem
(cost=0.43..605856.84 rows=6001117 width=117)
   ->  Index Scan using partsupp_pkey on partsupp
(cost=0.42..61141.76 rows=80 width=144)
(4 rows)


and with the foreign key:

 QUERY PLAN
---
 Merge Join  (cost=19.19..700980.12 rows=6001117 width=261)
 (actual rows=6001215 loops=1)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND
(lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Index Scan using idx_lineitem_part_supp on lineitem
(cost=0.43..605856.84 rows=6001117 width=117)
(actual rows=6001215 loops=1)
   ->  Index Scan using partsupp_pkey on partsupp
(cost=0.42..61141.76 rows=80 width=144)
(actual rows=6001672 loops=1)
 Planning time: 3.840 ms
 Execution time: 21987.913 ms
(6 rows)



On Q20, things go wrong here:

>

[snip]

The estimate for the GroupAggregate feeding one side of the merge join
is quite accurate.  The estimate for the part-partsupp join on the
other side is off by 8x.  Then things get much worse: the estimate for
the merge join is off by 400x.



Well, most of the estimation error comes from the join, but sadly the 
aggregate makes using the foreign keys impossible - at least in the 
current version. I don't know if it can be improved, somehow.



I'm not really sure whether the multivariate statistics stuff will fix
this kind of case or not, but if it did it would be awesome.



Join statistics are something I'd like to add eventually, but I don't 
see how it could happen in the first version. Also, the patch received 
no reviews this CF, and making it even larger is unlikely to make it 
more attractive.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

2016-11-21 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas  wrote 
in 
> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer  wrote:
> > I'm still interested in hearing comments from experienced folks about
> > whether it's sane to do this at all, rather than have similar
> > duplicate signal handling for the walsender.
> 
> Well, I mean, if it's reasonable to share code in a given situation,
> that is generally better than NOT sharing code...

Walsender handles SIGUSR1 completely different way from normal
backends. The procsignal_sigusr1_handler is designed to work as
the peer of SendProcSignal (not ProcSendSignal:). Walsender is
just using a latch to be woken up. It has nothing to do with
SendProcSignal.

IMHO, I don't think walsender is allowed to just share the
backends' handler for a practical reason that pss_signalFlags can
harm.

If you need to expand the function of SIGUSR1 of walsender, more
thought would be needed.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Parallel bitmap heap scan

2016-11-21 Thread Dilip Kumar
On Fri, Nov 18, 2016 at 9:59 AM, Amit Khandekar  wrote:

Thanks for the review..

> In pbms_is_leader() , I didn't clearly understand the significance of
> the for-loop. If it is a worker, it can call
> ConditionVariablePrepareToSleep() followed by
> ConditionVariableSleep(). Once it comes out of
> ConditionVariableSleep(), isn't it guaranteed that the leader has
> finished the bitmap ? If yes, then it looks like it is not necessary
> to again iterate and go back through the pbminfo->state checking.
> Also, with this, variable queuedSelf also might not be needed. But I
> might be missing something here.

I have taken this logic from the example posted on the condition variable thread [1]:

for (;;)
{
ConditionVariablePrepareToSleep(cv);
if (condition for which we are waiting is satisfied)
break;
ConditionVariableSleep();
}
ConditionVariableCancelSleep();

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr%3DC56Xng%40mail.gmail.com

So it appears to me that even if we come out of ConditionVariableSleep(),
we need to verify our condition and only then break.

> Not sure what happens if worker calls
> ConditionVariable[Prepare]Sleep() but leader has already called
> ConditionVariableBroadcast(). Does the for loop have something to do
> with this ? But this can happen even with the current for-loop, it
> seems.
If the leader has already called ConditionVariableBroadcast, then after
ConditionVariablePrepareToSleep we will check the condition again
before calling ConditionVariableSleep. And the condition check is done under
SpinLockAcquire(&pbminfo->state_mutex).

However, I think there is one problem in my code (I think you might be
pointing at the same thing): after ConditionVariablePrepareToSleep, if
pbminfo->state is already PBM_FINISHED, I am not resetting needWait to
false, which can lead to the problem.

>
>
>> #3. Bitmap processing (Iterate and process the pages).
>> In this phase each worker will iterate over page and chunk array and
>> select heap pages one by one. If prefetch is enable then there will be
>> two iterator. Since multiple worker are iterating over same page and
>> chunk array we need to have a shared iterator, so we grab a spin lock
>> and iterate within a lock, so that each worker get and different page
>> to process.
>
> tbm_iterate() call under SpinLock :
> For parallel tbm iteration, tbm_iterate() is called while SpinLock is
> held. Generally we try to keep code inside Spinlock call limited to a
> few lines, and that too without occurrence of a function call.
> Although tbm_iterate() code itself looks safe under a spinlock, I was
> checking if we can squeeze SpinlockAcquire() and SpinLockRelease()
> closer to each other. One thought is :  in tbm_iterate(), acquire the
> SpinLock before the while loop that iterates over lossy chunks. Then,
> if both chunk and per-page data remain, release spinlock just before
> returning (the first return stmt). And then just before scanning
> bitmap of an exact page, i.e. just after "if (iterator->spageptr <
> tbm->npages)", save the page handle, increment iterator->spageptr,
> release Spinlock, and then use the saved page handle to iterate over
> the page bitmap.

The main reason to keep the spinlock out of this function is to avoid changes
inside it, and also this function takes a local iterator as
input, which has no spinlock associated with it. But that can be
changed; we can pass the shared iterator to it.

I will think about this logic and try to update it in the next version.

>
> prefetch_pages() call under Spinlock :
> Here again, prefetch_pages() is called while pbminfo->prefetch_mutex
> Spinlock is held. Effectively, heavy functions like PrefetchBuffer()
> would get called while under the Spinlock. These can even ereport().
> One option is to use mutex lock for this purpose. But I think that
> would slow things down. Moreover, the complete set of prefetch pages
> would be scanned by a single worker, and others might wait for this
> one. Instead, what I am thinking is: grab the pbminfo->prefetch_mutex
> Spinlock only while incrementing pbminfo->prefetch_pages. The rest
> part viz : iterating over the prefetch pages, and doing the
> PrefetchBuffer() need not be synchronised using this
> pgbinfo->prefetch_mutex Spinlock. pbms_parallel_iterate() already has
> its own iterator spinlock. Only thing is, workers may not do the
> actual PrefetchBuffer() sequentially. One of them might shoot ahead
> and prefetch 3-4 pages while the other is lagging with the
> sequentially lesser page number; but I believe this is fine, as long
> as they all prefetch all the required blocks.

I agree with your point, will try to fix this as well.
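A rough sketch of the narrowed locking being suggested is below.
ParallelBitmapInfo, pbms_parallel_iterate and the field names follow the
discussion above and are assumptions about the patch, not its actual code:

static void
pbms_prefetch_next(ParallelBitmapInfo *pbminfo, Relation rel)
{
    TBMIterateResult *tbmpre;

    /* Hold prefetch_mutex only long enough to claim one prefetch slot. */
    SpinLockAcquire(&pbminfo->prefetch_mutex);
    if (pbminfo->prefetch_pages >= pbminfo->prefetch_target)
    {
        SpinLockRelease(&pbminfo->prefetch_mutex);
        return;
    }
    pbminfo->prefetch_pages++;
    SpinLockRelease(&pbminfo->prefetch_mutex);

    /* The shared iterator is protected by its own spinlock internally. */
    tbmpre = pbms_parallel_iterate(pbminfo->prefetch_iterator);
    if (tbmpre != NULL)
        PrefetchBuffer(rel, MAIN_FORKNUM, tbmpre->blockno);
}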

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2016-11-21 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
> I have tried using attached script multiple times on latest 9.2 code, but
> couldn't reproduce the issue.  Please find the log attached with this mail.
> Apart from log file, below prints appear:
> 
> WARNING: enabling "trust" authentication for local connections You can
> change this by editing pg_hba.conf or using the option -A, or --auth-local
> and --auth-host, the next time you run initdb.
> 20075/20075 kB (100%), 1/1 tablespace
> NOTICE:  pg_stop_backup complete, all required WAL segments have been
> archived
> 20079/20079 kB (100%), 1/1 tablespace
> 
> Let me know, if some parameters need to be tweaked to reproduce the issue?
> 
> 
> It seems that the patch proposed is good, but it is better if somebody other
> than you can reproduce the issue and verify if the patch fixes the same.
> 

Thank you for reviewing the code and testing.  Hmm, we could reproduce the 
problem on PostgreSQL 9.2.19.  The script's stdout is attached as test.log, and 
the stderr is as follows:

WARNING: enabling "trust" authentication for local connections You can change 
this by editing pg_hba.conf or using the option -A, or --auth-local and 
--auth-host, the next time you run initdb.
20099/20099 kB (100%), 1/1 tablespace
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
20103/20103 kB (100%), 1/1 tablespace

The sizes pg_basebackup outputs are a bit different from yours.  I don't see a 
reason for this.  The test script explicitly specifies the database encoding 
and locale, so the encoding difference doesn't seem to be the cause.  The 
target problem occurs only when a WAL record crosses a WAL segment boundary, so 
a subtle change in WAL record volume would prevent the problem from happening.

Anyway, could you retry with the attached test.sh?  It just changes 
restore_command.

If the problem occurs, the following pair of lines appear in the server log of 
the cascading standby.  Could you check it?

LOG:  restored log file "00020003" from archive
LOG:  out-of-sequence timeline ID 1 (after 2) in log file 0, segment 3, offset 0

Regards
Takayuki Tsunakawa




test.sh
Description: test.sh


test.log
Description: test.log



Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-21 Thread Amit Kapila
On Mon, Nov 21, 2016 at 9:30 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila  wrote:
>>> Here instead of checking whether top_plan has initPlan, it should
>>> check whether initPlan is present anywhere in plan tree.  I think one
>>> simple way could be to check *glob->subplans* instead of
>>> top_plan->initPlan,
>
>> Patch based on above suggestion is attached with this mail.
>
> I think this is the right fix for the moment, because the proximate cause
> of the crash is that ExecSerializePlan doesn't transmit any part of the
> PlannedStmt.subplans list, which glob->subplans is the preimage of.
>
> Now, maybe I'm missing something, but it seems to me that ordinary
> subplans could be transmitted to workers just fine.  The problem with
> transmitting initplans is that you'd lose the expectation of single
> evaluation.
>

Yes, and I think we can handle it such that the master backend evaluates
initplans and shares the calculated values, along with their paramids, with all
the workers. Workers will, in turn, restore them in
queryDesc->estate->es_param_exec_vals (or some other place where they
can be directly used; I have yet to evaluate this).  I am
working on a patch to parallelize queries containing
initplans/subplans, so I will evaluate your suggestion of passing
subplans (maybe non-InitPlans) in ExecSerializePlan as part of that
patch.  I have yet to figure out the best way to share hashed
subplans: do we pass them as is and let each worker evaluate and
store its own copy of the hash table, or shall we try to form the hash
table once in the master and then share it with the workers (which could
be tricky), or shall we restrict such queries that contain hashed
subplans, on the assumption that it will be costly for each worker to
form the hash table?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Contains and is contained by operators of inet datatypes

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 7:57 AM, Andreas Karlsson  wrote:
> I like the patch because it means fewer operators to remember for me as a
> PostgreSQL user. And at least for me inet is a rarely used type compared to
> hstore, json and range types, which all use @> and <@.

I agree that it would be nice to make the choice of operator names
more consistent.  I don't know if doing so will please more or fewer
people than it annoys.  I do not like this bit from the original post:

EH> The patch removes the recently committed SP-GiST index support for the
EH> existing operator symbols to give more reason to the users to use the
EH> new symbols.

That seems like the rough equivalent of throwing a wrench into the
datacenter's backup generator to "encourage" them to maintain two
separate and independent backup generators.  If we're going to add the
more-standard operator names as synonyms for the existing operator
names, let's do precisely that and no more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-21 Thread Robert Haas
On Fri, Nov 18, 2016 at 3:23 PM, Peter Eisentraut
 wrote:
> On 11/17/16 12:30 AM, Tsunakawa, Takayuki wrote:
>> No, I'm not recommending a higher value, but just removing the doubtful 
>> sentences about the 512MB upper limit.  The advantage is that eliminating this 
>> sentence will give users a chance to find the best setting.
>
> I think this is a good point.  The information is clearly
> wrong/outdated.  We have no new better information, but we should remove
> misleading outdated advice and let users find new advice.  Basically,
> this just puts Windows on par with other platforms with regard to
> shared_buffers tuning, doesn't it?
>
> I'm inclined to commit the original patch if there are no objections.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Danger of automatic connection reset in psql

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 4:55 AM, Oleksandr Shulgin
 wrote:
> On Tue, Nov 15, 2016 at 4:10 PM, Jim Nasby  wrote:
>>
>> On 11/14/16 5:41 AM, Oleksandr Shulgin wrote:
>>>
>>> Automatic connection reset is a nice feature for server development,
>>> IMO.  Is it really useful for anything else is a good question.
>>
>>
>> I use it all the time for application development; my rebuild script will
>> forcibly kick everyone out to re-create the database. I put that in because
>> I invariably end up with a random psql sitting somewhere that I don't want
>> to track down.
>>
>> What currently stinks though is if the connection is dead and the next
>> command I run is a \i, psql just dies instead of re-connecting. It'd be nice
>> if before reading the script it checked connection status and attempted a
>> reconnect.
>>
>>> At least an option to control that behavior seems like a good idea,
>>> maybe even set it to 'no reconnect' by default, so that people who
>>> really use it can make conscious choice about enabling it in their
>>> .psqlrc or elsewhere.
>>
>>
>> +1, I don't think it needs to be the default.
>
>
> So if we go in this direction, should the option be specified from command
> line or available via psqlrc (or both?)  I think both make sense.
>
> What should be the option and control variable names?  Something like:
> --reconnect and RECONNECT?  Should we allow reconnect in non-interactive
> mode?  I have no use case for that, but it might be different for others.
> If non-interactive is not supported then it could be a simple boolean
> variable, otherwise we might want something like a tri-state: on / off /
> interactive (the last one being the default).

I think it should just be another psql special variable, like
AUTOCOMMIT or VERBOSITY.  If the user wants to set it on the command
line, they can just use -v.  We don't need a separate, dedicated
option for this, I think.
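
For example, under that approach it might end up looking like this
(variable name and values as floated upthread, nothing settled yet):

    $ psql -v RECONNECT=interactive mydb

or, in ~/.psqlrc:

    \set RECONNECT interactive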

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer  wrote:
> I'm still interested in hearing comments from experienced folks about
> whether it's sane to do this at all, rather than have similar
> duplicate signal handling for the walsender.

Well, I mean, if it's reasonable to share code in a given situation,
that is generally better than NOT sharing code...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] condition variables

2016-11-21 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 21 Nov 2016 15:57:47 -0500, Robert Haas  wrote 
in 
> On Thu, Aug 11, 2016 at 5:47 PM, Robert Haas  wrote:
> > So, in my
> > implementation, a condition variable wait loop looks like this:
> >
> > for (;;)
> > {
> > ConditionVariablePrepareToSleep(cv);
> > if (condition for which we are waiting is satisfied)
> > break;
> > ConditionVariableSleep();
> > }
> > ConditionVariableCancelSleep();
> 
> I have what I think is a better idea.  Let's get rid of
> ConditionVariablePrepareToSleep(cv) and instead tell users of this
> facility to write the loop this way:
> 
> for (;;)
> {
> if (condition for which we are waiting is satisfied)
> break;
> ConditionVariableSleep(cv);
> }
> ConditionVariableCancelSleep();

That seems a rather common way to wait on a condition variable; more
concisely:

| while (condition for which we are waiting is *not* satisfied)
|     ConditionVariableSleep(cv);
| ConditionVariableCancelSleep();

> ConditionVariableSleep(cv) will check whether the current process is
> already on the condition variable's waitlist.  If so, it will sleep;
> if not, it will add the process and return without sleeping.
> 
> It may seem odd that ConditionVariableSleep(cv) doesn't necessary
> sleep, but this design has a significant advantage: we avoid
> manipulating the wait-list altogether in the case where the condition
> is already satisfied when we enter the loop.  That's more like what we

Checking the condition is far cheaper than maintaining the wait-list
in most cases, I believe.

> already do in lwlock.c: we try to grab the lock first; if we can't, we
> add ourselves to the wait-list and retry; if we then get the lock
> after all we have to recheck whether we can get the lock and remove
> ourselves from the wait-list if so.  Of course, there is some cost: if
> we do have to wait, we'll end up checking the condition twice before
> actually going to sleep.  However, it's probably smart to bet that
> actually needing to sleep is fairly infrequent, just as in lwlock.c.
> 
> Thoughts?

FWIW, I agree with that assumption.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regression tests fails

2016-11-21 Thread Tom Lane
Pavel Stehule  writes:
>> 2016-11-21 8:09 GMT+01:00 Craig Ringer :
>>> Simple fix here is to append COLLATE "C" after the ORDER BY.

> here is a patch

Pushed, thanks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: autovacuum: Drop orphan temp tables more quickly but with more c

2016-11-21 Thread Michael Paquier
On Mon, Nov 21, 2016 at 4:57 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Mon, Nov 21, 2016 at 10:05 AM, Robert Haas  wrote:
>>> autovacuum: Drop orphan temp tables more quickly but with more caution.
>
>> I have found an obvious bug when reading the code this morning:
>> orphan_failures is not initialized:
>
> My compiler noticed that, too.  Will push.

Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-11-21 Thread Claudio Freire
On Mon, Nov 21, 2016 at 5:42 PM, Peter Geoghegan  wrote:
>>> Or, you might want to make
>>> sure that B-Tree tuple insertions still find that "first page" to
>>> check, despite still generally treating heap item pointer as part of
>>> the key proper (in unique indexes, it can still behave like NULL, and
>>> not participate in uniqueness checking, while still essentially being
>>> part of the key/sort order).
>>
>> It behaves like that (even though it's not coded like that)
>
> Why not? What do you mean?
>
> What I'm interested in evaluating is an implementation where an index
> on (foo) has a sort order/key space that is precisely the same as an
> index on (foo, ctid), with zero exceptions.

Well, the patch does the same sorting, but without explicitly adding
the ctid to the scankey.

When inserting into a unique index, the lookup for the insertion point
proceeds as it would if the key were (foo, ctid), finding a page in the
middle of the range that contains item pointers for foo.

When that happens on a regular index, the insertion is done at that
point and nothing else needs to be done. But when it happens on a
unique index, the code first checks to see if the first item pointer
for foo is on the same page (allegedly a frequent case). If so, the
uniqueness check is done in a very straightforward and efficient
manner. If not, however, the code retraces its steps up the tree to
the first time it followed a key other than foo (that's the only way
it could land at a middle page), and then follows the downlinks
looking at a truncated key (just foo, ignoring ctid).

Kind of what you say, but without the explicit null.
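
In other words, the effective ordering boils down to a comparison like
the sketch below (illustrative only; the patch does not literally contain
such a function):

    /*
     * Compare two index entries as if the heap TID were a trailing key
     * column, so an index on (foo) orders exactly like (foo, ctid).
     */
    static int
    compare_with_tid_tiebreak(int keycmp, ItemPointer a, ItemPointer b)
    {
        if (keycmp != 0)
            return keycmp;                  /* foo alone decides */
        return ItemPointerCompare(a, b);    /* heap TID breaks the tie */
    }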

>
>>> It also occurs to me that our performance for updates in the event of
>>> many NULL values in indexes is probably really bad, and could be
>>> helped by this. You'll want to test all of this with a workload that
>>> isn't very sympathetic to HOT, of course -- most of these benefits are
>>> seen when HOT doesn't help.
>>
>> I haven't really tested with an overabundance of NULLs, I'll add that
>> to the tests. I have tested low-cardinality indexes though, but only
>> for correctness.
>
> I think we'll have to model data distribution to evaluate the idea
> well. After all, if there is a uniform distribution of values anyway,
> you're going to end up with many dirty leaf pages. Whereas, in the
> more realistic case where updates are localized, we can hope to better
> amortize the cost of inserting index tuples, and recycling index
> tuples.

Quite possibly

>> What do you mean with "first class part"?
>>
>> It's not included in the scankey for a variety of reasons, not the
>> least of which is not breaking the interface for current uses, and
>> because the tid type is quite limited in its capabilities ATM. Also,
>> the heap TID is usually there on most AM calls that care about it (ie:
>> insertions have it, of course), so adding it to the scankey also
>> seemed redundant.
>
> I mean that insertions into indexes that are low cardinality should be
> largely guided by TID comparisons.

...

>> If not adding to the scankey, what do you mean by "first class" then?
>
> Just what I said about the sort order of an index on "(foo)" now
> precisely matching what we'd get for "(foo, ctid)".

That is the case in both versions of the patch.

> There are a couple
> of tricky issues with that that you'd have to look out for, like
> making sure that the high key continues to hold a real TID, which at a
> glance looks like something that just happens already. I'm not sure
> that we preserve the item pointer TID in all cases, since the current
> assumption is that a high key's TID is just filler -- maybe we can
> lose that at some point.

I thought long about that, and inner pages don't need a real TID in
their keys, as those keys only specify key-space cutoff points and not
real TIDs, and the high keys in leaf pages always carry real TIDs.

I haven't had any issue with that, aside from the headaches resulting
from thinking about that for protracted periods of time.

> You should use amcheck to specifically verify that that happens
> reliably in all cases. Presumably, its use of an insertion scankey
> would automatically see the use of TID as a tie-breaker with patched
> Postgres amcheck verification, and so amcheck will work for this
> purpose unmodified.

It's totally on my roadmap.  I'll have to adapt it for the new index
format, though; it doesn't work without modification.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: autovacuum: Drop orphan temp tables more quickly but with more c

2016-11-21 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Nov 21, 2016 at 10:05 AM, Robert Haas  wrote:
>> autovacuum: Drop orphan temp tables more quickly but with more caution.

> I have found an obvious bug when reading the code this morning:
> orphan_failures is not initialized:

My compiler noticed that, too.  Will push.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: autovacuum: Drop orphan temp tables more quickly but with more c

2016-11-21 Thread Michael Paquier
On Mon, Nov 21, 2016 at 10:05 AM, Robert Haas  wrote:
> autovacuum: Drop orphan temp tables more quickly but with more caution.
>
> Previously, we only dropped an orphan temp table when it became old
> enough to threaten wraparound; instead, doing it immediately.  The
> only value of waiting is that someone might be able to examine the
> contents of the orphan temp table for forensic purposes, but it's
> pretty difficult to actually do that and few users will wish to do so.
> On the flip side, not performing the drop immediately generates log
> spam and bloats pg_class.

I have found an obvious bug when reading the code this morning:
orphan_failures is not initialized:
diff --git a/src/backend/postmaster/autovacuum.c
b/src/backend/postmaster/autovacuum.c
index 954c1a1..be357e7 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1908,7 +1908,7 @@ do_autovacuum(void)
BufferAccessStrategy bstrategy;
ScanKeyData key;
TupleDesc   pg_class_desc;
-   int orphan_failures;
+   int orphan_failures = 0;
int effective_multixact_freeze_max_age;

/*
Attached is a patch.
-- 
Michael
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 954c1a1..be357e7 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1908,7 +1908,7 @@ do_autovacuum(void)
 	BufferAccessStrategy bstrategy;
 	ScanKeyData key;
 	TupleDesc	pg_class_desc;
-	int			orphan_failures;
+	int			orphan_failures = 0;
 	int			effective_multixact_freeze_max_age;
 
 	/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve OOM handling in pg_locale.c

2016-11-21 Thread Tom Lane
Mithun Cy  writes:
> On Thu, Oct 13, 2016 at 1:40 PM, Michael Paquier 
> wrote:
>> I am attaching that to the next CF.

> One thing which you might need to reconsider is removal of memory leak
> comments. There is still a leak if there is an error while encoding in
> db_encoding_strdup.
> Unless you want to catch those error with an TRY();CATCH(); and then
> free the mem.

I could have lived with leaving the leak there, but really this wasn't
fixing the worst problem with the code: if it did throw an error out of
the middle of that sequence, it would leave the process setlocale'd to
some other locale than we want.  That could lead to unwanted behavior
in printf and other functions.  And this isn't all that unlikely: an
encoding conversion failure is definitely possible if you have a locale
selected that's not compatible with the database encoding.

I whacked the patch around enough so that we didn't do anything except
libc calls between setting and restoring the locale.  At that point it
was just a matter of adding a TRY block to be able to say that we
didn't leak any strdup'd strings, so I figured "might as well".
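
The resulting shape is roughly the following sketch (simplified; the
committed code covers more fields and both LC_MONETARY and LC_NUMERIC,
and the variable names here are illustrative):

    save_lc = pstrdup(setlocale(LC_MONETARY, NULL));
    setlocale(LC_MONETARY, locale_monetary);

    /* only libc calls while the locale is switched */
    extlconv = localeconv();
    decimal_point = strdup(extlconv->decimal_point);

    setlocale(LC_MONETARY, save_lc);    /* restore before anything can elog() */

    PG_TRY();
    {
        /* conversion may fail on an incompatible database encoding */
        result.decimal_point = db_encoding_strdup(encoding, decimal_point);
    }
    PG_CATCH();
    {
        free(decimal_point);            /* don't leak the strdup'd string */
        PG_RE_THROW();
    }
    PG_END_TRY();
    free(decimal_point);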

Pushed with those changes.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve OOM handling in pg_locale.c

2016-11-21 Thread Michael Paquier
On Tue, Nov 22, 2016 at 8:28 AM, Tom Lane  wrote:
> I could have lived with leaving the leak there, but really this wasn't
> fixing the worst problem with the code: if it did throw an error out of
> the middle of that sequence, it would leave the process setlocale'd to
> some other locale than we want.  That could lead to unwanted behavior
> in printf and other functions.  And this isn't all that unlikely: an
> encoding conversion failure is definitely possible if you have a locale
> selected that's not compatible with the database encoding.
>
> I whacked the patch around enough so that we didn't do anything except
> libc calls between setting and restoring the locale.  At that point it
> was just a matter of adding a TRY block to be able to say that we
> didn't leak any strdup'd strings, so I figured "might as well".
>
> Pushed with those changes.

Thanks.  The changes you have made look good to me at first glance.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Tom Lane
Robert Haas  writes:
> Don't sweat it!  Your sqlsmith fuzz testing has been extremely valuable.

That might be the understatement of the year.  I don't know how long
it would have taken us to find these things in the field.  Thanks!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 5:14 PM, Andreas Seltenreich  wrote:
> Ashutosh Sharma writes:
>>> the following query appears to reliably crash parallel workers on master
>>> as of 0832f2d.
>
>> As suggested, I have tried to reproduce this issue on *0832f2d* commit but
>> could not reproduce it.
>
> as Tom pointed out earlier, I had secretly set parallel_setup_cost and
> parallel_tuple_cost to 0.  I assumed these were irrelevant when
> force_parallel_mode is on.  I'll do less assuming and more testing on a
> vanilla install on future reports.
>
> Sorry for the inconvenience,

Don't sweat it!  Your sqlsmith fuzz testing has been extremely valuable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Andreas Seltenreich
Hi,

Ashutosh Sharma writes:

>> the following query appears to reliably crash parallel workers on master
>> as of 0832f2d.

> As suggested, I have tried to reproduce this issue on *0832f2d* commit but
> could not reproduce it.

as Tom pointed out earlier, I had secretly set parallel_setup_cost and
parallel_tuple_cost to 0.  I assumed these were irrelevant when
force_parallel_mode is on.  I'll do less assuming and more testing on a
vanilla install on future reports.

Sorry for the inconvenience,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: multivariate statistics / proof of concept

2016-11-21 Thread Robert Haas
[ reviving an old multivariate statistics thread ]

On Thu, Nov 13, 2014 at 6:31 AM, Simon Riggs  wrote:
> On 12 October 2014 23:00, Tomas Vondra  wrote:
>
>> It however seems to be working sufficiently well at this point, enough
>> to get some useful feedback. So here we go.
>
> This looks interesting and useful.
>
> What I'd like to check before a detailed review is that this has
> sufficient applicability to be useful.
>
> My understanding is that Q9 and Q18 of TPC-H have poor plans as a
> result of multi-column stats errors.
>
> Could you look at those queries and confirm that this patch can
> produce better plans for them?

Tomas, did you ever do any testing in this area?  One of my
colleagues, Rafia Sabih, recently did some testing of TPC-H queries @
20 GB.  Q18 actually doesn't complete at all right now because of an
issue with the new simplehash implementation.  I reported it to Andres
and he tracked it down, but hasn't posted the patch yet - see
http://archives.postgresql.org/message-id/20161115192802.jfbec5s6ougxw...@alap3.anarazel.de

Of the remaining queries, the slowest are Q9 and Q20, and both of them
have serious estimation errors.  On Q9, things go wrong here:

 ->  Merge Join  (cost=5225092.04..6595105.57 rows=154 width=47) (actual time=103592.821..149335.010 rows=6503988 loops=1)
       Merge Cond: (partsupp.ps_partkey = lineitem.l_partkey)
       Join Filter: (lineitem.l_suppkey = partsupp.ps_suppkey)
       Rows Removed by Join Filter: 19511964
       ->  Index Scan using idx_partsupp_partkey on partsupp  (cost=0.43..781956.32 rows=15999792 width=22) (actual time=0.044..11825.481 rows=15999881 loops=1)
       ->  Sort  (cost=5224967.03..5245348.02 rows=8152396 width=45) (actual time=103592.505..112205.444 rows=26015949 loops=1)
             Sort Key: part.p_partkey
             Sort Method: quicksort  Memory: 704733kB
             ->  Hash Join  (cost=127278.36..4289121.18 rows=8152396 width=45) (actual time=1084.370..94732.951 rows=6503988 loops=1)
                   Hash Cond: (lineitem.l_partkey = part.p_partkey)
                   ->  Seq Scan on lineitem  (cost=0.00..3630339.08 rows=119994608 width=41) (actual time=0.015..33355.637 rows=119994608 loops=1)
                   ->  Hash  (cost=123743.07..123743.07 rows=282823 width=4) (actual time=1083.686..1083.686 rows=216867 loops=1)
                         Buckets: 524288  Batches: 1  Memory Usage: 11721kB
                         ->  Gather  (cost=1000.00..123743.07 rows=282823 width=4) (actual time=0.418..926.283 rows=216867 loops=1)
                               Workers Planned: 4
                               Workers Launched: 4
                               ->  Parallel Seq Scan on part  (cost=0.00..94460.77 rows=70706 width=4) (actual time=0.063..962.909 rows=43373 loops=5)
                                     Filter: ((p_name)::text ~~ '%grey%'::text)
                                     Rows Removed by Filter: 756627

The estimate for the index scan on partsupp is essentially perfect,
and the lineitem-part join is off by about 3x.  However, the merge
join is off by about 4000x, which is real bad.

On Q20, things go wrong here:

 ->  Merge Join  (cost=5928271.92..6411281.44 rows=278 width=16) (actual time=77887.963..136614.284 rows=118124 loops=1)
       Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
       Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity
       Rows Removed by Join Filter: 242
       ->  GroupAggregate  (cost=5363980.40..5691151.45 rows=9681876 width=48) (actual time=76672.726..131482.677 rows=10890067 loops=1)
             Group Key: lineitem.l_partkey, lineitem.l_suppkey
             ->  Sort  (cost=5363980.40..5409466.13 rows=18194291 width=21) (actual time=76672.661..86405.882 rows=18194084 loops=1)
                   Sort Key: lineitem.l_partkey, lineitem.l_suppkey
                   Sort Method: external merge  Disk: 551376kB
                   ->  Bitmap Heap Scan on lineitem  (cost=466716.05..3170023.42 rows=18194291 width=21) (actual time=13735.552..39289.995 rows=18195269 loops=1)
                         Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone))
  

Re: [HACKERS] postgres 9.3 postgres_fdw ::LOG: could not receive data from client: Connection reset by peer

2016-11-21 Thread Kevin Grittner
On Mon, Nov 21, 2016 at 8:32 AM, Vladimir Svedov  wrote:

> I have this question. Looked for a help on http://dba.stackexchange.com/
> No success.

A link to the actual question would be appreciated.

> "FOREIGN_TABLE" created with postgres_fdw. LOCAL_TABLE is just a local 
> table...
>
> Symptoms:
>
> I run in psql query SELECT * from FOREIGN_TABLE. No log generated
> I run in bash psql -c "SELECT * from LOCAL_TABLE". No log generated
> I run in bash psql -c "SELECT * from FOREIGN_TABLE". ::LOG: could not receive 
> data from client: Connection reset by peer generated in postgres log

Please provide more information, and preferably a self-contained
test case (one that anyone can run on an empty database to see the
problem).

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-21 Thread Michael Paquier
On Tue, Nov 22, 2016 at 3:06 AM, Robert Haas  wrote:
> I don't think that you should skip it in the wraparound case, because
> it might be one of the temp tables that is threatening wraparound.
>
> I've committed the patch after doing some work on the comments.

OK, thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.

2016-11-21 Thread Corey Huinker
>
> The dblink docs recommend using dblink_fdw as the FDW for this purpose,
> which would only accept legal connstr options.  However, I can see the
> point of using a postgres_fdw server instead, and considering that
> dblink isn't actually enforcing use of any particular FDW type, it seems
> like the onus should be on it to be more wary of what the options are.
>

I have to admit, this was the first I'd heard of dblink_fdw. I'm glad it
exists, though. And yes, I'd like to be able to use postgres_fdw entries
because there's value in knowing that the connection for your ad-hoc SQL
exactly matches the connection used for other FDW tables.


> It looks like this might be fairly easy to fix by having
> get_connect_string() use is_valid_dblink_option() to check each
> option name, and silently ignore options that are inappropriate.
>

From what I can tell, it is very straightforward, the context oids are set
up just a few lines above where the new is_valid_dblink_option() calls
would be.

I'm happy to write the patch, for both v10 and any back-patches we feel are
necessary. However, I suspect with a patch this trivial that reviewing
another person's patch might be more work for a committer than just doing
it themselves. If that's not the case, let me know and I'll get started.


Re: [HACKERS] condition variables

2016-11-21 Thread Robert Haas
On Thu, Aug 11, 2016 at 5:47 PM, Robert Haas  wrote:
> So, in my
> implementation, a condition variable wait loop looks like this:
>
> for (;;)
> {
> ConditionVariablePrepareToSleep(cv);
> if (condition for which we are waiting is satisfied)
> break;
> ConditionVariableSleep();
> }
> ConditionVariableCancelSleep();

I have what I think is a better idea.  Let's get rid of
ConditionVariablePrepareToSleep(cv) and instead tell users of this
facility to write the loop this way:

for (;;)
{
if (condition for which we are waiting is satisfied)
break;
ConditionVariableSleep(cv);
}
ConditionVariableCancelSleep();

ConditionVariableSleep(cv) will check whether the current process is
already on the condition variable's waitlist.  If so, it will sleep;
if not, it will add the process and return without sleeping.

It may seem odd that ConditionVariableSleep(cv) doesn't necessary
sleep, but this design has a significant advantage: we avoid
manipulating the wait-list altogether in the case where the condition
is already satisfied when we enter the loop.  That's more like what we
already do in lwlock.c: we try to grab the lock first; if we can't, we
add ourselves to the wait-list and retry; if we then get the lock
after all we have to recheck whether we can get the lock and remove
ourselves from the wait-list if so.  Of course, there is some cost: if
we do have to wait, we'll end up checking the condition twice before
actually going to sleep.  However, it's probably smart to bet that
actually needing to sleep is fairly infrequent, just as in lwlock.c.
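
For example, a caller waiting on a flag in shared memory would look
something like this (the surrounding struct and its spinlock are
illustrative, not part of the proposal):

    typedef struct SharedState
    {
        slock_t           mutex;
        bool              work_done;    /* set by some other backend */
        ConditionVariable cv;           /* that backend wakes waiters on this */
    } SharedState;

    for (;;)
    {
        bool    done;

        SpinLockAcquire(&shared->mutex);
        done = shared->work_done;
        SpinLockRelease(&shared->mutex);

        if (done)
            break;

        ConditionVariableSleep(&shared->cv);   /* first call just adds us to the waitlist */
    }
    ConditionVariableCancelSleep();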

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-11-21 Thread Peter Geoghegan
On Mon, Nov 21, 2016 at 9:42 AM, Claudio Freire  wrote:
>> I realized today, quite suddenly, why I disliked your original patch:
>> it didn't go far enough with embracing the idea of
>> heap-item-pointer-as-key. In other words, I didn't like that you
>> didn't change anything in the internal pages.
>
> But it did. In fact it only changed internal pages.

Oh, okay.

>> Maybe you should put
>> heap TIDs someplace in new internal pages, so that even there we treat
>> them as part of the key.
>
> That is indeed what's happening (both in the old and new version).

Good.

> The new version also opens up to the possibility of omitting the heap
> TID in inner pages, if they're redundant (ie: if two consecutive keys
> are different already, the heap TID part of the key is redundant,
> since it's not necessary to make tree descent decisions).

Makes sense; this is just another aspect of suffix truncation.

>> This may significantly benefit UPDATE-heavy
>> workloads where HOT doesn't work out so well. Particularly for
>> non-unique indexes, we currently have to wade through a lot of bloat
>> from the leaf level, rather than jumping straight to the correct leaf
>> page when updating or inserting.
>
> That is improved in the patch as well. The lookup for an insertion
> point for a heavily bloated (unique or not) index is done efficiently,
> instead of the O(N^2) way it used before.

Cool.

> It's done even for unique indexes. Locking in that case becomes
> complex, true, but I believe it's not an insurmountable problem.

I actually don't think that that would be all that complicated. It's
just one case where you have to mostly match the existing behavior.

>> Or, you might want to make
>> sure that B-Tree tuple insertions still find that "first page" to
>> check, despite still generally treating heap item pointer as part of
>> the key proper (in unique indexes, it can still behave like NULL, and
>> not participate in uniqueness checking, while still essentially being
>> part of the key/sort order).
>
> It behaves like that (even though it's not coded like that)

Why not? What do you mean?

What I'm interested in evaluating is an implementation where an index
on (foo) has a sort order/key space that is precisely the same as an
index on (foo, ctid), with zero exceptions.

>> It also occurs to me that our performance for updates in the event of
>> many NULL values in indexes is probably really bad, and could be
>> helped by this. You'll want to test all of this with a workload that
>> isn't very sympathetic to HOT, of course -- most of these benefits are
>> seen when HOT doesn't help.
>
> I haven't really tested with an overabundance of NULLs, I'll add that
> to the tests. I have tested low-cardinality indexes though, but only
> for correctness.

I think we'll have to model data distribution to evaluate the idea
well. After all, if there is a uniform distribution of values anyway,
you're going to end up with many dirty leaf pages. Whereas, in the
more realistic case where updates are localized, we can hope to better
amortize the cost of inserting index tuples, and recycling index
tuples.

> What do you mean with "first class part"?
>
> It's not included in the scankey for a variety of reasons, not the
> least of which is not breaking the interface for current uses, and
> because the tid type is quite limited in its capabilities ATM. Also,
> the heap TID is usually there on most AM calls that care about it (ie:
> insertions have it, of course), so adding it to the scankey also
> seemed redundant.

I mean that insertions into indexes that are low cardinality should be
largely guided by TID comparisons.

> If not adding to the scankey, what do you mean by "first class" then?

Just what I said about the sort order of an index on "(foo)" now
precisely matching what we'd get for "(foo, ctid)". There are a couple
of tricky issues with that that you'd have to look out for, like
making sure that the high key continues to hold a real TID, which at a
glance looks like something that just happens already. I'm not sure
that we preserve the item pointer TID in all cases, since the current
assumption is that a high key's TID is just filler -- maybe we can
lose that at some point.

You should use amcheck to specifically verify that that happens
reliably in all cases. Presumably, its use of an insertion scankey
would automatically see the use of TID as a tie-breaker with patched
Postgres amcheck verification, and so amcheck will work for this
purpose unmodified.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.

2016-11-21 Thread Joe Conway
On 11/21/2016 02:16 PM, Tom Lane wrote:
> Corey Huinker  writes:
>> I ran into this today:
>> CREATE SERVER bob_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host
>> 'localhost', dbname :'current_db' );
>> ...
>> ALTER SERVER bob_srv OPTIONS (updatable 'true');
>> SELECT *
>> FROM dblink('bob_srv','SELECT 1') as t(x integer);
>> psql:bug_example.sql:18: ERROR:  could not establish connection
>> DETAIL:  invalid connection option "updatable"
> 
>> Is this something we want to fix?
> 
> The dblink docs recommend using dblink_fdw as the FDW for this purpose,
> which would only accept legal connstr options.  However, I can see the
> point of using a postgres_fdw server instead, and considering that
> dblink isn't actually enforcing use of any particular FDW type, it seems
> like the onus should be on it to be more wary of what the options are.
> 
> It looks like this might be fairly easy to fix by having
> get_connect_string() use is_valid_dblink_option() to check each
> option name, and silently ignore options that are inappropriate.

Thanks for the report and analysis. I'll take a look at creating a patch
this week.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-21 Thread Tom Lane
Robert Haas  writes:
> 1. Should we try to avoid having the stats collector write a stats
> file during an immediate shutdown?  The file will be removed anyway
> during crash recovery, so writing it is pointless.

The point I was trying to make is that I think the forced-removal behavior
is not desirable, and therefore committing a patch that makes it be graven
in stone is not desirable either.

The larger picture here is that Takayuki-san wants us to commit a patch
based on a customer's objection to 9.2's behavior, without any real
evidence that the 9.4 change isn't a sufficient solution.  I've got
absolutely zero sympathy for that "the stats collector might be stuck in
an unkillable state" argument --- where's the evidence that the stats
collector is any more prone to that than any other postmaster child?
And for that matter, if we are stuck because of a nonresponding NFS
server, how is a quicker postmaster exit going to help anything?
You're not going to be able to start a new postmaster if the data
directory is on a nonresponsive server.

I'd be willing to entertain a proposal to make the 5-second limit
adjustable, but I don't think we need entirely new behavior here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres abort found in 9.3.11

2016-11-21 Thread Tom Lane
"K S, Sandhya (Nokia - IN/Bangalore)"  writes:
> As suggested by you, we upgraded the postgres to version 9.3.14. Also we 
> removed all the patches we had applied before. But the issue is still 
> observed in the latest version as well.

Can you make a test case for other people to try?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?

2016-11-21 Thread Robert Haas
On Sun, Nov 20, 2016 at 10:20 PM, Tsunakawa, Takayuki
 wrote:
> The reasons why I proposed this patch are:
>
> * It happened in a highly mission-critical production system of a customer 
> who uses 9.2.
>
> * 9.4's solution is not perfect, because it wastes 5 seconds anyway, which is 
> unexpected for users.  The customer's requirement includes failover within 30 
> seconds, so 5 seconds can be seen as a risk.
> Plus, I'm worried about the possibility that the SIGKILLed process wouldn't 
> disappear if it's writing to a network storage like NFS.
>
> * And first of all, the immediate shutdown should shut the server down 
> immediately without doing anything heavy, as the name means.

So there are two questions here:

1. Should we try to avoid having the stats collector write a stats
file during an immediate shutdown?  The file will be removed anyway
during crash recovery, so writing it is pointless.  I think you are
right that 9.4's solution here is not perfect, because of the 5 second
delay, and also because if the stats collector is stuck inside the
kernel trying to write to the OS, it may be in a non-interruptible
wait state where even SIGKILL has no immediate effect.  Anyway, it's
stupid even from a performance point of view to waste time writing a
file that we're just going to nuke.

2. Should we close listen sockets sooner during an immediate shutdown?
 I agree with Tom and Peter that this isn't a good idea.  People
expect the sockets not to go away until the end - e.g. they use
PQping() to test the server status, or they connect just to see what
error they get - and the fact that a client application could
hypothetically generate such a relentless stream of connection
attempts that the dead-end backends thereby created slow down shutdown
is not in my mind a sufficient reason to change the behavior.

So I think 001 should proceed and 002 should be rejected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2016-11-21 Thread Tom Lane
Alvaro Herrera  writes:
> Something I just noticed is that transformTableExpr takes a TableExpr
> node and returns another TableExpr node.  That's unlike what we do in
> other places, where the node returned is of a different type than the
> input node.  I'm not real clear what happens if you try to re-transform
> a node that was already transformed, but it seems worth thinking about.

We're not 100% consistent on that --- there are cases such as RowExpr
and CaseExpr where the same struct type is used for pre-parse-analysis
and post-parse-analysis nodes.  I think it's okay as long as the
information content isn't markedly different, ie the transformation
just consists of transforming all the sub-nodes.

Being able to behave sanely on a re-transformation used to be an
issue, but we no longer expect transformExpr to support that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres abort found in 9.3.11

2016-11-21 Thread K S, Sandhya (Nokia - IN/Bangalore)
Hello,

As suggested by you, we upgraded the postgres to version 9.3.14. Also we 
removed all the patches we had applied before. But the issue is still observed 
in the latest version as well.

The issue is seen during a normal run and is only observed on the standby node.

This time as well, the same error log is observed.
node-1 postgres[8743]: [18-1] PANIC:  btree_xlog_delete_get_latestRemovedXid: 
cannot operate with inconsistent data

Can you please share your inputs which would help us proceed further?

Regards,
Sandhya
 
-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Friday, September 16, 2016 1:29 AM
To: K S, Sandhya (Nokia - IN/Bangalore) 
Cc: pgsql-hackers@postgresql.org; Itnal, Prakash (Nokia - IN/Bangalore) 

Subject: Re: [HACKERS] Postgres abort found in 9.3.11

"K S, Sandhya (Nokia - IN/Bangalore)"  writes:
> We tried to replicate the scenario without our patch(exiting postmaster) and 
> still we were able to see the issue.

> Same error was seen this time as well.
> node-0 postgres[8243]: [1-2] HINT:  Is another postmaster already running on 
> port 5433? If not, wait a few seconds and retry.  
> node-1 postgres[8650]: [18-1] PANIC:  btree_xlog_delete_get_latestRemovedXid: 
> cannot operate with inconsistent data

> Crash was not seen in 9.3.9 without the patch but it was reproduced in 9.3.11.
> So something specifically changed between 9.3.9 and 9.3.11 is causing the 
> issue.

Well, I looked through the git history from 9.3.9 to 9.3.11 and I don't
see anything that seems likely to explain a problem here.

If you can reproduce this, which it sounds like you can, maybe you could
create a self-contained test case for other people to try?

Also worth noting is that the current 9.3.x release is 9.3.14.  You
might save yourself some time by updating and seeing if it still
reproduces in 9.3.14.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2016-11-21 Thread Alvaro Herrera
Something I just noticed is that transformTableExpr takes a TableExpr
node and returns another TableExpr node.  That's unlike what we do in
other places, where the node returned is of a different type than the
input node.  I'm not real clear what happens if you try to re-transform
a node that was already transformed, but it seems worth thinking about.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-11-21 Thread Tom Lane
[ apologies for not responding sooner ]

Etsuro Fujita  writes:
> On 2016/11/10 5:17, Tom Lane wrote:
>> I think there's a very good argument that we should just treat any inval
>> in pg_foreign_data_wrapper as a reason to blow away the whole plan cache.
>> People aren't going to alter FDW-level options often enough to make it
>> worth tracking things more finely.  Personally I'd put pg_foreign_server
>> changes in the same boat, though maybe those are changed slightly more
>> often.  If it's worth doing anything more than that, it would be to note
>> which plans mention foreign tables at all, and not invalidate those that
>> don't.  We could do that with, say, a hasForeignTables bool in
>> PlannedStmt.

> I agree on that point.

OK, please update the patch to handle those catalogs that way.

>> That leaves the question of whether it's worth detecting table-level
>> option changes this way, or whether we should just handle those by forcing
>> a relcache inval in ATExecGenericOptions, as in Amit's original patch in
>> <5702298d.4090...@lab.ntt.co.jp>.  I kind of like that approach; that
>> patch was short and sweet, and it put the cost on the unusual path (ALTER
>> TABLE) not the common path (every time we create a query plan).

> I think it'd be better to detect table-level option changes because that 
> could allow us to do more; we could avoid invalidating cached plans that 
> need not be invalidated, by tracking the plan dependencies more 
> exactly, ie, avoid collecting the dependencies of foreign tables in a 
> plan that are in dead subqueries or excluded by constraint exclusion. 
> The proposed patch doesn't do that, though.  I think updates on 
> pg_foreign_table would be more frequent, so it might be worth 
> complicating the code.  But the relcache-inval approach couldn't do 
> that, IIUC.

Why not?  But the bigger picture here is that relcache inval is the
established practice for forcing replanning due to table-level changes,
and I have very little sympathy for inventing a new mechanism for that
just for foreign tables.  If it were worth bypassing replanning for
changes in tables in dead subqueries, for instance, it would surely be
worth making that happen for all table types not just foreign tables.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2016-11-21 Thread Andres Freund
Hi,

On 2016-11-21 16:17:58 +0800, Craig Ringer wrote:
> I've prepared a working initial, somewhat raw implementation for
> logical decoding on physical standbys.

Please attach. Otherwise in a year or two it'll be impossible to look
this up.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.

2016-11-21 Thread Tom Lane
Corey Huinker  writes:
> I ran into this today:
> CREATE SERVER bob_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host
> 'localhost', dbname :'current_db' );
> ...
> ALTER SERVER bob_srv OPTIONS (updatable 'true');
> SELECT *
> FROM dblink('bob_srv','SELECT 1') as t(x integer);
> psql:bug_example.sql:18: ERROR:  could not establish connection
> DETAIL:  invalid connection option "updatable"

> Is this something we want to fix?

The dblink docs recommend using dblink_fdw as the FDW for this purpose,
which would only accept legal connstr options.  However, I can see the
point of using a postgres_fdw server instead, and considering that
dblink isn't actually enforcing use of any particular FDW type, it seems
like the onus should be on it to be more wary of what the options are.

It looks like this might be fairly easy to fix by having
get_connect_string() use is_valid_dblink_option() to check each
option name, and silently ignore options that are inappropriate.
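
A rough sketch of what that could look like inside get_connect_string(),
where the variable names, loop shape, and context OID below are
illustrative (is_valid_dblink_option() is the existing validator helper,
and "options" stands for libpq's PQconninfoOption array, e.g. from
PQconndefaults()):

    foreach(cell, foreign_server->options)
    {
        DefElem    *def = (DefElem *) lfirst(cell);

        /* silently skip options libpq would reject, e.g. "updatable" */
        if (!is_valid_dblink_option(options, def->defname, ForeignServerRelationId))
            continue;

        /* buf is the connection-string StringInfoData being assembled */
        appendStringInfo(&buf, "%s='%s' ",
                         def->defname, escape_param_str(strVal(def->arg)));
    }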

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.

2016-11-21 Thread Robert Haas
On Sun, Nov 20, 2016 at 7:37 PM, Corey Huinker  wrote:
> I ran into this today:
>
> select current_database() as current_db \gset
> CREATE EXTENSION postgres_fdw;
> CREATE EXTENSION
> CREATE EXTENSION dblink;
> CREATE EXTENSION
> CREATE ROLE bob LOGIN PASSWORD 'bob';
> CREATE ROLE
> CREATE SERVER bob_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host
> 'localhost', dbname :'current_db' );
> CREATE SERVER
> CREATE USER MAPPING FOR public SERVER bob_srv OPTIONS ( user 'bob', password
> 'bob' );
> CREATE USER MAPPING
> SELECT *
> FROM dblink('bob_srv','SELECT 1') as t(x integer);
>  x
> ---
>  1
> (1 row)
>
> ALTER SERVER bob_srv OPTIONS (updatable 'true');
> ALTER SERVER
> SELECT *
> FROM dblink('bob_srv','SELECT 1') as t(x integer);
> psql:bug_example.sql:18: ERROR:  could not establish connection
> DETAIL:  invalid connection option "updatable"
>
>
> Is this something we want to fix?

Looks like a bug to me.

> If so, are there any other fdw/server/user-mapping options that we don't
> want to pass along to the connect string?

InitPgFdwOptions has an array labeled as /* non-libpq FDW-specific FDW
options */.  Presumably all of those should be handled in a common
way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2016-11-21 Thread Thomas Munro
On Tue, Nov 22, 2016 at 7:29 AM, Kevin Grittner  wrote:
> On Mon, Nov 21, 2016 at 11:29 AM, Alvaro Herrera
>  wrote:
>
>>> When I used the word "cache" here, I was thinking more of this
>>> English language definition:
>>>
>>>   a :  a hiding place especially for concealing and preserving
>>>provisions or implements
>>>   b :  a secure place of storage
>>>
>>> The intent being to emphasize that there is not one public
>>> "registry" of such objects, but context-specific collections where
>>> references are tucked away when they become available for later use
>>> in the only the appropriate context.
>
>> How about "stash"?  According to my reading of Merriam-Webster's
>> definition, "stash" mostly appears to be the thing that is stored
>> (hidden), rather than the place it's stored in, but one of the
>> definitions is "hiding place", and "cache" is listed as a synonym.
>
> "Stash" seems better that "cache" or "registry", especially since
> many programmers these days seem to associate "cache" with
> pass-through proxy techniques.  I first became familiar with the
> term "cache" while reading Jack London, and tend to retain some
> association with the more general definition.  Clearly I am in the
> minority on that here.

I was suggesting something like QueryEnvironment because it focuses on
its role, not the fact that there are things stored in it.  It's
conceptually like the environment in an interpreter, which is some
kind of namespace into which objects are bound by name.  My suggestion
"SpiRelation" now seems a bit short sighted in light of your comments,
so I retract that bit.  How about a QueryEnvironment (as shown in the
patch I posted) that contains a list of NamedTuplestore pointers (the
SpiRelation struct in the patch I posted, renamed) and in future
perhaps lists of other ephemeral/transient objects that we want to
expose to SQL?  We would possibly have more than one list because SQL
is not "Lisp-1" in nature: relations and functions and other kinds of
object exist in different namespaces, though there may need to be
polymorphism among kinds of named relations in the same list, so
perhaps NamedTuplestore should be a node with a tag.
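
A minimal sketch of those shapes (field names are illustrative, not taken
from the posted patch):

    typedef struct NamedTuplestore
    {
        NodeTag          type;       /* lets kinds of named relations be told apart */
        char            *name;       /* name visible to SQL */
        TupleDesc        tupdesc;
        Tuplestorestate *tuplestore;
    } NamedTuplestore;

    typedef struct QueryEnvironment
    {
        List    *namedTuplestores;   /* list of NamedTuplestore * */
        /* room for lists of other ephemeral objects later */
    } QueryEnvironment;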

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-21 Thread Robert Haas
On Sun, Nov 20, 2016 at 8:48 PM, Tsunakawa, Takayuki
 wrote:
>> True, but raising the bar for this feature so that it doesn't get done is
>> also bad.  It can be improved in a later patch.
>
> I thought you were very strict about performance, so I hesitate to believe you 
> would accept the additional round trip.  libpq failover is a new feature in v10, 
> so I think it should provide the best user experience for v10 client+server 
> users from the start.  If the concern is the time for development, then 
> support for older releases can be added in a later patch.
>
> There are still several months left for v10.  Why don't we aim for the best?  
> Could you share what the difficulty is?

I am very strict about regressing the performance of things that we
already have, but I try not to make a policy that a new feature must
be as fast as it could ever be.  That could result in us having very
few new features.  Of course, it also matters how frequently the
overhead is likely to be incurred.  A little overhead on each tuple
visibility check is a much bigger problem than a little overhead on
each CREATE TABLE statement, which in turn is a much bigger problem
than a little overhead on each connection attempt.  For good
performance, connections must last a while, so a little extra time
setting one up doesn't seem like a showstopper to me.  Anyway, anybody
who finds this mechanism too expensive doesn't have to use it; they
can implement something else instead.

Also, I am not saying that we should not change this in time for v10.
I'm saying that I don't think it should be a requirement for the next
patch to be committed in this area to introduce a whole new mechanism
for determining whether something is a master or a standby.  Love it
or hate it, pgsql-jdbc has already implemented something in this area
and it does something useful -- without requiring a wire protocol
change.  Now you and Kevin are trying to say that what they did is all
wrong, but I don't agree.  There are very many users for whom the
pgsql-jdbc approach will do exactly what they need, and no doubt some
for whom it won't.  Getting a patch that mimics that approach
committed is *progress*.  Improving it afterwards, whether for v10 or
some later release, is also good.
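
For reference, the pgsql-jdbc-style check amounts to a probe along these
lines after connecting (shown as SQL; exact handling differs by driver
version):

    SHOW transaction_read_only;    -- 'off' => usable as "master", 'on' => try the next host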

There is a saying that one should not let the perfect be the enemy of
the good.  I believe that saying applies here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-21 Thread Karl O. Pinc
On Mon, 21 Nov 2016 13:17:17 -0500
Robert Haas  wrote:

> > I've a couple of other patches that do
> > a little cleanup on master that I'd also like to submit
> > along with your patch. 

> It would really be much better to submit anything that's not
> absolutely necessary for the new feature as a separate patch, rather
> than bundling things together.

My plan is separate patches, one email.

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-11-21 Thread Robert Haas
On Sat, Nov 19, 2016 at 2:54 PM, Tom Lane  wrote:
> Stephen Frost  writes:
>> Let's create a new data type for this which supports old and new types,
>> add what casts make sense, and call it a day.  Changing the data type
>> name out from under people is not helping anyone.
>
> +1.  I do not think changing behavior for the existing type name is
> going to be a net win.  If we'd been smart enough to make the type
> varlena from the get-go, maybe we could get away with it, but there
> is just way too much risk of trouble with a change in a fixed-length
> type's on-the-wire representation.

I completely agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2016-11-21 Thread Kevin Grittner
On Mon, Nov 21, 2016 at 11:29 AM, Alvaro Herrera
 wrote:

>> When I used the word "cache" here, I was thinking more of this
>> English language definition:
>>
>>   a :  a hiding place especially for concealing and preserving
>>provisions or implements
>>   b :  a secure place of storage
>>
>> The intent being to emphasize that there is not one public
>> "registry" of such objects, but context-specific collections where
>> references are tucked away when they become available for later use
>> in the only the appropriate context.

> How about "stash"?  According to my reading of Merriam-Webster's
> definition, "stash" mostly appears to be the thing that is stored
> (hidden), rather than the place it's stored in, but one of the
> definitions is "hiding place", and "cache" is listed as a synonym.

"Stash" seems better that "cache" or "registry", especially since
many programmers these days seem to associate "cache" with
pass-through proxy techniques.  I first became familiar with the
term "cache" while reading Jack London, and tend to retain some
association with the more general definition.  Clearly I am in the
minority on that here.

http://ereimer.net/20080706/13586_erC720.htm

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Tom Lane
Robert Haas  writes:
> On Mon, Nov 21, 2016 at 1:00 PM, Tom Lane  wrote:
>> Hah: not where I thought it was at all.  The problem seems to be down to
>> the optimization I put into is_parallel_safe() awhile back to skip testing
>> anything if we previously found the entire querytree to be parallel-safe.
>> Well, the raw query tree *is* parallel-safe.  It's only when we inject
>> some Params that we have a parallel hazard.  So that optimization is too
>> optimistic :-(

> That sucks.  Any idea how we might salvage it?

I just disabled it by checking to see if any Params have been created.
It might be possible to be more refined, but I dunno that adding more
bookkeeping would pay for itself.
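
For illustration, the guard amounts to roughly this (a sketch, not the
exact committed hunk; I'm assuming the 9.6-era PlannerGlobal fields
maxParallelHazard and nParamExec here):

    /* In is_parallel_safe(): only trust the cached whole-querytree
     * verdict if no PARAM_EXEC Params have been generated since it
     * was computed; otherwise fall through to the full walker. */
    if (root->glob->maxParallelHazard == PROPARALLEL_SAFE &&
        root->glob->nParamExec == 0)
        return true;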

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-11-21 Thread Robert Haas
On Sat, Nov 19, 2016 at 8:51 AM, Karl O. Pinc  wrote:
> On Sat, 19 Nov 2016 12:58:47 +0100
> Gilles Darold  wrote:
>
>> All the patches you've submitted on the v13 patch have been applied and
>> are present in the attached v14 of the patch. I have not included the
>> patches about GUC changes because I'm not sure that adding a new file
>> (include/utils/guc_values.h) just for that will be accepted, or that it
>> will not require more global work to add other GUC values. However,
>> perhaps this patch can be submitted separately if the decision is not
>> taken here.
>
> Understood.  I've a couple of other patches that do
> a little cleanup on master that I'd also like to submit
> along with your patch.  This on the theory that
> the maintainers will be looking at this code anyway
> because your patch touches it.  All this can be submitted
> for their review at once.  My approach is to be minimally invasive on
> a per-patch basis (i.e. your patch) but add small patches
> that make existing code "better" without touching
> functionality.  (Deleting unnecessary statements, etc.)
> The overall goal being a better code base.

It would really be much better to submit anything that's not
absolutely necessary for the new feature as a separate patch, rather
than bundling things together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 1:00 PM, Tom Lane  wrote:
> I wrote:
>> Actually, the Gather path *isn't* parameterized.  The problem here is
>> that we're planning an unflattened subquery, and the only thing that
>> is parallel-unsafe is that there is an outer Param in its toplevel tlist,
>> and we're somehow deciding that we can stick that unsafe tlist into (and
>> beneath) the Gather node.  So something rotten in that area, but I've not
>> quite found it yet.
>
> Hah: not where I thought it was at all.  The problem seems to be down to
> the optimization I put into is_parallel_safe() awhile back to skip testing
> anything if we previously found the entire querytree to be parallel-safe.
> Well, the raw query tree *is* parallel-safe.  It's only when we inject
> some Params that we have a parallel hazard.  So that optimization is too
> optimistic :-(

That sucks.  Any idea how we might salvage it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-11-21 Thread Robert Haas
On Sun, Nov 20, 2016 at 10:42 PM, Michael Paquier
 wrote:
> On Sat, Nov 19, 2016 at 2:16 PM, Michael Paquier
>  wrote:
>> On Fri, Nov 18, 2016 at 1:11 PM, Robert Haas  wrote:
>>> That might sound adding unnecessary work just for the sake of
>>> paranoia, but I don't think it is.  Failures here won't be common, but
>>> since they are entirely automated there will be no human intelligence
>>> available to straighten things out.  Barring considerable caution,
>>> we'll just enter a flaming death spiral.
>>
>> Thinking more paranoidly, another way to enter this flaming death
>> spiral is to not limit the maximum number of failures allowed when
>> dropping a set of orphaned tables and the transactions fail multiple
>> times. This is not critical in itself, as a relation whose drop failed
>> is removed from the list anyway, but failing on each one of them
>> is a good way to slow down autovacuum, so putting a limit of, say, 10
>> failed transactions is, I think, really important.
>
> By the way, when hacking this patch I asked myself three questions:
> 1) How many objects should be dropped per transaction? 50 sounds like
> a fine number to me, so that's what the patch I sent does. This should
> definitely not be more than the default for max_locks_per_transaction,
> aka 64. Another thing to consider would be to use a number depending
> on max_locks_per_transaction, like half of it.
> 2) How many transaction failures should autovacuum forgive? The patch
> uses 10. Honestly, I pulled that number out of thin air.
> 3) Should the drop of orphaned tables be done for a wraparound
> autovacuum? Obviously, the answer is no. It is vital not to consume
> transaction XIDs in this case. The patch I sent drops the
> objects even for a wraparound job, but that's easily switchable.

I don't think that you should skip it in the wraparound case, because
it might be one of the temp tables that is threatening wraparound.

I've committed the patch after doing some work on the comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-11-21 Thread Claudio Freire
On Mon, Nov 21, 2016 at 2:15 PM, Masahiko Sawada  wrote:
> On Fri, Nov 18, 2016 at 6:54 AM, Claudio Freire  
> wrote:
>> On Thu, Nov 17, 2016 at 6:34 PM, Robert Haas  wrote:
>>> On Thu, Nov 17, 2016 at 1:42 PM, Claudio Freire  
>>> wrote:
 Attached is patch 0002 with pgindent applied over it

 I don't think there's any other formatting issue, but feel free to
 point a finger to it if I missed any
>>>
>>> Hmm, I had imagined making all of the segments the same size rather
>>> than having the size grow exponentially.  The whole point of this is
>>> to save memory, and even in the worst case you don't end up with that
>>> many segments as long as you pick a reasonable base size (e.g. 64MB).
>>
>> Wastage is bound by a fraction of the total required RAM, that is,
>> it's proportional to the amount of required RAM, not the amount
>> allocated. So it should still be fine, and the exponential strategy
>> should improve lookup performance considerably.
>
> I'm concerned that it could use repalloc on a large memory area when
> vacrelstats->dead_tuples.dead_tuple is bloated. It would be overhead
> and slow.

How large?

That array cannot be very large. It contains pointers to
exponentially-growing arrays, but the repalloc'd array itself is
small: one struct per segment, each segment starts at 128MB and grows
exponentially.

In fact, IIRC, it can be proven that such a repalloc strategy has an
amortized cost of O(log log n) per tuple. If it repalloc'd the whole
tid array it would be O(log n), but since it handles only pointers to
exponentially growing segments of tuples it becomes O(log log n).

Furthermore, n there is still limited to MAX_INT, which means the cost
per tuple is bounded by O(log log 2^32) = 5. And that's an absolute
worst case that ignores the 128MB constant factor, which is indeed
relevant.
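
To make the shape concrete, here is a stand-alone sketch of the structure
being described (illustrative C only, not the patch's code; the names, the
ItemPointerData stand-in and the missing error checking are all just for
the example):

    #include <stdlib.h>

    typedef struct { unsigned short d[3]; } TidStub; /* stand-in for ItemPointerData */

    typedef struct
    {
        TidStub    *tids;       /* allocated once per segment, never moved */
        size_t      capacity;
        size_t      used;
    } TidSegment;

    typedef struct
    {
        TidSegment *segments;   /* tiny descriptor array, the only thing repalloc'd */
        int         nsegments;
        size_t      first_capacity;  /* size of the first segment */
    } TidStore;

    /* assumes a zero-initialized TidStore with first_capacity set */
    static void
    tidstore_append(TidStore *store, TidStub tid)
    {
        TidSegment *seg = store->nsegments > 0
            ? &store->segments[store->nsegments - 1] : NULL;

        if (seg == NULL || seg->used == seg->capacity)
        {
            /* start a new segment, twice as large as the previous one */
            size_t  newcap = seg ? seg->capacity * 2 : store->first_capacity;

            store->segments = realloc(store->segments,
                                      (store->nsegments + 1) * sizeof(TidSegment));
            seg = &store->segments[store->nsegments++];
            seg->tids = malloc(newcap * sizeof(TidStub));
            seg->capacity = newcap;
            seg->used = 0;
        }
        seg->tids[seg->used++] = tid;
    }

Only the small descriptor array ever gets realloc'd (once per new segment,
i.e. a logarithmic number of times in total); the bulk of the memory lives
in per-segment arrays that are allocated once and never move.  Lookups
binary-search the descriptors first and then within a single segment.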

> What about using semi-fixed memory space without repalloc;
> Allocate the array of ItemPointerData array, and each ItemPointerData
> array stores the dead tuple locations. The size of ItemPointerData
> array starts with small size (e.g. 16MB or 32MB). After we used an
> array up, we then allocate next segment with twice size as previous
> segment.

That's what the patch does.

> But to prevent over allocating memory, it would be better to
> set a limit of allocating size of ItemPointerData array to 512MB or
> 1GB.

There already is a limit of 1GB (the maximum amount palloc can allocate).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Tom Lane
I wrote:
> Actually, the Gather path *isn't* parameterized.  The problem here is
> that we're planning an unflattened subquery, and the only thing that
> is parallel-unsafe is that there is an outer Param in its toplevel tlist,
> and we're somehow deciding that we can stick that unsafe tlist into (and
> beneath) the Gather node.  So something rotten in that area, but I've not
> quite found it yet.

Hah: not where I thought it was at all.  The problem seems to be down to
the optimization I put into is_parallel_safe() awhile back to skip testing
anything if we previously found the entire querytree to be parallel-safe.
Well, the raw query tree *is* parallel-safe.  It's only when we inject
some Params that we have a parallel hazard.  So that optimization is too
optimistic :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2016-11-21 Thread Michael Meskes
> I wanted to say that in order to use the connection pointed to
> by the DECLARE STATEMENT, some functions like ECPGdo() would be
> modified, or a new function would be added under the directory ecpglib/.
> 
> This modification or new function will be used to get the connection
> by statement_name.

Ah, now I understand. Thank you for your explanation.

I'm looking forward to seeing your patch.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2016-11-21 Thread Claudio Freire
On Fri, Nov 18, 2016 at 11:09 PM, Peter Geoghegan  wrote:
> On Fri, Nov 18, 2016 at 5:27 PM, Claudio Freire  
> wrote:
>> I've been changing the on-disk format considerably, to one that makes
>> more sense.
>
> I can see how that would be necessary. I'm going to suggest some more
> things that need a new btree version number now, too.
>
> I realized today, quite suddenly, why I disliked your original patch:
> it didn't go far enough with embracing the idea of
> heap-item-pointer-as-key. In other words, I didn't like that you
> didn't change anything in the internal pages.

But it did. In fact it only changed internal pages.

> Maybe you should put
> heap TIDs someplace in new internal pages, so that even there we treat
> them as part of the key.

That is indeed what's happening (both in the old and new version).

The new version also opens up the possibility of omitting the heap
TID in inner pages when it's redundant (ie: if two consecutive keys
are already different, the heap TID part of the key is redundant,
since it's not needed to make tree descent decisions).

> This may significantly benefit UPDATE-heavy
> workloads where HOT doesn't work out so well. Particularly for
> non-unique indexes, we currently have to wade through a lot of bloat
> from the leaf level, rather than jumping straight to the correct leaf
> page when updating or inserting.

That is improved in the patch as well. The lookup of an insertion
point in a heavily bloated (unique or not) index is done efficiently,
instead of the O(N^2) way it was done before.

> You might not want to do the same with unique indexes, where we must
> initially buffer lock "the first leaf page that a duplicate could be
> on" while we potentially look in one or more additional sibling pages
> (but bloat won't be so bad there, perhaps).

It's done even for unique indexes. Locking in that case becomes
complex, true, but I believe it's not an insurmountable problem.

> Or, you might want to make
> sure that B-Tree tuple insertions still find that "first page" to
> check, despite still generally treating heap item pointer as part of
> the key proper (in unique indexes, it can still behave like NULL, and
> not participate in uniqueness checking, while still essentially being
> part of the key/sort order).

It behaves like that (even though it's not coded like that)

> It also occurs to me that our performance for updates in the event of
> many NULL values in indexes is probably really bad, and could be
> helped by this. You'll want to test all of this with a workload that
> isn't very sympathetic to HOT, of course -- most of these benefits are
> seen when HOT doesn't help.

I haven't really tested with an overabundance of NULLs; I'll add that
to the tests. I have tested low-cardinality indexes, though only
for correctness.

>> However, I haven't tested the current state of the patch thoroughly.
>>
>> If a WIP is fine, I can post the what I have, rebased.
>
> I'm mostly curious about the effects on bloat. I now feel like you
> haven't sufficiently examined the potential benefits there, since you
> never made heap item pointer a first class part of the key space.
> Maybe you'd like to look into that yourself first?

What do you mean by "first class part"?

It's not included in the scankey for a variety of reasons, not the
least of which is not breaking the interface for current uses, and
because the tid type is quite limited in its capabilities ATM. Also,
the heap TID is usually there on most AM calls that care about it (ie:
insertions have it, of course), so adding it to the scankey also
seemed redundant.

If not adding to the scankey, what do you mean by "first class" then?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Tom Lane
Robert Haas  writes:
> On Mon, Nov 21, 2016 at 12:00 PM, Tom Lane  wrote:
>> It seems like maybe searching for individual Params is the wrong thing.
>> Why are we allowing it to generate a parameterized Gather path at all?
>> Given the lack of any way to transmit runtime param values to the worker,
>> I can't see how that would ever work.

> Hmm, so you're thinking it could be the job of generate_gather_paths()
> to make sure we don't do that?

Actually, the Gather path *isn't* parameterized.  The problem here is
that we're planning an unflattened subquery, and the only thing that
is parallel-unsafe is that there is an outer Param in its toplevel tlist,
and we're somehow deciding that we can stick that unsafe tlist into (and
beneath) the Gather node.  So something rotten in that area, but I've not
quite found it yet.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2016-11-21 Thread Alvaro Herrera
Kevin Grittner wrote:

> On Mon, Nov 21, 2016 at 1:05 AM, Thomas Munro
>  wrote:

> > Also, Tsrcache is strangely named: it's not exactly a cache, it's
> > more of a registry.
> 
> When I used the word "cache" here, I was thinking more of this
> English language definition:
> 
>   a :  a hiding place especially for concealing and preserving
>provisions or implements
>   b :  a secure place of storage
> 
> The intent being to emphasize that there is not one public
> "registry" of such objects, but context-specific collections where
> references are tucked away when they become available for later use
> in only the appropriate context.  Eventually, when these are
> used for some of the less "eager" timings of materialized view
> maintenance, they may be set aside for relatively extended periods
> (i.e., minutes or maybe even hours) before being used.  Neither
> "registry" nor "cache" seems quite right; maybe someone can think
> of a word with more accurate semantics.

How about "stash"?  According to my reading of Merriam-Webster's
definition, "stash" mostly appears to be the thing that is stored
(hidden), rather than the place it's stored in, but one of the
definitions is "hiding place", and "cache" is listed as a synonym.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 12:04 PM, Kevin Grittner  wrote:
>> Also, Tsrcache is strangely named: it's not exactly a cache, it's
>> more of a registry.
>
> When I used the word "cache" here, I was thinking more of this
> English language definition:
>
>   a :  a hiding place especially for concealing and preserving
>provisions or implements
>   b :  a secure place of storage
>
> The intent being to emphasize that there is not one public
> "registry" of such objects, but context-specific collections where
> references are tucked away when they become available for later use
> in only the appropriate context.  Eventually, when these are
> used for some of the less "eager" timings of materialized view
> maintenance, they may be set aside for relatively extended periods
> (i.e., minutes or maybe even hours) before being used.  Neither
> "registry" nor "cache" seems quite right; maybe someone can think
> of a word with more accurate semantics.

I complained about the use of "cache" in this name before, and I still
think that it is off-base.  I'm not saying there isn't some definition
of the word that could cover what you're doing here, but it's not the
definition that is going to pop to mind for people reading the code.
I think "registry" would be OK; the fact that there is a registry does
not mean it is a global registry; it can be a registry of ephemeral
relations specific to that query.  I'm sure there are other good
choices, too, but, please, not cache!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-11-21 Thread Masahiko Sawada
On Fri, Nov 18, 2016 at 6:54 AM, Claudio Freire  wrote:
> On Thu, Nov 17, 2016 at 6:34 PM, Robert Haas  wrote:
>> On Thu, Nov 17, 2016 at 1:42 PM, Claudio Freire  
>> wrote:
>>> Attached is patch 0002 with pgindent applied over it
>>>
>>> I don't think there's any other formatting issue, but feel free to
>>> point a finger to it if I missed any
>>
>> Hmm, I had imagined making all of the segments the same size rather
>> than having the size grow exponentially.  The whole point of this is
>> to save memory, and even in the worst case you don't end up with that
>> many segments as long as you pick a reasonable base size (e.g. 64MB).
>
> Wastage is bound by a fraction of the total required RAM, that is,
> it's proportional to the amount of required RAM, not the amount
> allocated. So it should still be fine, and the exponential strategy
> should improve lookup performance considerably.

I'm concerned that it could use repalloc on a large memory area when
vacrelstats->dead_tuples.dead_tuple is bloated, which would be overhead
and slow. What about using semi-fixed memory space without repalloc:
allocate an array of ItemPointerData arrays, where each ItemPointerData
array stores dead tuple locations. The ItemPointerData arrays start at
a small size (e.g. 16MB or 32MB); after an array is used up, we allocate
the next segment at twice the size of the previous one. But to prevent
over-allocating memory, it would be better to cap the size of an
individual ItemPointerData array at 512MB or 1GB. We could expand the
array of arrays using repalloc if needed, but that would not happen
often. The rest of the design is similar to your current patch: the
offset into the array of arrays and the offset within an ItemPointerData
array track the current location, and both are cleared after garbage has
been reclaimed from the heap and indexes. The allocated arrays are then
re-used by subsequent heap scans.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 12:00 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Nov 21, 2016 at 11:43 AM, Tom Lane  wrote:
>>> so what we've got is a case where a parameter computed by the FunctionScan
>>> (in the master) would need to be passed into the parallel workers at
>>> runtime.  Do we have code for that at all?  If so where is it?
>
>> No, that's not supposed to happen.
>
> OK, that makes this a planner failure: we should not have allowed this
> query to become parallelized.
>
>> Maybe it's checking the quals but not recursing into the tlist?
>
> It seems like maybe searching for individual Params is the wrong thing.
> Why are we allowing it to generate a parameterized Gather path at all?
> Given the lack of any way to transmit runtime param values to the worker,
> I can't see how that would ever work.

Hmm, so you're thinking it could be the job of generate_gather_paths()
to make sure we don't do that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2016-11-21 Thread Kevin Grittner
Thanks for the review!  Will respond further after reviewing your
suggested patches; this is a quick response just to the contents of
the email.

On Mon, Nov 21, 2016 at 1:05 AM, Thomas Munro
 wrote:

> Both ways have attracted criticism: the first involves touching
> basically every core function that might eventually parse, plan or
> execute a query to make it accept a Tsrcache and pass that on, and the
> second involves a bunch of Rube Goldberg machine-like
> callback/variable/parameter code.

Just to quantify "touching basically every core function that
might...", the number of functions which have a change in signature
(adding one parameter) is seven.  No more; no less.

> I spent some time investigating whether a third way would be viable:
> use ParamListInfo's setupParser hook and add an analogous one for the
> executor, so that there is no registry equivalent to Tsrcache, but
> also no dealing with param slots or (in plpgsql's case) new kinds of
> variables.  Instead, there would just be two callbacks: one for asking
> the tuplestore provider for metadata (statistics, TupleDesc) at
> planning time, and another for asking for the tuplestore by name at
> execution time.  One problem with this approach is that it requires
> using the SPI_*_with_paramlist interfaces, not the more commonly used
> arg-based versions, and requires using a ParamListInfo when you
> otherwise have no reason to because you have no parameters.  Also,
> dealing with callbacks to register your tuplestore supplier is a
> little clunky.  More seriously, requiring providers of those callbacks
> to write code that directly manipulates ParserState and EState and
> calls addRangeTableEntryXXX seems like a modularity violation --
> especially for PLs that are less integrated with core Postgres code
> than plpgsql.  I got this more or less working, but didn't like it
> much and didn't think it would pass muster.

ok

> After going through that experience, I now agree with Kevin: an
> interface where a new SPI interface lets PLs push a named tuplestore
> into the SPI connection to make it available to SQL seems like the
> simplest and tidiest way.  I do have some feedback and suggestions
> though:
>
> 1.  Naming:  Using tuplestores in AfterTriggersData make perfect sense
> to me but I don't think it follows that the exact thing we should
> expose to the executor to allow these to be scanned is a
> TuplestoreScan.  We have other nodes that essentially scan a
> tuplestore like WorkTableScan and Material but they have names that
> tell you what they are for.  This is for scanning data that has either
> been conjured up or passed on by SPI client code and exposed to SQL
> queries.  How about SpiRelationScan, SpiDataScan, SpiRelVarScan, ?

I think an SPI centered approach is the wrong way to go.  I feel
that an SPI *interface* will be very useful, and probably save
thousands of lines of fragile code which would otherwise be
blurring the lines of the layering, but I feel there should
be a lower-level interface, and the SPI interface should use that
to provide a convenience layer.  In particular, I suspect that some
uses of these named tuplestore relations will initially use SPI for
convenience of development, but may later move some of that code to
dealing with parse trees, for performance.  Ideally, the execution
plan would be identical whether or not SPI was involved, so naming
implying the involvement of SPI would be misleading.
NamedTuplestoreScan might be an improvement over just
TuplestoreScan.

> Also, Tsrcache is strangely named: it's not exactly a cache, it's
> more of a registry.

When I used the word "cache" here, I was thinking more of this
English language definition:

  a :  a hiding place especially for concealing and preserving
   provisions or implements
  b :  a secure place of storage

The intent being to emphasize that there is not one public
"registry" of such objects, but context-specific collections where
references are tucked away when they become available for later use
in only the appropriate context.  Eventually, when these are
used for some of the less "eager" timings of materialized view
maintenance, they may be set aside for relatively extended periods
(i.e., minutes or maybe even hours) before being used.  Neither
"registry" nor "cache" seems quite right; maybe someone can think
of a word with more accurate semantics.

> 2.  Scope of changes:  If we're going to touch functions all over the
> source tree to get the Tsrcache where it needs to be for parsing and
> execution, then I wonder if we should consider thinking a bit more
> about where this is going.  What if we had a thing called a
> QueryEnvironment, and we passed a pointer to that into all those
> places, and it could contain the named tuplestore registry?

I agree.  I had a version building on the Tsrcache approach which
differentiated between three levels of generality: Ephemeral
Relations, a subclass of that called Named Ephemeral Relations, ...

Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Tom Lane
Robert Haas  writes:
> On Mon, Nov 21, 2016 at 11:43 AM, Tom Lane  wrote:
>> so what we've got is a case where a parameter computed by the FunctionScan
>> (in the master) would need to be passed into the parallel workers at
>> runtime.  Do we have code for that at all?  If so where is it?

> No, that's not supposed to happen.

OK, that makes this a planner failure: we should not have allowed this
query to become parallelized.

> Maybe it's checking the quals but not recursing into the tlist?

It seems like maybe searching for individual Params is the wrong thing.
Why are we allowing it to generate a parameterized Gather path at all?
Given the lack of any way to transmit runtime param values to the worker,
I can't see how that would ever work.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 11:43 AM, Tom Lane  wrote:
> I wrote:
>> Like Ashutosh, I can't reproduce the crash, so it's hard to speculate much
>> further.
>
> Ah-hah: now I can.  The recipe lacks these important steps:
>
> set parallel_setup_cost TO 0;
> set parallel_tuple_cost TO 0;
>
> That changes the plan to
>
>  Limit  (cost=0.00..0.06 rows=1 width=64)
>    ->  Nested Loop  (cost=0.00..57.25 rows=1000 width=64)
>          ->  Function Scan on pg_show_all_settings a  (cost=0.00..10.00 rows=1000 width=64)
>          ->  Limit  (cost=0.00..0.03 rows=1 width=32)
>                ->  Gather  (cost=0.00..3.54 rows=130 width=32)
>                      Workers Planned: 2
>                      ->  Parallel Seq Scan on pg_opclass  (cost=0.00..3.54 rows=54 width=32)
>
> so what we've got is a case where a parameter computed by the FunctionScan
> (in the master) would need to be passed into the parallel workers at
> runtime.  Do we have code for that at all?  If so where is it?

No, that's not supposed to happen.  That's why we have this:

    /*
     * We can't pass Params to workers at the moment either, so they are also
     * parallel-restricted.
     */
    else if (IsA(node, Param))
    {
        if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
            return true;
    }

Maybe it's checking the quals but not recursing into the tlist?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Tom Lane
I wrote:
> Like Ashutosh, I can't reproduce the crash, so it's hard to speculate much
> further.

Ah-hah: now I can.  The recipe lacks these important steps:

set parallel_setup_cost TO 0;
set parallel_tuple_cost TO 0;

That changes the plan to

 Limit  (cost=0.00..0.06 rows=1 width=64)
   ->  Nested Loop  (cost=0.00..57.25 rows=1000 width=64)
         ->  Function Scan on pg_show_all_settings a  (cost=0.00..10.00 rows=1000 width=64)
         ->  Limit  (cost=0.00..0.03 rows=1 width=32)
               ->  Gather  (cost=0.00..3.54 rows=130 width=32)
                     Workers Planned: 2
                     ->  Parallel Seq Scan on pg_opclass  (cost=0.00..3.54 rows=54 width=32)

so what we've got is a case where a parameter computed by the FunctionScan
(in the master) would need to be passed into the parallel workers at
runtime.  Do we have code for that at all?  If so where is it?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2016-11-21 Thread Daniel Verite
Stephen Frost wrote:

> Are you working to make those changes?  I'd rather we make the changes
> to this code once rather than push what you have now only to turn around
> and change it significantly again.

So I went through the psql commands which don't fail on parse errors
for booleans. I'd like to draw attention to \c, because I see
several options.

\c [-reuse-previous=BOOL] ...other args..

Current: if we can't parse BOOL, assume it's ON, and go on with reconnecting.

Option1: if we can't parse BOOL, stop here, don't reconnect, set
the command result as "failed", also ignoring the other arguments.

Option2: maybe we want to create a difference between interactive
and non-interactive use, as there's already one in handling
the failure to connect through \c.
The manpage says:
   If the connection attempt failed (wrong user name, access denied,
   etc.), the previous connection will only be kept if psql is in
   interactive mode. When executing a non-interactive script,
   processing will immediately stop with an error.

Proposal: if interactive, same as Option1.
If not interactive, -reuse-previous=bogus has the same outcome
as a failure to connect. Which I think means two subcases:
if it's through \i, kill the connection but don't quit psql;
if it's through -f, quit psql.

Option3: leave this command unchanged, avoiding trouble.

\timing BOOL

Current: non-parseable BOOL interpreted as TRUE. Empty BOOL toggles the
state.

Proposal: on non-parseable BOOL, command failure and pset.timing is 
left unchanged.

\pset [x | expanded | vertical ] BOOL
\pset numericlocale BOOL
\pset [tuples_only | t] BOOL
\pset footer BOOL
\t BOOL (falls into do_pset("tuples_only", ...))
\pset pager BOOL

Current: non-parseable non-empty BOOL interpreted as TRUE. Empty BOOL
toggles the on/off state. In some cases, BOOL interpretation is attempted
only after specific built-in values have been ruled out first.

Proposal: non-parseable BOOL implies command failure and unchanged state.

About the empty string when a BOOL is expected: only \c -reuse-previous
seems affected:

#= \c -reuse-previous=
acts the same as
#= \c -reuse-previous=ON

Proposal: handle empty as when the value is bogus.

The other commands interpret this lack of value specifically to toggle
the state, so it's a non-issue for them.
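
To illustrate what "command failure and unchanged state" could look like,
here is a rough sketch of a strict parser (parse_variable_bool_strict is a
hypothetical helper for the sake of the example, not the function as it
exists today, and the exact set of accepted spellings is open):

    static bool
    parse_variable_bool_strict(const char *value, bool *result)
    {
        if (value == NULL || *value == '\0')
            return false;       /* missing or empty: not a valid boolean */

        if (pg_strcasecmp(value, "on") == 0 || pg_strcasecmp(value, "true") == 0 ||
            pg_strcasecmp(value, "yes") == 0 || pg_strcasecmp(value, "1") == 0)
        {
            *result = true;
            return true;
        }
        if (pg_strcasecmp(value, "off") == 0 || pg_strcasecmp(value, "false") == 0 ||
            pg_strcasecmp(value, "no") == 0 || pg_strcasecmp(value, "0") == 0)
        {
            *result = false;
            return true;
        }
        return false;           /* bogus value: caller keeps the old state */
    }

A caller such as \timing would assign pset.timing only when this returns
true, and otherwise report an error and mark the command as failed, leaving
the setting untouched.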

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Tom Lane
Robert Haas  writes:
> Then again, that might just be a coincidence. The other thing that's
> weird here is that the Datum being passed is apparently a NULL
> pointer, which neither MakeExpandedObjectReadOnly() nor
> MakeExpandedObjectReadOnlyInternal() are prepared to deal with.  I
> don't know whether it's wrong for a NULL pointer to occur here in the
> first place or whether it's wrong for those functions not to be able
> to deal with it if it does.

The former.  MakeExpandedObjectReadOnly does contain a null check,
so something has passed it a zero Datum along with a claim that the
Datum is not null.

Like Ashutosh, I can't reproduce the crash, so it's hard to speculate much
further.  I am wondering if 13671b4b2 somehow fixed it, though I don't see
how.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-11-21 Thread Tobias Bussmann
On 18.11.2016 at 14:21, Albe Laurenz wrote:
> But then the change to use CURSOR_OPT_PARALLEL_OK in tcop/postgres.c should
> be reverted as well, because it breaks things just as bad:
> 
>   /* creates a parallel-enabled plan */
>   PQprepare(conn, "pstmt", "SELECT id FROM l WHERE val = $1", 1, NULL);
>   /* blows up with "cannot insert tuples during a parallel operation" */
>   PQexec(conn, "CREATE TABLE bad(id) AS EXECUTE pstmt('abcd')");

Great example of mixing a v3 prepare with a simple-query execute. I didn't 
think about that, even though the docs state it clearly: "Named prepared 
statements can also be created and accessed at the SQL command level, using 
PREPARE and EXECUTE." Sticking with either protocol version does not trigger 
the error.


> I think we should either apply something like that patch or disable parallel
> execution for prepared statements altogether and document that.

So we likely need to backpatch something more than a doc-fix for 9.6. Given the 
patch proposals around, this would either disable a feature (in the extended 
query protocol) or add a new one (in the simple query protocol/SQL). Or would it 
be more appropriate to split the patch and use CURSOR_OPT_PARALLEL_OK in 
prepare.c on master only? I'm asking in case there is a need to prepare a 
documentation patch targeting 9.6 specifically.

Best regards
Tobias




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgres 9.3 postgres_fdw ::LOG: could not receive data from client: Connection reset by peer

2016-11-21 Thread Vladimir Svedov
Hi,
I have a question. I looked for help on http://dba.stackexchange.com/
with no success.
Maybe you can answer it?
Thank you in advance.


"FOREIGN_TABLE" created with postgres_fdw. LOCAL_TABLE is just a local
table...

Symptoms:

   1. I run in psql query SELECT * from FOREIGN_TABLE. No log generated
   2. I run in bash psql -c "SELECT * from LOCAL_TABLE". No log generated
   3. I run in bash psql -c "SELECT * from FOREIGN_TABLE". ::LOG: could not
   receive data from client: Connection reset by peer is generated in the
   postgres log

I can't set logging any lower, and this message is distracting. Can anyone
share an idea of how to set up the environment to avoid generating this
message? I feel I'm missing something obvious, but can't see what.

PS. I tried running -f file instead of -c "SQL". Of course it did not
change a thing. And of course I tried putting \q in the file, with the same
result...


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-21 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila  wrote:
>> Here instead of checking whether top_plan has initPlan, it should
>> check whether initPlan is present anywhere in plan tree.  I think one
>> simple way could be to check *glob->subplans* instead of
>> top_plan->initPlan,

> Patch based on above suggestion is attached with this mail.

I think this is the right fix for the moment, because the proximate cause
of the crash is that ExecSerializePlan doesn't transmit any part of the
PlannedStmt.subplans list, which glob->subplans is the preimage of.

Now, maybe I'm missing something, but it seems to me that ordinary
subplans could be transmitted to workers just fine.  The problem with
transmitting initplans is that you'd lose the expectation of single
evaluation.  (Even there, it might be okay if they're far enough down
in the plan tree, but I haven't thought about it in detail.)  So I'd
rather see us fix ExecSerializePlan to transmit the subplans list
and have some more-restrictive test here.  This code would still be
wrong as it stands though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-21 Thread Amit Kapila
On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila  wrote:
> On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich
>  wrote:
>> Hi,
>>
>> the query below triggers a parallel worker assertion for me when run on
>> the regression database of master as of 0832f2d.  The plan sports a
>> couple of InitPlan nodes below Gather.
>>
>
> I think the reason for this issue is that in some cases where a subplan
> is at some node other than the top_plan node, we allow it to be executed
> in the worker in force_parallel_mode.  It seems to me that the
> problematic code is the check below in standard_planner()
>
> if (force_parallel_mode != FORCE_PARALLEL_OFF &&
> best_path->parallel_safe &&
> top_plan->initPlan == NIL)
>
> Here instead of checking whether top_plan has initPlan, it should
> check whether initPlan is present anywhere in plan tree.  I think one
> simple way could be to check *glob->subplans* instead of
> top_plan->initPlan,
>

Patch based on above suggestion is attached with this mail.
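
In essence the condition would become something like this (just a sketch
of the idea; the attached patch is authoritative on the details):

    if (force_parallel_mode != FORCE_PARALLEL_OFF &&
        best_path->parallel_safe &&
        glob->subplans == NIL)      /* instead of top_plan->initPlan == NIL */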

Thoughts?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


restrict_subplans_parallel_mode_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan

2016-11-21 Thread Robert Haas
On Sun, Nov 20, 2016 at 8:53 AM, Andreas Seltenreich  wrote:
> the following query appears to reliably crash parallel workers on master
> as of 0832f2d.
>
> --8<---cut here---start->8---
> set max_parallel_workers_per_gather to 2;
> set force_parallel_mode to 1;
>
> select subq.context from pg_settings,
> lateral (select context from pg_opclass limit 1) as subq
> limit 1;
> --8<---cut here---end--->8---
>
> Backtrace of a worker below.

Based on the backtrace, I wonder if this might be a regression
introduced by Tom's recent commit
9a00f03e479c2d4911eed5b4bd1994315d409938, "Improve speed of aggregates
that use array_append as transition function.", which adds additional
cases where expanded datums can be used.  In theory, a value passed
from the leader to the workers ought to have gone through
datumSerialize() which contains code to flatten expanded
representations, but it's possible that code is broken, or maybe the
problematic code path is something else altogether.  I'm just
suspicious about the fact that the failure is in
MakeExpandedObjectReadOnlyInternal().

Then again, that might just be a coincidence. The other thing that's
weird here is that the Datum being passed is apparently a NULL
pointer, which neither MakeExpandedObjectReadOnly() nor
MakeExpandedObjectReadOnlyInternal() are prepared to deal with.  I
don't know whether it's wrong for a NULL pointer to occur here in the
first place or whether it's wrong for those functions not to be able
to deal with it if it does.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Better handling of UPDATE multiple-assignment row expressions

2016-11-21 Thread Tom Lane
While fooling around trying to document the behavior of *-expansion,
I got annoyed about the fact that it doesn't work in the context of
UPDATE tab SET (c1,c2,c3) = (x,y,z) ...
The righthand side of this is a row-expression according to the SQL
standard, and "foo.*" would be expanded in any other row expression,
but we don't do that here.  Another oddity is that this ought to be
an abbreviated form of
UPDATE tab SET (c1,c2,c3) = ROW(x,y,z) ...
but we don't accept that.

It seemed like fixing this might be a good lazy-Sunday project ... so
here is a patch.

The reason this case doesn't work like other row-expressions is
that gram.y bursts the construct into
UPDATE tab SET c1 = x, c2 = y, c3 = z ...
before we ever do parse analysis.  gram.y has no idea what "foo.*"
might expand to, so there's no way that that approach can work if
we want to fix this; the conversion has to be postponed to parse
analysis.  The reason we didn't do it like that originally is
explained in gram.y:

 * Ideally, we'd accept any row-valued a_expr as RHS of a multiple_set_clause.
 * However, per SQL spec the row-constructor case must allow DEFAULT as a row
 * member, and it's pretty unclear how to do that (unless perhaps we allow
 * DEFAULT in any a_expr and let parse analysis sort it out later?).

But it turns out that the idea suggested in the parenthetical comment
is pretty easy to do.  The attached patch allows DEFAULT in any a_expr
and then makes transformExpr() throw an error for it if it wasn't handled
by earlier processing.  That allows improved error reporting: instead of
getting "syntax error" when DEFAULT appears someplace you can't put it,
you get a more specific error.  For instance, this:

regression=# UPDATE update_test SET (a,b) = (default, default+1);
ERROR:  syntax error at or near "+"
LINE 1: UPDATE update_test SET (a,b) = (default, default+1);
^

changes to:

regression=# UPDATE update_test SET (a,b) = (default, default+1);
ERROR:  DEFAULT is not allowed in this context
LINE 1: UPDATE update_test SET (a,b) = (default, default+1);
 ^

As stated in the comment I just quoted, ideally we'd allow any suitable
composite-valued a_expr on the RHS of this kind of SET clause.  The
attached patch doesn't really move the goal posts on that: you can still
only use a RowExpr or a sub-SELECT.  But the RowExpr is now processed in
standard fashion, and we've at least gotten that restriction out of
gram.y and pushed it down one more level.

Anyway, the user-visible effects of this are:

* You can now write ROW explicitly in this construct, which you should
be able to per SQL spec.

* Expansion of "foo.*" in the row-expression now works as expected
(see regression test changes).

* Some error conditions now give errors more specific than "syntax error",
specifically misplacement of DEFAULT and attempting to use an unsupported
kind of expression on the RHS of this kind of SET clause.
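
For concreteness, with the patch both of these should now be accepted
(I'm reusing the regression tests' update_test table; the second example
is just an illustration I made up, not lifted from the patch):

UPDATE update_test SET (a, b) = ROW(42, DEFAULT) WHERE a = 10;

UPDATE update_test AS t SET (a, b) = ROW(v.*)
  FROM (VALUES (1, 2)) AS v(x, y)
  WHERE t.a = v.x;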

I couldn't find anything that seemed worth changing in the existing
docs for any of these points, although if we don't fix the second point,
it'd require adjustments to the new docs I proposed at
https://www.postgresql.org/message-id/16288.1479610...@sss.pgh.pa.us

Comments, better ideas?  Anyone want to do a formal review of this?

regards, tom lane

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6901e08..36f8c54 100644
*** a/src/backend/parser/analyze.c
--- b/src/backend/parser/analyze.c
*** transformInsertStmt(ParseState *pstate, 
*** 644,651 
  		{
  			List	   *sublist = (List *) lfirst(lc);
  
! 			/* Do basic expression transformation (same as a ROW() expr) */
! 			sublist = transformExpressionList(pstate, sublist, EXPR_KIND_VALUES);
  
  			/*
  			 * All the sublists must be the same length, *after*
--- 644,655 
  		{
  			List	   *sublist = (List *) lfirst(lc);
  
! 			/*
! 			 * Do basic expression transformation (same as a ROW() expr, but
! 			 * allow SetToDefault at top level)
! 			 */
! 			sublist = transformExpressionList(pstate, sublist,
! 			  EXPR_KIND_VALUES, true);
  
  			/*
  			 * All the sublists must be the same length, *after*
*** transformInsertStmt(ParseState *pstate, 
*** 752,761 
  		Assert(list_length(valuesLists) == 1);
  		Assert(selectStmt->intoClause == NULL);
  
! 		/* Do basic expression transformation (same as a ROW() expr) */
  		exprList = transformExpressionList(pstate,
  		   (List *) linitial(valuesLists),
! 		   EXPR_KIND_VALUES);
  
  		/* Prepare row for assignment to target table */
  		exprList = transformInsertRow(pstate, exprList,
--- 756,769 
  		Assert(list_length(valuesLists) == 1);
  		Assert(selectStmt->intoClause == NULL);
  
! 		/*
! 		 * Do basic expression transformation (same as a ROW() expr, but allow
! 		 * SetToDefault at top level)
! 		 */
  		exp

Re: [HACKERS] condition variables

2016-11-21 Thread Robert Haas
On Mon, Nov 21, 2016 at 7:10 AM, Thomas Munro
 wrote:
>> I wonder if we shouldn't try to create the invariant that when the CV
>> mutex is not held, the state of cvSleeping has to be true if we're in
>> the proclist and false if we're not.  So ConditionVariableSignal()
>> would clear the flag before releasing the spinlock, for example.  Then
>> I think we wouldn't need proclist_contains().
>
> Yeah, that'd work.  ConditionVariableSleep would need to hold the
> spinlock while checking cvSleeping (otherwise there is a race case where
> another process sets it to false but this process doesn't see that yet
> and then waits for a latch-set which never arrives).  It's not the end
> of the world but it seems unfortunate to have to acquire and release
> the spinlock in ConditionVariablePrepareToSleep and then immediately
> again in ConditionVariableSleep.
>
> I was going to code that up but I read this and had another idea:
>
> http://stackoverflow.com/questions/8594591/why-does-pthread-cond-wait-have-spurious-wakeups
>
> I realise that you didn't actually say you wanted to guarantee no
> spurious wakeups.  But since the client code already has to check its
> own exit condition, why bother adding a another layer of looping?
> Spurious latch sets must be super rare, but even if they aren't, what
> do you save by filtering them here?  In this situation you already got
> woken from a deep slumber and scheduled back onto the CPU, so it
> hardly matters whether you go around again in that loop or the
> client's loop.  We could make things really simple instead: get rid of
> cvSleeping, have ConditionVariablePrepareToSleep reset the latch, then
> have ConditionVariableSleep wait for the latch to be set just once (no
> loop).  Then we'd document that spurious wakeups are possible so the
> caller must write a robust predicate loop, exactly as you already
> showed in your first message.  We'd need to keep that
> proclist_contains stuff to avoid corrupting the list.
> proclist_contains would be the one source of truth for whether you're
> in the waitlist, and the client code's predicate loop would contain
> the one source of truth for whether the condition it is waiting for
> has been reached.

I don't think we can rely on spurious latch set events being rare.
There are an increasing number of things that set the process latch,
and there will very likely be more in the future.  For instance, the
arrival of tuples from a parallel worker associated with our session
will set the process latch.  Workers starting or dying will set the
process latch.  So my inclination was to try to guarantee no spurious
wakeups at all, but possibly a softer guarantee that makes them
unlikely would be sufficient.
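
Either way, the caller-side idiom we're designing for stays the usual
predicate loop, roughly like this (function names as used in this thread;
exact signatures are still in flux, and done_condition() is just a
placeholder for the caller's exit test):

    ConditionVariablePrepareToSleep(&shared->cv);
    while (!done_condition(shared))
        ConditionVariableSleep(&shared->cv);
    /* ... plus whatever "stop waiting" cleanup call we settle on ... */

The open question is only how much wakeup filtering happens inside
ConditionVariableSleep() versus in that loop.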

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-11-21 Thread Tobias Bussmann

> True, but we also try to avoid it whenever possible, because it's
> likely to lead to poor performance.

This non-read-only case should be hit far less often than other uses of 
prepared statements. But sure, it depends on the individual use case, and a 
likely performance regression in these edge cases is nothing to decide on 
lightly.


> I think it would be a good idea to come up with a way for a query to
> produce both a parallel and a non-parallel plan and pick between them
> at execution time.  However, that's more work than I've been willing
> to undertake.

Wouldn't the precautionary generation of two plans always increase the 
planning overhead, which is precisely what one wants to reduce by using 
prepared statements?

Best regards
Tobias

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2016-11-21 Thread Stephen Frost
Daniel,

* Daniel Verite (dan...@manitou-mail.org) wrote:
>   Stephen Frost wrote:
> 
> > Are you working to make those changes?  I'd rather we make the changes
> > to this code once rather than push what you have now only to turn around
> > and change it significantly again.
> 
> If, as a maintainer, you prefer this together in one patch, I'm fine
> with that. So I'll update it shortly with changes in \timing and
> a few other callers of ParseVariableBool().

Did you get a chance to review and consider the other comments from my
initial review about how we might use a different approach for bools, et
al?

Thanks!

Stephen




Re: [HACKERS] Improvements in psql hooks for variables

2016-11-21 Thread Stephen Frost
Daniel,

* Daniel Verite (dan...@manitou-mail.org) wrote:
> "make check" seems OK with that, I hope it doesn't cause any regression
> elsewhere.

You can see what the code coverage of psql is in our current regression
tests by going here:

http://coverage.postgresql.org/src/bin/psql/index.html

It's not exactly a pretty sight and certainly not all callers of
ParseVariableBool() are covered.

I'd strongly suggest you either do sufficient manual testing, or add
regression tests, most likely using the tap test system (you can see an
example of that in src/bin/pg_dump/t and in other 't' directories).

You can generate that report after you make changes yourself using
'make coverage-html'.

Thanks!

Stephen




Re: [HACKERS] Improvements in psql hooks for variables

2016-11-21 Thread Daniel Verite
Stephen Frost wrote:

> Are you working to make those changes?  I'd rather we make the changes
> to this code once rather than push what you have now only to turn around
> and change it significantly again.

If, as a maintainer, you prefer this together in one patch, I'm fine
with that. So I'll update it shortly with changes in \timing and
a few other callers of ParseVariableBool().

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2016-11-21 Thread Daniel Verite
Stephen Frost wrote:

> Just fyi, there seems to be some issues with this patch because setting
> my PROMPT1 and PROMPT2 variables result in rather odd behavior.

Good catch! The issue is that the patch broke the assumption
of prompt hooks that their argument points to a long-lived buffer.
The attached v4 fixes the bug by passing to hooks a pointer to the final
strdup'ed value in VariableSpace rather than temp space, as commented
in SetVariable().

Also I've changed something else in ParseVariableBool(). The code assumes
"false" when value==NULL, but when value is an empty string, the result
was true and considered valid, due to the following test being
positive when len==0 (with both HEAD and the v3 patch from this thread):

    if (pg_strncasecmp(value, "true", len) == 0)
        return true;

It happens that "" as a value yields the same result as "true" for this
test, so it passes, but that seems rather unintentional.

The resulting problem from the POV of the user is that we can do that,
for instance:

   test=> \set AUTOCOMMIT

without error message or feedback, and that leaves us without much
clue about autocommit being enabled:

   test=> \echo :AUTOCOMMIT

   test=> 

So I've changed ParseVariableBool() in v4 to reject the empty string as an
invalid boolean (but not NULL, since the startup logic requires NULL to
mean false during the early initialization of these variables).

"make check" seems OK with that, I hope it doesn't cause any regression
elsewhere.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index a9a2fdb..61b2304 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -254,7 +254,7 @@ exec_command(const char *cmd,
 		if (opt1 != NULL && strncmp(opt1, prefix, sizeof(prefix) - 1) == 0)
 		{
 			reuse_previous =
-				ParseVariableBool(opt1 + sizeof(prefix) - 1, prefix) ?
+				ParseVariableBool(opt1 + sizeof(prefix) - 1, prefix, NULL) ?
 				TRI_YES : TRI_NO;
 
 			free(opt1);
@@ -1548,7 +1548,7 @@ exec_command(const char *cmd,
 									 OT_NORMAL, NULL, false);
 
 		if (opt)
-			pset.timing = ParseVariableBool(opt, "\\timing");
+			pset.timing = ParseVariableBool(opt, "\\timing", NULL);
 		else
 			pset.timing = !pset.timing;
 		if (!pset.quiet)
@@ -2660,7 +2660,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		if (value && pg_strcasecmp(value, "auto") == 0)
 			popt->topt.expanded = 2;
 		else if (value)
-			popt->topt.expanded = ParseVariableBool(value, param);
+			popt->topt.expanded = ParseVariableBool(value, param, NULL);
 		else
 			popt->topt.expanded = !popt->topt.expanded;
 	}
@@ -2669,7 +2669,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 	else if (strcmp(param, "numericlocale") == 0)
 	{
 		if (value)
-			popt->topt.numericLocale = ParseVariableBool(value, param);
+			popt->topt.numericLocale = ParseVariableBool(value, param, NULL);
 		else
 			popt->topt.numericLocale = !popt->topt.numericLocale;
 	}
@@ -2724,7 +2724,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 	else if (strcmp(param, "t") == 0 || strcmp(param, "tuples_only") == 0)
 	{
 		if (value)
-			popt->topt.tuples_only = ParseVariableBool(value, param);
+			popt->topt.tuples_only = ParseVariableBool(value, param, NULL);
 		else
 			popt->topt.tuples_only = !popt->topt.tuples_only;
 	}
@@ -2756,7 +2756,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.pager = 2;
 		else if (value)
 		{
-			if (ParseVariableBool(value, param))
+			if (ParseVariableBool(value, param, NULL))
 				popt->topt.pager = 1;
 			else
 				popt->topt.pager = 0;
@@ -2778,7 +2778,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 	else if (strcmp(param, "footer") == 0)
 	{
 		if (value)
-			popt->topt.default_footer = ParseVariableBool(value, param);
+			popt->topt.default_footer = ParseVariableBool(value, param, NULL);

Re: [HACKERS] Improvements in psql hooks for variables

2016-11-21 Thread Stephen Frost
Daniel,

* Daniel Verite (dan...@manitou-mail.org) wrote:
>   Tom Lane wrote:
> 
> > Stephen Frost  writes:
> > > In reviewing this patch, I also noticed that it's set up to assume a
> > > 'true' result when a variable can't be parsed by ParseVariableBool.
> > 
> > I suppose that's meant to be backwards-compatible with the current
> > behavior:
> > 
> > regression=# \timing foo
> > unrecognized value "foo" for "\timing"; assuming "on"
> > Timing is on.
> 
> Exactly. The scope of the patch is limited to the effect
> of \set assignments to built-in variables.
> 
> > but I agree that if we're changing things in this area, that would
> > be high on my list of things to change.  I think what we want going
> > forward is to disallow setting "special" variables to invalid values,
> > and that should hold both for regular variables that have special
> > behaviors, and for the special-syntax cases like \timing.
> 
> +1

Not sure I follow your reply here.  There seems to be broad agreement to
improve how we handle both \set and "special" variables; the code paths
are related and this patch is touching them, so it seems like the correct
next step here is to adjust the patch to update the code based on that
agreement.

Are you working to make those changes?  I'd rather we make the changes
to this code once rather than push what you have now only to turn around
and change it significantly again.

Thanks!

Stephen




Re: [HACKERS] Improvements in psql hooks for variables

2016-11-21 Thread Daniel Verite
Tom Lane wrote:

> Stephen Frost  writes:
> > In reviewing this patch, I also noticed that it's set up to assume a
> > 'true' result when a variable can't be parsed by ParseVariableBool.
> 
> I suppose that's meant to be backwards-compatible with the current
> behavior:
> 
> regression=# \timing foo
> unrecognized value "foo" for "\timing"; assuming "on"
> Timing is on.

Exactly. The scope of the patch is limited to the effect
of \set assignments to built-in variables.

> but I agree that if we're changing things in this area, that would
> be high on my list of things to change.  I think what we want going
> forward is to disallow setting "special" variables to invalid values,
> and that should hold both for regular variables that have special
> behaviors, and for the special-syntax cases like \timing.

+1

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] How to change order sort of table in HashJoin

2016-11-21 Thread Man

Thanks for the reply, sir.

On 11/21/2016 1:39 AM, Tom Lane wrote:

Man  writes:

Additional information.
In 9.6 the second table (the one with fewer tuples) was chosen (with the same test data).
Something (cost estimation?) seems to be different in previous versions.

I'd bet on different statistics in the two installations (either you
forgot to ANALYZE, or the random sample came up quite a bit different).
And I'm a little suspicious that these tests weren't all done with the
same work_mem setting.


I dumped the two tables from pg9.4 and restored them into pg9.6, sir.
I also set default_statistics_target to 1000 and ran ANALYZE on the two
tables in both installations.

And those were the results.

Anyway, I understand that the order cannot be changed by tuning parameters
because it depends on the stored data. Thanks.



regards, tom lane


Thanks and best regards,




Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-21 Thread Ashutosh Bapat
>
> Yeah, I modified the patch that way, as I thought that would be consistent with
> the aggregate pushdown patch.

Aggregate pushdown needs the tlist to judge the shippability of the
targetlist. For joins that's not required, so we should defer building it
if we can.

>
>>> Instead we should calculate tlist, the
>>> first time we need it and then add it to the fpinfo.
>
>
> Having said that, I agree on that point.  I'd like to propose (1) adding a
> new member to fpinfo, to store a list of output Vars from the subquery, and
> (2) creating a tlist from it in deparseRangeTblRef, then, which would allow
> us to calculate the tlist only when we need it.  The member added to fpinfo
> would be also useful to address the comments on the DML/UPDATE pushdown
> patch.  See the attached patch in [1].  I named the member a bit differently
> in that patch, though.

Again, the list of Vars will be wasted if we don't choose that path for the
final plan, so I don't see the point of adding the list of Vars there. It
looks like we are replacing one list with another when neither is useful if
the path doesn't get chosen for the final plan. If you think that the member
is useful for DML/UPDATE pushdown, you may want to add it in the other patch.
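
FWIW, the deferred version could be as small as this (just a sketch; the
fpinfo member name is made up for illustration):

    /* In deparseRangeTblRef(): build the SELECT list only the first time
     * it is needed, and cache it in fpinfo so it can be reused later. */
    if (fpinfo->subquery_tlist == NIL)
        fpinfo->subquery_tlist = build_tlist_to_deparse(foreignrel);
    tlist = fpinfo->subquery_tlist;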

>
> You modified the comments I added to deparseLockingClause into this:
>
> /*
> +* Ignore relation if it appears in a lower subquery. Locking clause
> +* for such a relation is included in the subquery.
> +*/
>
> I don't think the second sentence is 100% correct because a locking clause
> isn't always required for such a relation, so I modified the sentence a bit.
>

I guess, "if required" part was implicit in construct "such a
relation". Your version seems to make it explicit. I am fine with it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Contains and is contained by operators of inet datatypes

2016-11-21 Thread Andreas Karlsson

On 11/17/2016 11:14 PM, Tom Lane wrote:

The original post proposed that we'd eventually get some benefit by
being able to repurpose << and >> to mean something else, but the
time scale over which that could happen is so long as to make it
unlikely to ever happen.  I think we'd need to deprecate these names
for several years, then actually remove them and have nothing there for
a few years more, before we could safely install new operators that
take the same arguments but do something different.  (For comparison's
sake, it took us five years to go from deprecating => as a user operator
to starting to use it as parameter naming syntax ... and that was a
case where conflicting use could be expected to throw an error, not
silently misbehave, so we could force it with little risk of silently
breaking peoples' applications.  To repurpose << and >> in this way
we would need to move much slower.)


I agree. The value in re-purposing them is pretty low given the long 
time scales needed before that can be done.



I'm inclined to think we should just reject this patch.  I'm certainly not
going to commit it without seeing positive votes from multiple people.


Given that I reviewed it I think you already have my vote on this.

I like the patch because it means fewer operators for me to remember as a 
PostgreSQL user. And at least for me inet is a rarely used type compared 
to hstore, json and range types, which all use @> and <@.


Andreas




Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows

2016-11-21 Thread Amit Kapila
On Mon, Nov 21, 2016 at 7:46 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
>> > shared_buffers  tps
>> > 256MB  990
>> > 512MB  813
>> > 1GB  1189
>> > 2GB  2258
>> > 4GB  5003
>> > 8GB  5062
>> >
>> > "512MB is the largest effective size" seems to be a superstition, although
>> I don't know the reason for the drop at 512MB.
>> >

Okay, not an issue.  I think the above data indicates that we can remove
the 512MB limit for Windows from the docs.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan

2016-11-21 Thread Amit Kapila
On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich
 wrote:
> Hi,
>
> the query below triggers a parallel worker assertion for me when run on
> the regression database of master as of 0832f2d.  The plan sports a
> couple of InitPlan nodes below Gather.
>

I think the reason for this issue is that, in some cases where a subplan is
attached to a node other than the top_plan node, we still allow it to be
executed in a worker under force_parallel_mode.  It seems to me that the
problematic code is the check below in standard_planner():

if (force_parallel_mode != FORCE_PARALLEL_OFF &&
best_path->parallel_safe &&
top_plan->initPlan == NIL)

Here, instead of checking whether top_plan has an initPlan, it should
check whether an initPlan is present anywhere in the plan tree.  One
simple way could be to check *glob->subplans* instead of
top_plan->initPlan; another possibility is to traverse the whole tree
to see if an initPlan is present.
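
For illustration, the first option might look roughly like this (just a
sketch, not a tested patch):

    if (force_parallel_mode != FORCE_PARALLEL_OFF &&
        best_path->parallel_safe &&
        glob->subplans == NIL)    /* was: top_plan->initPlan == NIL */
    {
        /* ... wrap top_plan in a Gather node, as before ... */
    }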

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Push down more full joins in postgres_fdw

2016-11-21 Thread Etsuro Fujita

On 2016/11/19 0:16, Ashutosh Bapat wrote:

Also another point

I guess, this note doesn't add much value in the given context.
Probably we should remove it.
+* Note: the tlist would have one-to-one correspondence with the
+* joining relation's reltarget->exprs because (1) the above test
+* on PHVs guarantees that the reltarget->exprs doesn't contain
+* any PHVs and (2) the joining relation's local_conds is NIL.
+* This allows us to search the targetlist entry matching a given
+* Var node from the tlist in get_subselect_alias_id.


OK, I removed it.


On Fri, Nov 18, 2016 at 5:10 PM, Ashutosh Bapat
 wrote:


I wrote:

/*
 * For a join relation or an upper relation, use
deparseExplicitTargetList.
 * Likewise, for a base relation that is being deparsed as a subquery,
in
 * which case the caller would have passed tlist that is non-NIL, use
that
 * function.  Otherwise, use deparseTargetList.
 */



This looks correct. I have modified it to make it simple in the given
patch. But, I think we shouldn't refer to a function e.g.
deparseExplicitTargetlist() in the comment. Instead we should describe
the intent e.g. "deparse SELECT clause from the given targetlist" or
"deparse SELECT clause from attr_needed".


My taste would be to refer to those functions, because ISTM that makes 
the explanation simpler and more direct.  So I'd like to leave that to 
the committer's judgment.


I wrote:

Done.  I modified the patch as proposed; create the tlist by
build_tlist_to_deparse in foreign_join_ok, if needed, and search the
tlist
by tlist_member.  I also added a new member "tlist" to PgFdwRelationInfo
to
save the tlist created in foreign_join_ok.


You wrote:

Instead of adding a new member, you might want to reuse grouped_tlist
by renaming it.



Done.



Right now, we are calculating tlist whether or not the ForeignPath
emerges as the cheapest path.


Yeah, I modified the patch that way, as I thought that would be consistent 
with the aggregate pushdown patch.



Instead we should calculate tlist, the
first time we need it and then add it to the fpinfo.


Having said that, I agree on that point.  I'd like to propose (1) adding 
a new member to fpinfo, to store a list of output Vars from the 
subquery, and (2) creating a tlist from it in deparseRangeTblRef, then, 
which would allow us to calculate the tlist only when we need it.  The 
member added to fpinfo would be also useful to address the comments on 
the DML/UPDATE pushdown patch.  See the attached patch in [1].  I named 
the member a bit differently in that patch, though.


You modified the comments I added to deparseLockingClause into this:

/*
+* Ignore relation if it appears in a lower subquery. Locking clause
+* for such a relation is included in the subquery.
+*/

I don't think the second sentence is 100% correct because a locking 
clause isn't always required for such a relation, so I modified the 
sentence a bit.


Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/38245b84-fabf-0899-1b24-8f94cdc5900c%40lab.ntt.co.jp
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 109,114  typedef struct deparse_expr_cxt
--- 109,116 
  /* Handy macro to add relation name qualification */
  #define ADD_REL_QUALIFIER(buf, varno)	\
  		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+ #define SS_TAB_ALIAS_PREFIX	"s"
+ #define SS_COL_ALIAS_PREFIX	"c"
  
  /*
   * Functions to determine whether an expression can be evaluated safely on
***
*** 167,172  static void appendConditions(List *exprs, deparse_expr_cxt *context);
--- 169,182 
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
  	RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
+ static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
+    RelOptInfo *foreignrel, bool make_subquery,
+    List **params_list);
+ static void appendSubselectAlias(StringInfo buf, int tabno, int ncols);
+ static void get_subselect_alias_id(Var *node, RelOptInfo *foreignrel,
+ 	   int *tabno, int *colno);
+ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *tabno,
+ 		int *colno);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
***
*** 990,1001  deparseSelectSql(List *tlist, List **retrieved_attrs, deparse_expr_cxt *context)
  	 */
  	appendStringInfoString(buf, "SELECT ");
  
  	if (foreignrel->reloptkind == RELOPT_JOINREL ||
! 		foreignrel->reloptkind == RELOPT_UPPER_REL)
! 	{
! 		/* For a join relation use the input tlist */
  		deparseExplicitTargetList(tlist, retrieved_attrs, context);
- 	}
  	else
  	{
