Re: [HACKERS] Patch: Implement failover on libpq connect level.
On 21 November 2016 at 00:08, Mithun Cy wrote:
>> Please avoid adding another round trip by using a GUC_REPORTed variable
>> (ParameterStatus entry).  If you want to support this libpq failover with
>> pre-10 servers, you can switch the method of determining the primary based
>> on the server version.  But I don't think it's worth supporting older
>> servers at the price of libpq code complexity.
>
> Currently there is no consensus around this.  For now, making this patch
> address failover to the next primary, similarly to JDBC, seems sufficient
> to me.  In the next version of the patch I think we can try to extend it
> as you have proposed.

FWIW, I agree. Working first, then improved.

>> I haven't tracked the progress of logical replication, but will
>> target_server_type be likely to be usable with it?  How will
>> target_server_type fit logical replication?
>
> I tried to check the logical replication WIP patch; I am not very sure
> how to accommodate the same.

Logical replication downstreams won't force transactions to read-only. A
significant part of the appeal of logical replication is that you can use
temp tables and unlogged tables on the downstream, and even use persistent
tables so long as they don't clash with the upstream. More sophisticated
models even allow the downstream to safely write to replicated tables by
doing conflict resolution.

So as it stands, this patch would consider any logical replication
downstream a 'master'.

That's fine for now IMO. Determining whether a server is a logical replica
isn't that simple, nor is it a clear-cut yes/no answer, and I think it's
out of the scope of this patch to deal with it. Figure it out once logical
replication lands.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Patch: Implement failover on libpq connect level.
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> I am very strict about regressing the performance of things that we already
> have, but I try not to make a policy that a new feature must be as fast
> as it could ever be.  That could result in us having very few new features.

I see.  I like your attitude toward new features.  But I don't think now is
the time to compromise this feature and rush to commit it.

> Also, I am not saying that we should not change this in time for v10.
> I'm saying that I don't think it should be a requirement for the next patch
> to be committed in this area to introduce a whole new mechanism for
> determining whether something is a master or a standby.  Love it or hate
> it, pgsql-jdbc has already implemented something in this area and it does
> something useful -- without requiring a wire protocol change.  Now you and
> Kevin are trying to say that what they did is all wrong, but I don't agree.
> There are very many users for whom the pgsql-jdbc approach will do exactly
> what they need, and no doubt some for whom it won't.  Getting a patch that
> mimics that approach committed is *progress*.  Improving it afterwards,
> whether for v10 or some later release, is also good.

transaction_read_only=on does not mean the server is a standby.  As the
manual article on hot standby says, they are different.

https://www.postgresql.org/docs/devel/static/hot-standby.html

[Excerpt]
--
In normal operation, “read-only” transactions are allowed to update
sequences and to use LISTEN, UNLISTEN, and NOTIFY, so Hot Standby sessions
operate under slightly tighter restrictions than ordinary read-only
sessions.  It is possible that some of these restrictions might be loosened
in a future release.
...
Users will be able to tell whether their session is read-only by issuing
SHOW transaction_read_only.  In addition, a set of functions (Table 9.79,
“Recovery Information Functions”) allow users to access information about
the standby server.  These allow you to write programs that are aware of
the current state of the database.  These can be used to monitor the
progress of recovery, or to allow you to write complex programs that
restore the database to particular states.
--

I'm afraid that if the current patch is committed, you will lose interest
in the ideal solution.  Then, if the current patch goes out as v10, there
would be a concern about incompatibility when we pursue the ideal solution
in a later release.  That is, "should we continue to report that this
server is a standby even if it's actually a primary with
transaction_read_only on, to maintain compatibility with the older
release?"

If you want to connect to a server where the transaction is read-only,
then shouldn't the connection parameter be something like
"target_session_attrs=readonly"?  That represents exactly what the code
does.

> There is a saying that one should not let the perfect be the enemy of the
> good.  I believe that saying applies here.

True, so I suggested not including the support for older servers for a
while.  Shall we find the real enemy -- what's preventing the ideal
solution?  I know my knowledge is still far less than yours, so I may be
missing something difficult.  So, I'd like Mithun to share the difficulty.

Regards
Takayuki Tsunakawa
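To make the distinction above concrete, here is a minimal libpq sketch
(the connection string and host name are placeholders, and result-status
checks are trimmed); it only illustrates that the two checks can disagree:
SHOW transaction_read_only reflects the session's read-only setting, while
pg_is_in_recovery() reports whether the server is actually a standby.

/*
 * Sketch only: compare "session is read-only" with "server is a standby".
 * "example-host" is a placeholder; this is not code from the patch.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("host=example-host dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return 1;
	}

	/* "on" here may simply mean default_transaction_read_only = on ... */
	res = PQexec(conn, "SHOW transaction_read_only");
	printf("transaction_read_only = %s\n", PQgetvalue(res, 0, 0));
	PQclear(res);

	/* ... whereas this is true only on a server in recovery (hot standby). */
	res = PQexec(conn, "SELECT pg_is_in_recovery()");
	printf("pg_is_in_recovery     = %s\n", PQgetvalue(res, 0, 0));
	PQclear(res);

	PQfinish(conn);
	return 0;
}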
Re: [HACKERS] PATCH: Batch/pipelining support for libpq
On 22 November 2016 at 15:14, Haribabu Kommi wrote:
>
> On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer wrote:
>>
>> The latest is what's attached upthread and what's in the git repo also
>> referenced upthread.
>>
>> I haven't been able to update in response to more recent review due to
>> other development commitments. At this point I doubt I'll be able to
>> get it updated again in time for v10, so anyone who wants to adopt it
>> is welcome.
>
> Currently the patch status is marked as "returned with feedback" in the
> 11-2016 commitfest. Anyone who wants to work on it can submit an updated
> patch by taking care of all review comments and change the status of the
> patch at any time.
>
> Thanks for the patch.

Thanks. Sorry I haven't had time to work on it. Priorities.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proposal : For Auto-Prewarm.
Hi Ashutosh,

This is a gentle reminder. You are assigned as a reviewer of the current
patch in the 11-2016 commitfest, but you haven't shared your review yet.
Please share your views about the patch. This will help us in the smoother
operation of the commitfest.

Please ignore this if you have already shared your review.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] PATCH: Batch/pipelining support for libpq
On Fri, Nov 18, 2016 at 7:18 PM, Craig Ringer wrote:
> The latest is what's attached upthread and what's in the git repo also
> referenced upthread.
>
> I haven't been able to update in response to more recent review due to
> other development commitments. At this point I doubt I'll be able to
> get it updated again in time for v10, so anyone who wants to adopt it
> is welcome.

Currently the patch status is marked as "returned with feedback" in the
11-2016 commitfest. Anyone who wants to work on it can submit an updated
patch by taking care of all review comments and change the status of the
patch at any time.

Thanks for the patch.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] PoC: Partial sort
Hi Peter,

This is a gentle reminder. You are assigned as a reviewer of the current
patch in the 11-2016 commitfest, but you haven't shared your review yet in
this commitfest on the latest patch posted by the author. If you don't
have any comments on the patch, please move it into the "ready for
committer" state to get a committer's attention. This will help us in the
smoother operation of the commitfest.

Please ignore this if you have already shared your review.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/11/21 22:02, Ashutosh Bapat wrote:

You wrote:
>> Instead we should calculate tlist, the first time we need it and then
>> add it to the fpinfo.

I wrote:
>> Having said that, I agree on that point.  I'd like to propose (1) adding
>> a new member to fpinfo, to store a list of output Vars from the subquery,
>> and (2) creating a tlist from it in deparseRangeTblRef, then, which would
>> allow us to calculate the tlist only when we need it.  The member added
>> to fpinfo would also be useful to address the comments on the DML/UPDATE
>> pushdown patch.  See the attached patch in [1].  I named the member a bit
>> differently in that patch, though.

> Again the list of Vars will be wasted if we don't choose that path for
> final planning.  So, I don't see the point of adding a list of Vars
> there.  If you think that the member is useful for DML/UPDATE pushdown,
> you may want to add it in the other patch.

OK, I'd like to propose referencing foreignrel->reltarget->exprs directly
in deparseRangeTblRef and get_subselect_alias_id, then, which is the same
as what I proposed in the first version of the patch, but I'd also like to
change deparseRangeTblRef to use add_to_flat_tlist, not
make_tlist_from_pathtarget, to create a tlist of the subquery, as you
proposed.

>> You modified the comments I added to deparseLockingClause into this:
>>
>>      /*
>> +    * Ignore relation if it appears in a lower subquery.  Locking clause
>> +    * for such a relation is included in the subquery.
>> +    */
>>
>> I don't think the second sentence is 100% correct because a locking
>> clause isn't always required for such a relation, so I modified the
>> sentence a bit.

> I guess, the "if required" part was implicit in the construct "such a
> relation".  Your version seems to make it explicit.  I am fine with it.

OK, let's leave that for the committer to judge.

Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 109,114 **** typedef struct deparse_expr_cxt
--- 109,116 ----
  /* Handy macro to add relation name qualification */
  #define ADD_REL_QUALIFIER(buf, varno)	\
  		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+ #define SS_TAB_ALIAS_PREFIX	"s"
+ #define SS_COL_ALIAS_PREFIX	"c"
  
  /*
   * Functions to determine whether an expression can be evaluated safely on
***************
*** 167,172 **** static void appendConditions(List *exprs, deparse_expr_cxt *context);
--- 169,182 ----
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
  					RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
+ static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
+ 					RelOptInfo *foreignrel, bool make_subquery,
+ 					List **params_list);
+ static void appendSubselectAlias(StringInfo buf, int tabno, int ncols);
+ static void get_subselect_alias_id(Var *node, RelOptInfo *foreignrel,
+ 					int *tabno, int *colno);
+ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *tabno,
+ 					int *colno);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
***************
*** 990,1001 **** deparseSelectSql(List *tlist, List **retrieved_attrs, deparse_expr_cxt *context)
  	 */
  	appendStringInfoString(buf, "SELECT ");
  
  	if (foreignrel->reloptkind == RELOPT_JOINREL ||
! 		foreignrel->reloptkind == RELOPT_UPPER_REL)
! 	{
! 		/* For a join relation use the input tlist */
  		deparseExplicitTargetList(tlist, retrieved_attrs, context);
- 	}
  	else
  	{
  		/*
--- 1000,1015 ----
  	 */
  	appendStringInfoString(buf, "SELECT ");
  
+ 	/*
+ 	 * For a join relation or an upper relation, use deparseExplicitTargetList.
+ 	 * Likewise, for a base relation that is being deparsed as a subquery, in
+ 	 * which case the caller would have passed non-NIL tlist, use that
+ 	 * function.  Otherwise, use deparseTargetList.
+ 	 */
  	if (foreignrel->reloptkind == RELOPT_JOINREL ||
! 		foreignrel->reloptkind == RELOPT_UPPER_REL ||
! 		tlist != NIL)
  		deparseExplicitTargetList(tlist, retrieved_attrs, context);
  	else
  	{
  		/*
***************
*** 1155,1165 **** deparseLockingClause(deparse_expr_cxt *context)
--- 1169,1187 ----
  	StringInfo	buf = context->buf;
  	PlannerInfo *root = context->root;
  	RelOptInfo *rel = context->scanrel;
+ 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
  	int			relid = -1;
  
  	while ((relid = bms_next_member(rel->relids, relid)) >= 0)
  	{
  		/*
+ 		 * Ignore relation if it appears in a lower subquery.  Locking clause
+ 		 * for such a relation, if needed, is included in the subquery.
+ 		 */
+ 		if (bms_is_member(relid, fpinfo->subquery_rels))
+ 			continue;
+ 
+ 		/*
  		 * Add FOR UPDATE/SHARE if appropriate.  We apply locking
Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)
Hi Vinayak,

This is a gentle reminder. You are assigned as a reviewer of the current
patch in the 11-2016 commitfest. If you have any more review comments or
performance results regarding the approach of the patch, please share
them. Otherwise, you can mark the patch as "Ready for committer" for
committer feedback. This will help us in the smoother operation of the
commitfest.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> The point I was trying to make is that I think the forced-removal behavior
> is not desirable, and therefore committing a patch that makes it be graven
> in stone is not desirable either.

I totally agree that we should pursue the direction of escaping from the
complete loss of stats files.  Personally, I would like to combine that
with the idea of persistent performance diagnosis information for
long-term analysis (IIRC, someone proposed it.)

However, I don't think my patch will make everyone forget about the
problem of stats file loss during recovery.  The problem exists with or
without my patch, and my patch doesn't have the power to dilute the
importance of the problem.  If you are worried about memory, we can add an
entry for the problem to the TODO list that Bruce-san is maintaining.  Or,
maybe we can just stop removing the stats files during recovery by keeping
the files of the previous generation and using them as the current ones.
I haven't seen how fresh the previous generation is (500ms ago?).  A bit
older might be better than nothing.

> The larger picture here is that Takayuki-san wants us to commit a patch
> based on a customer's objection to 9.2's behavior, without any real evidence
> that the 9.4 change isn't a sufficient solution.  I've got absolutely zero
> sympathy for that "the stats collector might be stuck in an unkillable state"
> argument --- where's the evidence that the stats collector is any more prone
> to that than any other postmaster child?

The 9.4 change may be sufficient.  But I don't think I can proudly explain
the logic to a really severe customer.  I can't answer the question "Why
does PostgreSQL write files that will be deleted, even during 'immediate'
shutdown?  Why does PostgreSQL use 5 seconds for nothing?"  Other children
do nothing and exit immediately.  I believe they are behaving correctly.

> And for that matter, if we are stuck because of a nonresponding NFS server,
> how is a quicker postmaster exit going to help anything?
> You're not going to be able to start a new postmaster if the data directory
> is on a nonresponsive server.

The NFS server can also be configured for HA, and the new postmaster can
start as soon as the NFS server completes failover.

> I'd be willing to entertain a proposal to make the 5-second limit adjustable,
> but I don't think we need entirely new behavior here.

Then, I'm at a loss what to do for the 9.2 user.

Regards
Takayuki Tsunakawa
Re: [HACKERS] Logical decoding on standby
On 22 November 2016 at 10:20, Craig Ringer wrote:
> I'm currently looking at making detection of replay conflict with a
> slot work by separating the current catalog_xmin into two effective
> parts - the catalog_xmin currently needed by any known slots
> (ProcArray->replication_slot_catalog_xmin, as now), and the oldest
> actually valid catalog_xmin where we know we haven't removed anything
> yet.

OK, more detailed plan.

The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid, are
already held down by ProcArray's catalog_xmin. But that doesn't mean we
haven't removed newer tuples from specific relations and logged that in
xl_heap_clean, etc, including catalogs or user catalogs; it only means the
clog still exists for those XIDs.

We don't emit a WAL record when we advance oldestXid in
SetTransactionIdLimit(), and doing so is useless because vacuum will have
already removed needed tuples from needed catalogs before calling
SetTransactionIdLimit() from vac_truncate_clog(). We know that if
oldestXid is n, the true valid catalog_xmin where no needed tuples have
been removed must be >= n. But we need to know the lower bound of valid
catalog_xmin, which oldestXid doesn't give us.

So right now a standby has no way to reliably know if the catalog_xmin
requirement for a given replication slot can be satisfied. A standby
can't tell based on an xl_heap_cleanup_info record, xl_heap_clean record,
etc whether the affected table is a catalog or not, and shouldn't generate
conflicts for non-catalogs since otherwise it'll be constantly clobbering
walsenders.

A 2-phase advance of the global catalog_xmin would mean that
GetOldestXmin() would return a value from ShmemVariableCache, not the
oldest catalog xmin from ProcArray like it does now. (auto)vacuum would
then be responsible for:

* Reading the oldest catalog_xmin from procarray
* If it has advanced vs what's present in ShmemVariableCache, writing a
  new xlog record type recording an advance of oldest catalog xmin
* Advancing ShmemVariableCache's oldest catalog xmin

and would do so before it called GetOldestXmin via vacuum_set_xid_limits()
to determine what it can remove.

GetOldestXmin would return the ProcArray's copy of the oldest catalog_xmin
when in recovery, so we report it via hot_standby_feedback to the
upstream, it's recorded on our physical slot, and in turn causes vacuum to
advance the master's effective oldest catalog_xmin for vacuum.

On the standby we'd replay the catalog_xmin advance record, advance the
standby's ShmemVariableCache's oldest catalog xmin, and check to see if
any replication slots, active or not, have a catalog_xmin < the new
threshold. If none do, there's no conflict and we're fine. If any do, we
wait max_standby_streaming_delay/max_standby_archive_delay as
appropriate, then generate recovery conflicts against all backends that
have an active replication slot based on the replication slot state in
shmem. Those backends - walsender or normal decoding backend - would
promptly die. New decoding sessions will check the ShmemVariableCache and
refuse to start if their catalog_xmin is < the threshold. Since we advance
it before generating recovery conflicts there's no race with clients
trying to reconnect after their backend is killed with a conflict.

If we wanted to get fancy we could set the latches of walsender backends
at risk of conflicting and they could check ShmemVariableCache's oldest
valid catalog xmin, so they could send immediate keepalives with
reply_requested set and hopefully get flush confirmation from the client
and advance their catalog_xmin before we terminate them as conflicting
with recovery. But that can IMO be done later/separately.

Going to prototype this.

Alternate approach:
---

The oldest xid in heap_xlog_cleanup_info is relation-specific, but the
standby has no way to know if it's a catalog relation or not during redo
and know whether to kill slots and decoding sessions based on its
latestRemovedXid. Same for xl_heap_clean and the other records that can
cause snapshot conflicts (xl_heap_visible, xl_heap_freeze_page,
xl_btree_delete, xl_btree_reuse_page, spgxlogVacuumRedirect).

Instead of adding a 2-phase advance of the global catalog_xmin, we can
instead add a rider to each of these records that identifies whether it's
a catalog table or not. This would only be emitted when wal_level >=
logical, but it *would* increase WAL sizes a bit when logical decoding is
enabled even if it's not going to be used on a standby.

The rider would be a simple:

typedef struct xl_rel_catalog_info
{
    bool rel_accessible_from_logical_decoding;
} xl_catalog_info;

or similar.

During redo we call a new ResolveRecoveryConflictWithLogicalSlot function
from each of those records' redo routines that does what I outlined above.

This way we add more info to more xlog records, and the upstream has to
use RelationIsAccessibleInLogicalDecoding() to set up the records when
writing the xlogs. In exchange, we don't h
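For illustration only, a rough sketch of the standby-side slot check
described in the plan above. The function name, and the idea of comparing
each slot's catalog_xmin against the newly replayed threshold, are
assumptions drawn from the description here, not code from a posted patch.

/*
 * Hypothetical standby-side check: after replaying an advance of the
 * oldest valid catalog_xmin, see whether any replication slot still
 * needs older catalog tuples.  Sketch only; assumes replication/slot.h,
 * storage/lwlock.h and access/transam.h are included.
 */
static bool
SlotsConflictWithCatalogXmin(TransactionId new_oldest_catalog_xmin)
{
	bool		conflict = false;
	int			i;

	LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
	for (i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

		if (!s->in_use)
			continue;

		if (TransactionIdIsValid(s->data.catalog_xmin) &&
			TransactionIdPrecedes(s->data.catalog_xmin,
								  new_oldest_catalog_xmin))
		{
			/* This slot's needed catalog tuples may already be gone. */
			conflict = true;
			break;
		}
	}
	LWLockRelease(ReplicationSlotControlLock);

	return conflict;
}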
Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
On 2016/11/22 4:49, Tom Lane wrote:
> Etsuro Fujita writes:
>> On 2016/11/10 5:17, Tom Lane wrote:
>>> I think there's a very good argument that we should just treat any inval
>>> in pg_foreign_data_wrapper as a reason to blow away the whole plan cache.
>>> People aren't going to alter FDW-level options often enough to make it
>>> worth tracking things more finely.  Personally I'd put pg_foreign_server
>>> changes in the same boat, though maybe those are changed slightly more
>>> often.  If it's worth doing anything more than that, it would be to note
>>> which plans mention foreign tables at all, and not invalidate those that
>>> don't.  We could do that with, say, a hasForeignTables bool in
>>> PlannedStmt.

>> I agree on that point.

> OK, please update the patch to handle those catalogs that way.

Will do.

> That leaves the question of whether it's worth detecting table-level
> option changes this way, or whether we should just handle those by
> forcing a relcache inval in ATExecGenericOptions, as in Amit's original
> patch in <5702298d.4090...@lab.ntt.co.jp>.  I kind of like that approach;
> that patch was short and sweet, and it put the cost on the unusual path
> (ALTER TABLE) not the common path (every time we create a query plan).

>> I think it'd be better to detect table-level option changes because
>> that could allow us to do more; we could avoid invalidating cached
>> plans that need not be invalidated, by tracking the plan dependencies
>> more exactly, ie, avoid collecting the dependencies of foreign tables
>> in a plan that are in dead subqueries or excluded by constraint
>> exclusion.  The proposed patch doesn't do that, though.  I think
>> updates on pg_foreign_table would be more frequent, so it might be
>> worth complicating the code.  But the relcache-inval approach couldn't
>> do that, IIUC.

> Why not?  But the bigger picture here is that relcache inval is the
> established practice for forcing replanning due to table-level changes,
> and I have very little sympathy for inventing a new mechanism for that
> just for foreign tables.  If it were worth bypassing replanning for
> changes in tables in dead subqueries, for instance, it would surely be
> worth making that happen for all table types not just foreign tables.

My point here is that FDW-option changes wouldn't affect replanning by
core; even if forcing a replan, we would have the same foreign tables
excluded by constraint exclusion, for example.  So, considering updates
on pg_foreign_table to be rather frequent, it might be better to avoid
replanning for pg_foreign_table changes to foreign tables excluded by
constraint exclusion, for example.  My concern about the relcache-inval
approach is that it wouldn't allow us to do that, IIUC.

Best regards,
Etsuro Fujita
Re: [HACKERS] [PATCH] pgpassfile connection option
> Independently of your patch, while testing I concluded that the
> multi-host feature and documentation should be improved:
> - If a connection fails, it does not say for which host/port.

Thanks, I will re-test the same.

> In fact they are tried in turn if the network connection fails, but not
> if the connection fails for some other reason such as the auth.
> The documentation is not precise enough.

If we fail due to authentication, then I think we should notify the client
instead of trying the next host. I think it is expected that a genuine
user has the right credentials for making the connection. So it is better
if we notify the client immediately rather than silently ignoring the
failure. Otherwise the host node which the client failed to connect to
will be permanently unavailable until the client corrects itself.

But I agree the documentation and error message, as you said, need
improvements. I will try to correct the same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
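For illustration, a minimal libpq sketch of the reporting gap being
discussed. The comma-separated host syntax is the one proposed in the
failover patch under review, not something released libpq accepts at the
time of writing, and the host names are placeholders; note that the error
text printed on failure currently does not say which host/port failed.

/* Sketch only: try the hosts in turn and show whatever libpq reports. */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("host=primary.example,standby.example "
								   "port=5432 dbname=postgres connect_timeout=5");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	else
		printf("connected to %s:%s\n", PQhost(conn), PQport(conn));

	PQfinish(conn);
	return 0;
}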
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
From: Robert Haas [mailto:robertmh...@gmail.com]
> So there are two questions here:
>
> 1. Should we try to avoid having the stats collector write a stats file
> during an immediate shutdown?  The file will be removed anyway during crash
> recovery, so writing it is pointless.  I think you are right that 9.4's
> solution here is not perfect, because of the 5 second delay, and also because
> if the stats collector is stuck inside the kernel trying to write to the
> OS, it may be in a non-interruptible wait state where even SIGKILL has no
> immediate effect.  Anyway, it's stupid even from a performance point of
> view to waste time writing a file that we're just going to nuke.
>
> 2. Should we close listen sockets sooner during an immediate shutdown?
> I agree with Tom and Peter that this isn't a good idea.  People expect
> the sockets not to go away until the end - e.g. they use
> PQping() to test the server status, or they connect just to see what error
> they get - and the fact that a client application could hypothetically
> generate such a relentless stream of connection attempts that the dead-end
> backends thereby created slow down shutdown is not in my mind a sufficient
> reason to change the behavior.
>
> So I think 001 should proceed and 002 should be rejected.

I'm happy with this conclusion, since I think 1 was the cause of slow
shutdown, and 2 is just a hypothesis to pursue the completeness. And I can
understand the concern about PQping().

Regards
Takayuki Tsunakawa
Re: [HACKERS] Re: BUG #13755: pgwin32_is_service not checking if SECURITY_SERVICE_SID is disabled
From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> You meant CheckTokenMembership().

Yes, my typo in the mail.

> The proposed patch does need to be checked with:

I understood that by "refuse to run" you meant that postgres.exe fails to
start in the check below. Yes, I checked it on Win10. I don't have access
to WinXP/2003 - Microsoft ended their support.

	if (pgwin32_is_admin())
	{
		write_stderr("Execution of PostgreSQL by a user with administrative permissions is not\n"
					 "permitted.\n"
					 "The server must be started under an unprivileged user ID to prevent\n"
					 "possible system security compromises.  See the documentation for\n"
					 "more information on how to properly start the server.\n");
		exit(1);
	}

Regards
Takayuki Tsunakawa
Re: [HACKERS] patch: function xmltable
2016-11-21 21:16 GMT+01:00 Tom Lane :

> Alvaro Herrera writes:
>> Something I just noticed is that transformTableExpr takes a TableExpr
>> node and returns another TableExpr node.  That's unlike what we do in
>> other places, where the node returned is of a different type than the
>> input node.  I'm not real clear what happens if you try to re-transform
>> a node that was already transformed, but it seems worth thinking about.
>
> We're not 100% consistent on that --- there are cases such as RowExpr
> and CaseExpr where the same struct type is used for pre-parse-analysis
> and post-parse-analysis nodes.  I think it's okay as long as the
> information content isn't markedly different, ie the transformation
> just consists of transforming all the sub-nodes.
>
> Being able to behave sanely on a re-transformation used to be an
> issue, but we no longer expect transformExpr to support that.

I was not sure in this case - using a new node was clearer for me - a
safeguard against some uninitialized or untransformed value. There is
only a few bytes more memory overhead.

regards
Pavel

> regards, tom lane
Re: [HACKERS] condition variables
On Tue, Nov 22, 2016 at 3:05 PM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> At Mon, 21 Nov 2016 15:57:47 -0500, Robert Haas wrote in
>> On Thu, Aug 11, 2016 at 5:47 PM, Robert Haas wrote:
>>> So, in my
>>> implementation, a condition variable wait loop looks like this:
>>>
>>> for (;;)
>>> {
>>>     ConditionVariablePrepareToSleep(cv);
>>>     if (condition for which we are waiting is satisfied)
>>>         break;
>>>     ConditionVariableSleep();
>>> }
>>> ConditionVariableCancelSleep();
>>
>> I have what I think is a better idea.  Let's get rid of
>> ConditionVariablePrepareToSleep(cv) and instead tell users of this
>> facility to write the loop this way:
>>
>> for (;;)
>> {
>>     if (condition for which we are waiting is satisfied)
>>         break;
>>     ConditionVariableSleep(cv);
>> }
>> ConditionVariableCancelSleep();
>
> It seems rather a common way to wait on a condition variable, in
> shorter,
>
> | while (condition for which we are waiting is *not* satisfied)
> |     ConditionVariableSleep(cv);
> | ConditionVariableCancelSleep();

Ok, let's show it that way.

>> ConditionVariableSleep(cv) will check whether the current process is
>> already on the condition variable's waitlist.  If so, it will sleep;
>> if not, it will add the process and return without sleeping.
>>
>> It may seem odd that ConditionVariableSleep(cv) doesn't necessarily
>> sleep, but this design has a significant advantage: we avoid
>> manipulating the wait-list altogether in the case where the condition
>> is already satisfied when we enter the loop.  That's more like what we
>
> The condition check is done far faster than maintaining the
> wait-list for most cases, I believe.
>
>> already do in lwlock.c: we try to grab the lock first; if we can't, we
>> add ourselves to the wait-list and retry; if we then get the lock
>> after all we have to recheck whether we can get the lock and remove
>> ourselves from the wait-list if so.  Of course, there is some cost: if
>> we do have to wait, we'll end up checking the condition twice before
>> actually going to sleep.  However, it's probably smart to bet that
>> actually needing to sleep is fairly infrequent, just as in lwlock.c.
>>
>> Thoughts?
>
> FWIW, I agree to the assumption.

Here's a version that works that way, though it allows you to call
ConditionVariablePrepareToSleep *optionally* before you enter your loop,
in case you expect to have to wait and would rather avoid the extra loop.
Maybe there isn't much point in exposing that though, since your condition
test should be fast and waiting is the slow path, but we don't really know
what your condition test is. I thought about that because my use case
(barrier.c) does in fact expect to hit the wait case more often than not.
If that seems pointless then perhaps ConditionVariablePrepareToSleep
should become static and implicit.

This version does attempt to suppress spurious returns, a bit, using
proclist_contains. No more cvSleeping.

It's possible that future users will want a version with a timeout, or
multiplexed with IO, in which case there would be some interesting
questions about how this should interact with WaitEventSet. It also seems
like someone might eventually want to handle postmaster death. Perhaps
there should eventually be a way to tell WaitEventSet that you're waiting
for a CV so these things can be multiplexed without exposing the fact that
it's done internally with latches.

-- 
Thomas Munro
http://www.enterprisedb.com

condition-variable-v4.patch
Description: Binary data
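For what it's worth, here is a tiny hypothetical caller written against
the API shape proposed in this thread (ConditionVariableSleep taking only
the CV; the final signature may differ). The SharedCounter structure, its
initialization (pg_atomic_init_u32 and ConditionVariableInit at setup
time), and the function names are invented for illustration; only the CV
calls follow the proposal.

/* Sketch only: wait until a shared counter reaches a target. */
typedef struct SharedCounter
{
	pg_atomic_uint32	ndone;			/* initialized with pg_atomic_init_u32 */
	ConditionVariable	all_done_cv;	/* initialized with ConditionVariableInit */
} SharedCounter;

/* A worker reports completion and wakes any waiters. */
static void
mark_done(SharedCounter *shared)
{
	pg_atomic_add_fetch_u32(&shared->ndone, 1);
	ConditionVariableBroadcast(&shared->all_done_cv);
}

/* The leader waits for nworkers completions, rechecking after each wakeup. */
static void
wait_for_all(SharedCounter *shared, uint32 nworkers)
{
	while (pg_atomic_read_u32(&shared->ndone) < nworkers)
		ConditionVariableSleep(&shared->all_done_cv);
	ConditionVariableCancelSleep();
}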
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On Tue, Nov 22, 2016 at 5:33 AM, Robert Haas wrote:
> On Sat, Nov 19, 2016 at 2:54 PM, Tom Lane wrote:
>> Stephen Frost writes:
>>> Let's create a new data type for this which supports old and new types,
>>> add what casts make sense, and call it a day.  Changing the data type
>>> name out from under people is not helping anyone.
>>
>> +1.  I do not think changing behavior for the existing type name is
>> going to be a net win.  If we'd been smart enough to make the type
>> varlena from the get-go, maybe we could get away with it, but there
>> is just way too much risk of trouble with a change in a fixed-length
>> type's on-the-wire representation.
>
> I completely agree.

OK. Agreed. Any suggestions for the name to be used for the new datatype
that can work for both 48 and 64 bit MAC addresses?

It is possible to represent a 48 bit MAC address as a 64 bit MAC address
by adding reserved bytes in the middle as follows.

01-01-01-01-01-01::macaddr => 01-01-01-FF-FE-01-01-01::newmacaddr

While comparing a 48 bit MAC address with a 64 bit MAC address, ignore the
two bytes if the contents of those bytes are the reserved bytes.

The new datatype can store directly whatever the input is, 48 bit or 64
bit. The same data is sent over the wire, whether the reserved bytes are
present or not?

Regards,
Hari Babu
Fujitsu Australia
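A minimal sketch of the 48-to-64-bit expansion described above, inserting
the reserved FF-FE bytes in the middle. The struct and function names here
are placeholders for illustration, not a proposed catalog type or its
internal representation.

/* Sketch: expand a 48-bit MAC address to its EUI-64 form. */
typedef struct mac48 { unsigned char b[6]; } mac48;
typedef struct mac64 { unsigned char b[8]; } mac64;

static mac64
mac48_to_eui64(mac48 in)
{
	mac64	out;

	out.b[0] = in.b[0];
	out.b[1] = in.b[1];
	out.b[2] = in.b[2];
	out.b[3] = 0xFF;			/* reserved filler bytes */
	out.b[4] = 0xFE;
	out.b[5] = in.b[3];
	out.b[6] = in.b[4];
	out.b[7] = in.b[5];
	return out;					/* 01-01-01-01-01-01 => 01-01-01-FF-FE-01-01-01 */
}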
Re: [HACKERS] Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
On 22 November 2016 at 11:35, Kyotaro HORIGUCHI wrote:
> Hello,
>
> At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas wrote in
>> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer wrote:
>>> I'm still interested in hearing comments from experienced folks about
>>> whether it's sane to do this at all, rather than have similar
>>> duplicate signal handling for the walsender.
>>
>> Well, I mean, if it's reasonable to share code in a given situation,
>> that is generally better than NOT sharing code...
>
> Walsender handles SIGUSR1 in a completely different way from normal
> backends. The procsignal_sigusr1_handler is designed to work as
> the peer of SendProcSignal (not ProcSendSignal:). Walsender is
> just using a latch to be woken up. It has nothing to do with
> SendProcSignal.

Indeed, at the moment it doesn't. I'm proposing to change that, since
walsenders doing logical decoding on a standby need to be notified of
conflicts with recovery, many of which are the same as for normal user
backends and bgworkers.

The main exception is snapshot conflicts, where logical decoding
walsenders don't care about conflicts with specific tables; they want to
be terminated if the effective catalog_xmin on the upstream increases past
their needed catalog_xmin. They don't care about non-catalog (or
non-heap-catalog) tables. So I expect we'd just want to ignore
PROCSIG_RECOVERY_CONFLICT_SNAPSHOT when running a walsender and handle
conflict with catalog_xmin increases separately.

> IMHO, I don't think walsender is allowed to just share the
> backends' handler for a practical reason that pss_signalFlags can
> harm.

I'm not sure I see the problem. The ProcSignalReason argument to
RecoveryConflictInterrupt states that:

 * Also, because of race conditions, it's important that all the signals be
 * defined so that no harm is done if a process mistakenly receives one.

> If you need to expand the function of SIGUSR1 of walsender, more
> thought would be needed.

Yeah, I definitely don't think it's as simple as just using
procsignal_sigusr1_handler as-is. I expect we'd likely create a new global
IsWalSender and ignore some RecoveryConflictInterrupt cases when in a
walsender, at least PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, and probably add a
new case for catalog_xmin conflicts that's only acted on when IsWalSender.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Danger of automatic connection reset in psql
2016-11-22 3:46 GMT+01:00 Robert Haas :

> On Mon, Nov 21, 2016 at 4:55 AM, Oleksandr Shulgin wrote:
>> On Tue, Nov 15, 2016 at 4:10 PM, Jim Nasby wrote:
>>> On 11/14/16 5:41 AM, Oleksandr Shulgin wrote:
>>>> Automatic connection reset is a nice feature for server development,
>>>> IMO.  Is it really useful for anything else is a good question.
>>>
>>> I use it all the time for application development; my rebuild script
>>> will forcibly kick everyone out to re-create the database. I put that
>>> in because I invariably end up with a random psql sitting somewhere
>>> that I don't want to track down.
>>>
>>> What currently stinks though is if the connection is dead and the next
>>> command I run is a \i, psql just dies instead of re-connecting. It'd
>>> be nice if before reading the script it checked connection status and
>>> attempted a reconnect.
>>>
>>>> At least an option to control that behavior seems like a good idea,
>>>> maybe even set it to 'no reconnect' by default, so that people who
>>>> really use it can make conscious choice about enabling it in their
>>>> .psqlrc or elsewhere.
>>>
>>> +1, I don't think it needs to be the default.
>>
>> So if we go in this direction, should the option be specified from
>> command line or available via psqlrc (or both?)  I think both make
>> sense.
>>
>> What should be the option and control variable names?  Something like:
>> --reconnect and RECONNECT?  Should we allow reconnect in
>> non-interactive mode?  I have no use case for that, but it might be
>> different for others.  If non-interactive is not supported then it
>> could be a simple boolean variable, otherwise we might want something
>> like a tri-state: on / off / interactive (the last one being the
>> default).
>
> I think it should just be another psql special variable, like
> AUTOCOMMIT or VERBOSITY.  If the user wants to set it on the command
> line, they can just use -v.  We don't need a separate, dedicated
> option for this, I think.

In this case it depends on the default. For almost all scripts the
sensible value is "without reconnect". It would be unfriendly to use this
setting via a -v variable.

Regards
Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: multivariate statistics / proof of concept
On 11/21/2016 11:10 PM, Robert Haas wrote:
> [ reviving an old multivariate statistics thread ]
>
> On Thu, Nov 13, 2014 at 6:31 AM, Simon Riggs wrote:
>> On 12 October 2014 23:00, Tomas Vondra wrote:
>>> It however seems to be working sufficiently well at this point, enough
>>> to get some useful feedback. So here we go.
>>
>> This looks interesting and useful.
>>
>> What I'd like to check before a detailed review is that this has
>> sufficient applicability to be useful.
>>
>> My understanding is that Q9 and Q18 of TPC-H have poor plans as a
>> result of multi-column stats errors.
>>
>> Could you look at those queries and confirm that this patch can produce
>> better plans for them?
>
> Tomas, did you ever do any testing in this area?  One of my colleagues,
> Rafia Sabih, recently did some testing of TPC-H queries @ 20 GB.  Q18
> actually doesn't complete at all right now because of an issue with the
> new simplehash implementation.  I reported it to Andres and he tracked
> it down, but hasn't posted the patch yet - see
> http://archives.postgresql.org/message-id/20161115192802.jfbec5s6ougxw...@alap3.anarazel.de
>
> Of the remaining queries, the slowest are Q9 and Q20, and both of them
> have serious estimation errors.  On Q9, things go wrong here:
>
>  ->  Merge Join  (cost=5225092.04..6595105.57 rows=154 width=47) (actual time=103592.821..149335.010 rows=6503988 loops=1)
>        Merge Cond: (partsupp.ps_partkey = lineitem.l_partkey)
>        Join Filter: (lineitem.l_suppkey = partsupp.ps_suppkey)
>        Rows Removed by Join Filter: 19511964
>        ->  Index Scan using
> [snip]
>        Rows Removed by Filter: 756627
>
> The estimate for the index scan on partsupp is essentially perfect, and
> the lineitem-part join is off by about 3x.  However, the merge join is
> off by about 4000x, which is real bad.

The patch only deals with statistics on base relations, no joins, at this
point. It's meant to be extended in that direction, so the syntax supports
it, but at this point that's all. No joins.

That being said, this estimate should be improved in 9.6, when you create
a foreign key between the tables. In fact, that patch was exactly about
Q9. This is how the join estimate looks on scale 1 without the FK between
the two tables:

                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Merge Join  (cost=19.19..700980.12 rows=2404 width=261)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Index Scan using idx_lineitem_part_supp on lineitem  (cost=0.43..605856.84 rows=6001117 width=117)
   ->  Index Scan using partsupp_pkey on partsupp  (cost=0.42..61141.76 rows=80 width=144)
(4 rows)

and with the foreign key:

                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Merge Join  (cost=19.19..700980.12 rows=6001117 width=261) (actual rows=6001215 loops=1)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Index Scan using idx_lineitem_part_supp on lineitem  (cost=0.43..605856.84 rows=6001117 width=117) (actual rows=6001215 loops=1)
   ->  Index Scan using partsupp_pkey on partsupp  (cost=0.42..61141.76 rows=80 width=144) (actual rows=6001672 loops=1)
 Planning time: 3.840 ms
 Execution time: 21987.913 ms
(6 rows)

> On Q20, things go wrong here:
> [snip]
>
> The estimate for the GroupAggregate feeding one side of the merge join
> is quite accurate.  The estimate for the part-partsupp join on the other
> side is off by 8x.  Then things get much worse: the estimate for the
> merge join is off by 400x.

Well, most of the estimation error comes from the join, but sadly the
aggregate makes using the foreign keys impossible - at least in the
current version. I don't know if it can be improved, somehow.

> I'm not really sure whether the multivariate statistics stuff will fix
> this kind of case or not, but if it did it would be awesome.

Join statistics are something I'd like to add eventually, but I don't see
how it could happen in the first version. Also, the patch received no
reviews this CF, and making it even larger is unlikely to make it more
attractive.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
Hello,

At Mon, 21 Nov 2016 21:39:07 -0500, Robert Haas wrote in
> On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer wrote:
>> I'm still interested in hearing comments from experienced folks about
>> whether it's sane to do this at all, rather than have similar
>> duplicate signal handling for the walsender.
>
> Well, I mean, if it's reasonable to share code in a given situation,
> that is generally better than NOT sharing code...

Walsender handles SIGUSR1 in a completely different way from normal
backends. The procsignal_sigusr1_handler is designed to work as the peer
of SendProcSignal (not ProcSendSignal:). Walsender is just using a latch
to be woken up. It has nothing to do with SendProcSignal.

IMHO, I don't think walsender is allowed to just share the backends'
handler, for a practical reason that pss_signalFlags can harm.

If you need to expand the function of SIGUSR1 of walsender, more thought
would be needed.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Parallel bitmap heap scan
On Fri, Nov 18, 2016 at 9:59 AM, Amit Khandekar wrote:

Thanks for the review.

> In pbms_is_leader(), I didn't clearly understand the significance of
> the for-loop.  If it is a worker, it can call
> ConditionVariablePrepareToSleep() followed by
> ConditionVariableSleep().  Once it comes out of
> ConditionVariableSleep(), isn't it guaranteed that the leader has
> finished the bitmap?  If yes, then it looks like it is not necessary
> to again iterate and go back through the pbminfo->state checking.
> Also, with this, variable queuedSelf also might not be needed.  But I
> might be missing something here.

I have taken this logic from the example posted on the condition variable
thread [1]:

for (;;)
{
    ConditionVariablePrepareToSleep(cv);
    if (condition for which we are waiting is satisfied)
        break;
    ConditionVariableSleep();
}
ConditionVariableCancelSleep();

[1] https://www.postgresql.org/message-id/CA%2BTgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr%3DC56Xng%40mail.gmail.com

So it appears to me that even if we come out of ConditionVariableSleep(),
we need to verify our condition and only then break.

> Not sure what happens if worker calls
> ConditionVariable[Prepare]Sleep() but leader has already called
> ConditionVariableBroadcast().  Does the for loop have something to do
> with this?  But this can happen even with the current for-loop, it
> seems.

If the leader has already called ConditionVariableBroadcast, then after
ConditionVariablePrepareToSleep we will check the condition again before
calling ConditionVariableSleep.  And the condition check is under
SpinLockAcquire(&pbminfo->state_mutex);

However I think there is one problem in my code (I think you might be
pointing at the same thing): after ConditionVariablePrepareToSleep, if
pbminfo->state is already PBM_FINISHED, I am not resetting needWait to
false, and that can lead to the problem.

>> #3. Bitmap processing (Iterate and process the pages).
>> In this phase each worker will iterate over the page and chunk array
>> and select heap pages one by one.  If prefetch is enabled then there
>> will be two iterators.  Since multiple workers are iterating over the
>> same page and chunk array we need to have a shared iterator, so we
>> grab a spin lock and iterate within the lock, so that each worker gets
>> a different page to process.

> tbm_iterate() call under SpinLock:
> For parallel tbm iteration, tbm_iterate() is called while SpinLock is
> held.  Generally we try to keep code inside Spinlock call limited to a
> few lines, and that too without occurrence of a function call.
> Although tbm_iterate() code itself looks safe under a spinlock, I was
> checking if we can squeeze SpinlockAcquire() and SpinLockRelease()
> closer to each other.  One thought is: in tbm_iterate(), acquire the
> SpinLock before the while loop that iterates over lossy chunks.  Then,
> if both chunk and per-page data remain, release spinlock just before
> returning (the first return stmt).  And then just before scanning
> bitmap of an exact page, i.e. just after "if (iterator->spageptr <
> tbm->npages)", save the page handle, increment iterator->spageptr,
> release Spinlock, and then use the saved page handle to iterate over
> the page bitmap.

The main reason to keep the spinlock out of this function is to avoid
changes inside the function, and also this function takes a local iterator
as input, which doesn't have a spinlock reference in it.  But that can be
changed; we can pass the shared iterator to it.

I will think about this logic and try to update it in the next version.

> prefetch_pages() call under Spinlock:
> Here again, prefetch_pages() is called while pbminfo->prefetch_mutex
> Spinlock is held.  Effectively, heavy functions like PrefetchBuffer()
> would get called while under the Spinlock.  These can even ereport().
> One option is to use mutex lock for this purpose.  But I think that
> would slow things down.  Moreover, the complete set of prefetch pages
> would be scanned by a single worker, and others might wait for this
> one.  Instead, what I am thinking is: grab the pbminfo->prefetch_mutex
> Spinlock only while incrementing pbminfo->prefetch_pages.  The rest
> part viz: iterating over the prefetch pages, and doing the
> PrefetchBuffer() need not be synchronised using this
> pgbinfo->prefetch_mutex Spinlock.  pbms_parallel_iterate() already has
> its own iterator spinlock.  Only thing is, workers may not do the
> actual PrefetchBuffer() sequentially.  One of them might shoot ahead
> and prefetch 3-4 pages while the other is lagging with the
> sequentially lesser page number; but I believe this is fine, as long
> as they all prefetch all the required blocks.

I agree with your point, will try to fix this as well.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
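To illustrate the narrower critical section being suggested for the shared
iterator, a rough sketch follows. The structure and field names are
invented for illustration (they loosely echo the discussion above); they
are not taken from the patch or from tidbitmap.c, and PagetableEntry is
assumed to be visible where this code would live.

/*
 * Sketch only: hold the shared iterator's spinlock just long enough to
 * claim the next exact-page entry; the caller then scans that page's
 * bitmap (and issues any prefetches) outside the lock.
 */
typedef struct ParallelTBMIterator
{
	slock_t		mutex;			/* protects spageptr */
	int			spageptr;		/* next exact page to hand out */
} ParallelTBMIterator;

static PagetableEntry *
claim_next_exact_page(ParallelTBMIterator *it,
					  PagetableEntry **spages, int npages)
{
	PagetableEntry *page = NULL;

	SpinLockAcquire(&it->mutex);
	if (it->spageptr < npages)
	{
		/* Save the handle and advance the shared index under the lock. */
		page = spages[it->spageptr];
		it->spageptr++;
	}
	SpinLockRelease(&it->mutex);

	/* Processing of "page" happens without the spinlock held. */
	return page;
}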
Re: [HACKERS] Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
> I have tried using attached script multiple times on latest 9.2 code, but
> couldn't reproduce the issue.  Please find the log attached with this mail.
> Apart from log file, below prints appear:
>
> WARNING: enabling "trust" authentication for local connections
> You can change this by editing pg_hba.conf or using the option -A, or
> --auth-local and --auth-host, the next time you run initdb.
> 20075/20075 kB (100%), 1/1 tablespace
> NOTICE: pg_stop_backup complete, all required WAL segments have been
> archived
> 20079/20079 kB (100%), 1/1 tablespace
>
> Let me know, if some parameters need to be tweaked to reproduce the issue?
>
> It seems that the patch proposed is good, but it is better if somebody other
> than you can reproduce the issue and verify if the patch fixes the same.

Thank you for reviewing the code and testing.  Hmm, we could reproduce the
problem on PostgreSQL 9.2.19.  The script's stdout is attached as
test.log, and the stderr is as follows:

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
20099/20099 kB (100%), 1/1 tablespace
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
20103/20103 kB (100%), 1/1 tablespace

The sizes pg_basebackup outputs are a bit different from yours.  I don't
see a reason for this.  The test script explicitly specifies the database
encoding and locale, so the encoding difference doesn't seem to be the
cause.  The target problem occurs only when a WAL record crosses a WAL
segment boundary, so a subtle change in WAL record volume would prevent
the problem from happening.

Anyway, could you retry with the attached test.sh?  It just changes
restore_command.  If the problem occurs, the following pair of lines
appear in the server log of the cascading standby.  Could you check it?

LOG:  restored log file "00020003" from archive
LOG:  out-of-sequence timeline ID 1 (after 2) in log file 0, segment 3, offset 0

Regards
Takayuki Tsunakawa

test.sh
Description: test.sh

test.log
Description: test.log
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Mon, Nov 21, 2016 at 9:30 PM, Tom Lane wrote:
> Amit Kapila writes:
>> On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila wrote:
>>> Here instead of checking whether top_plan has initPlan, it should
>>> check whether initPlan is present anywhere in plan tree.  I think one
>>> simple way could be to check *glob->subplans* instead of
>>> top_plan->initPlan,
>
>> Patch based on above suggestion is attached with this mail.
>
> I think this is the right fix for the moment, because the proximate cause
> of the crash is that ExecSerializePlan doesn't transmit any part of the
> PlannedStmt.subplans list, which glob->subplans is the preimage of.
>
> Now, maybe I'm missing something, but it seems to me that ordinary
> subplans could be transmitted to workers just fine.  The problem with
> transmitting initplans is that you'd lose the expectation of single
> evaluation.

Yes, and I think we can handle it such that the master backend evaluates
the initplans and shares the calculated value along with the paramid with
all the workers.  Workers will, in turn, restore it in
queryDesc->estate->es_param_exec_vals (or some other place where those can
be directly used; I have yet to evaluate this matter).

I am working on a patch to parallelize queries containing
initplans/subplans, so I will evaluate your suggestion of passing subplans
(maybe non-InitPlans) in ExecSerializePlan as part of that patch.  I have
yet to figure out the best way to share hashed subplans: do we pass them
as they are and let each worker evaluate and store its own copy of the
hash table, or shall we try to form the hash table once in the master and
then share the same with workers (which could be tricky), or shall we
restrict such queries which contain hashed subplans, on the assumption
that it will be costly for each worker to form the hash table?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
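As a rough illustration of the idea of restoring leader-computed initplan
values in a worker, here is a hypothetical sketch; the function name and
the way the (paramid, value) pairs reach the worker are assumptions for
illustration, not code from any posted patch.

/*
 * Sketch only: a worker installs an initplan value that the leader has
 * already evaluated, so the worker never tries to evaluate the initplan
 * itself.  Assumes nodes/params.h and the executor's EState are in scope.
 */
static void
RestoreLeaderInitPlanParam(EState *estate, int paramid,
						   Datum value, bool isnull)
{
	ParamExecData *prm = &estate->es_param_exec_vals[paramid];

	prm->execPlan = NULL;		/* already evaluated by the leader */
	prm->value = value;
	prm->isnull = isnull;
}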
Re: [HACKERS] Contains and is contained by operators of inet datatypes
On Mon, Nov 21, 2016 at 7:57 AM, Andreas Karlsson wrote:
> I like the patch because it means less operators to remember for me as a
> PostgreSQL user. And at least for me inet is a rarely used type compared
> to hstore, json and range types which all use @> and <@.

I agree that it would be nice to make the choice of operator names more
consistent.  I don't know if doing so will please more or fewer people
than it annoys.  I do not like this bit from the original post:

EH> The patch removes the recently committed SP-GiST index support for the
EH> existing operator symbols to give more reason to the users to use the
EH> new symbols.

That seems like the rough equivalent of throwing a wrench into the
datacenter's backup generator to "encourage" them to maintain two separate
and independent backup generators.  If we're going to add the
more-standard operator names as synonyms for the existing operator names,
let's do precisely that and no more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows
On Fri, Nov 18, 2016 at 3:23 PM, Peter Eisentraut wrote:
> On 11/17/16 12:30 AM, Tsunakawa, Takayuki wrote:
>> No, I'm not recommending a higher value, but just removing the doubtful
>> sentences of 512MB upper limit.  The advantage is that eliminating this
>> sentence will make a chance for users to try best setting.
>
> I think this is a good point.  The information is clearly
> wrong/outdated.  We have no new better information, but we should remove
> misleading outdated advice and let users find new advice.  Basically,
> this just puts Windows on par with other platforms with regard to
> shared_buffers tuning, doesn't it?
>
> I'm inclined to commit the original patch if there are no objections.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Danger of automatic connection reset in psql
On Mon, Nov 21, 2016 at 4:55 AM, Oleksandr Shulgin wrote:
> On Tue, Nov 15, 2016 at 4:10 PM, Jim Nasby wrote:
>> On 11/14/16 5:41 AM, Oleksandr Shulgin wrote:
>>> Automatic connection reset is a nice feature for server development,
>>> IMO.  Is it really useful for anything else is a good question.
>>
>> I use it all the time for application development; my rebuild script
>> will forcibly kick everyone out to re-create the database. I put that
>> in because I invariably end up with a random psql sitting somewhere
>> that I don't want to track down.
>>
>> What currently stinks though is if the connection is dead and the next
>> command I run is a \i, psql just dies instead of re-connecting. It'd
>> be nice if before reading the script it checked connection status and
>> attempted a reconnect.
>>
>>> At least an option to control that behavior seems like a good idea,
>>> maybe even set it to 'no reconnect' by default, so that people who
>>> really use it can make conscious choice about enabling it in their
>>> .psqlrc or elsewhere.
>>
>> +1, I don't think it needs to be the default.
>
> So if we go in this direction, should the option be specified from
> command line or available via psqlrc (or both?)  I think both make
> sense.
>
> What should be the option and control variable names?  Something like:
> --reconnect and RECONNECT?  Should we allow reconnect in non-interactive
> mode?  I have no use case for that, but it might be different for
> others.  If non-interactive is not supported then it could be a simple
> boolean variable, otherwise we might want something like a tri-state:
> on / off / interactive (the last one being the default).

I think it should just be another psql special variable, like
AUTOCOMMIT or VERBOSITY.  If the user wants to set it on the command
line, they can just use -v.  We don't need a separate, dedicated
option for this, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?
On Mon, Nov 21, 2016 at 3:21 AM, Craig Ringer wrote: > I'm still interested in hearing comments from experienced folks about > whether it's sane to do this at all, rather than have similar > duplicate signal handling for the walsender. Well, I mean, if it's reasonable to share code in a given situation, that is generally better than NOT sharing code... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] condition variables
Hello, At Mon, 21 Nov 2016 15:57:47 -0500, Robert Haas wrote in > On Thu, Aug 11, 2016 at 5:47 PM, Robert Haas wrote: > > So, in my > > implementation, a condition variable wait loop looks like this: > > > > for (;;) > > { > > ConditionVariablePrepareToSleep(cv); > > if (condition for which we are waiting is satisfied) > > break; > > ConditionVariableSleep(); > > } > > ConditionVariableCancelSleep(); > > I have what I think is a better idea. Let's get rid of > ConditionVariablePrepareToSleep(cv) and instead tell users of this > facility to write the loop this way: > > for (;;) > { > if (condition for which we are waiting is satisfied) > break; > ConditionVariableSleep(cv); > } > ConditionVariableCancelSleep(); It seems rather a common way to wait on a condition variable, in shorter, | while (condition for which we are waiting is *not* satisfied) | ConditionVariableSleep(cv); | ConditionVariableCancelSleep(); > ConditionVariableSleep(cv) will check whether the current process is > already on the condition variable's waitlist. If so, it will sleep; > if not, it will add the process and return without sleeping. > > It may seem odd that ConditionVariableSleep(cv) doesn't necessary > sleep, but this design has a significant advantage: we avoid > manipulating the wait-list altogether in the case where the condition > is already satisfied when we enter the loop. That's more like what we The condition check is done far faster than maintaining the wait-list for most cases, I believe. > already do in lwlock.c: we try to grab the lock first; if we can't, we > add ourselves to the wait-list and retry; if we then get the lock > after all we have to recheck whether we can get the lock and remove > ourselves from the wait-list if so. Of course, there is some cost: if > we do have to wait, we'll end up checking the condition twice before > actually going to sleep. However, it's probably smart to bet that > actually needing to sleep is fairly infrequent, just as in lwlock.c. > > Thoughts? FWIW, I agree to the assumption. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regression tests fails
Pavel Stehule writes: >> 2016-11-21 8:09 GMT+01:00 Craig Ringer : >>> Simple fix here is to append COLLATE "C" after the ORDER BY. > here is a patch Pushed, thanks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
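For readers skimming the thread, the shape of the fix is simply to pin the sort order of the affected regression query to the C collation, so the expected output no longer depends on the platform locale. Schematically (the table and column below are placeholders, not the actual test):

    SELECT name FROM test_table ORDER BY name COLLATE "C";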
Re: [HACKERS] Re: [COMMITTERS] pgsql: autovacuum: Drop orphan temp tables more quickly but with more c
On Mon, Nov 21, 2016 at 4:57 PM, Tom Lane wrote: > Michael Paquier writes: >> On Mon, Nov 21, 2016 at 10:05 AM, Robert Haas wrote: >>> autovacuum: Drop orphan temp tables more quickly but with more caution. > >> I have found an obvious bug when reading the code this morning: >> orphan_failures is not initialized: > > My compiler noticed that, too. Will push. Thanks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Mon, Nov 21, 2016 at 5:42 PM, Peter Geoghegan wrote: >>> Or, you might want to make >>> sure that B-Tree tuple insertions still find that "first page" to >>> check, despite still generally treating heap item pointer as part of >>> the key proper (in unique indexes, it can still behave like NULL, and >>> not participate in uniqueness checking, while still essentially being >>> part of the key/sort order). >> >> It behaves like that (even though it's not coded like that) > > Why not? What do you mean? > > What I'm interested in evaluating is an implementation where an index > on (foo) has a sort order/key space that is precisely the same as an > index on (foo, ctid), with zero exceptions. Well, the patch does the same sorting, but without explicitly adding the ctid to the scankey. When inserting into a unique index, the lookup for the insertion point proceeds as it would if the key was (foo, ctid), finding a page in the middle of the range that contain item pointers for foo. When that happens on a regular index, the insertion is done at that point and nothing else needs to be done. But when it happens on a unique index, the code first checks to see if the first item pointer for foo is on the same page (allegedly a frequent case). If so, the uniqueness check is done in a very straightforward and efficient manner. If not, however, the code retraces its steps up the tree to the first time it followed a key other than foo (that's the only way it could land at a middle page), and then follows the downlinks looking at a truncated key (just foo, ignoring ctid). Kind of what you say, but without the explicit null. > >>> It also occurs to me that our performance for updates in the event of >>> many NULL values in indexes is probably really bad, and could be >>> helped by this. You'll want to test all of this with a workload that >>> isn't very sympathetic to HOT, of course -- most of these benefits are >>> seen when HOT doesn't help. >> >> I haven't really tested with an overabundance of NULLs, I'll add that >> to the tests. I have tested low-cardinality indexes though, but only >> for correctness. > > I think we'll have to model data distribution to evaluate the idea > well. After all, if there is a uniform distribution of values anyway, > you're going to end up with many dirty leaf pages. Whereas, in the > more realistic case where updates are localized, we can hope to better > amortize the cost of inserting index tuples, and recycling index > tuples. Quite possibly >> What do you mean with "first class part"? >> >> It's not included in the scankey for a variety of reasons, not the >> least of which is not breaking the interface for current uses, and >> because the tid type is quite limited in its capabilities ATM. Also, >> the heap TID is usually there on most AM calls that care about it (ie: >> insertions have it, of course), so adding it to the scankey also >> seemed redundant. > > I mean that insertions into indexes that are low cardinality should be > largely guided by TID comparisons. ... >> If not adding to the scankey, what do you mean by "first class" then? > > Just what I said about the sort order of an index on "(foo)" now > precisely matching what we'd get for "(foo, ctid)". It is the case in both versions of the patch > There are a couple > of tricky issues with that that you'd have to look out for, like > making sure that the high key continues to hold a real TID, which at a > glance looks like something that just happens already. 
I'm not sure > that we preserve the item pointer TID in all cases, since the current > assumption is that a high key's TID is just filler -- maybe we can > lose that at some point. I thought long about that, and inner pages don't need a real TID in their keys, as those keys only specify key space cutoff points and not real tids, and high tids in leaf pages are always real. I haven't had any issue with that, aside from the headaches resulting from thinking about that for protracted periods of time. > You should use amcheck to specifically verify that that happens > reliably in all cases. Presumably, its use of an insertion scankey > would automatically see the use of TID as a tie-breaker with patched > Postgres amcheck verification, and so amcheck will work for this > purpose unmodified. It's totally on my roadmap. I'll have to adapt it for the new index format though, it doesn't work without modification. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: autovacuum: Drop orphan temp tables more quickly but with more c
Michael Paquier writes: > On Mon, Nov 21, 2016 at 10:05 AM, Robert Haas wrote: >> autovacuum: Drop orphan temp tables more quickly but with more caution. > I have found an obvious bug when reading the code this morning: > orphan_failures is not initialized: My compiler noticed that, too. Will push. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: autovacuum: Drop orphan temp tables more quickly but with more c
On Mon, Nov 21, 2016 at 10:05 AM, Robert Haas wrote: > autovacuum: Drop orphan temp tables more quickly but with more caution. > > Previously, we only dropped an orphan temp table when it became old > enough to threaten wraparound; instead, doing it immediately. The > only value of waiting is that someone might be able to examine the > contents of the orphan temp table for forensic purposes, but it's > pretty difficult to actually do that and few users will wish to do so. > On the flip side, not performing the drop immediately generates log > spam and bloats pg_class. I have found an obvious bug when reading the code this morning: orphan_failures is not initialized: diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 954c1a1..be357e7 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -1908,7 +1908,7 @@ do_autovacuum(void) BufferAccessStrategy bstrategy; ScanKeyData key; TupleDesc pg_class_desc; - int orphan_failures; + int orphan_failures = 0; int effective_multixact_freeze_max_age; /* Attached is a patch. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve OOM handling in pg_locale.c
Mithun Cy writes: > On Thu, Oct 13, 2016 at 1:40 PM, Michael Paquier > wrote: >> I am attaching that to the next CF. > One thing which you might need to reconsider is removal of memory leak > comments. There is still a leak if there is an error while encoding in > db_encoding_strdup. > Unless you want to catch those error with an TRY();CATCH(); and then > free the mem. I could have lived with leaving the leak there, but really this wasn't fixing the worst problem with the code: if it did throw an error out of the middle of that sequence, it would leave the process setlocale'd to some other locale than we want. That could lead to unwanted behavior in printf and other functions. And this isn't all that unlikely: an encoding conversion failure is definitely possible if you have a locale selected that's not compatible with the database encoding. I whacked the patch around enough so that we didn't do anything except libc calls between setting and restoring the locale. At that point it was just a matter of adding a TRY block to be able to say that we didn't leak any strdup'd strings, so I figured "might as well". Pushed with those changes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
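Schematically, the pattern Tom describes looks like the fragment below: nothing but libc calls between switching and restoring the locale, and a TRY block around the part that copies the results so nothing leaks if an encoding conversion fails. This is an illustration of the approach, not the actual pg_locale.c code, and the variable names are made up:

    char   *save = pstrdup(setlocale(LC_CTYPE, NULL));

    setlocale(LC_CTYPE, wanted_locale);
    /* ... only libc calls here, stashing the raw strings ... */
    setlocale(LC_CTYPE, save);      /* locale is back before anything can throw */
    pfree(save);

    PG_TRY();
    {
        /* encoding-convert and copy the collected strings */
    }
    PG_CATCH();
    {
        /* free whatever was already copied, then propagate the error */
        PG_RE_THROW();
    }
    PG_END_TRY();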
Re: [HACKERS] Improve OOM handling in pg_locale.c
On Tue, Nov 22, 2016 at 8:28 AM, Tom Lane wrote: > I could have lived with leaving the leak there, but really this wasn't > fixing the worst problem with the code: if it did throw an error out of > the middle of that sequence, it would leave the process setlocale'd to > some other locale than we want. That could lead to unwanted behavior > in printf and other functions. And this isn't all that unlikely: an > encoding conversion failure is definitely possible if you have a locale > selected that's not compatible with the database encoding. > > I whacked the patch around enough so that we didn't do anything except > libc calls between setting and restoring the locale. At that point it > was just a matter of adding a TRY block to be able to say that we > didn't leak any strdup'd strings, so I figured "might as well". > > Pushed with those changes. Thanks. The changes you have done look good to me at short sight. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
Robert Haas writes: > Don't sweat it! Your sqlsmith fuzz testing has been extremely valuable. That might be the understatement of the year. I don't know how long it would have taken us to find these things in the field. Thanks! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
On Mon, Nov 21, 2016 at 5:14 PM, Andreas Seltenreich wrote: > Ashutosh Sharma writes: >>> the following query appears to reliably crash parallel workers on master >>> as of 0832f2d. > >> As suggested, I have tried to reproduce this issue on *0832f2d* commit but >> could not reproduce it. > > as Tom pointed out earlier, I had secretly set parallel_setup_cost and > parallel_tuple_cost to 0. I assumed these were irrelevant when > force_parallel_mode is on. I'll do less assuming and more testing on a > vanilla install on future reports. > > Sorry for the inconvenience, Don't sweat it! Your sqlsmith fuzz testing has been extremely valuable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
Hi, Ashutosh Sharma writes: >> the following query appears to reliably crash parallel workers on master >> as of 0832f2d. > As suggested, I have tried to reproduce this issue on *0832f2d* commit but > could not reproduce it. as Tom pointed out earlier, I had secretly set parallel_setup_cost and parallel_tuple_cost to 0. I assumed these were irrelevant when force_parallel_mode is on. I'll do less assuming and more testing on a vanilla install on future reports. Sorry for the inconvenience, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: multivariate statistics / proof of concept
[ reviving an old multivariate statistics thread ] On Thu, Nov 13, 2014 at 6:31 AM, Simon Riggs wrote: > On 12 October 2014 23:00, Tomas Vondra wrote: > >> It however seems to be working sufficiently well at this point, enough >> to get some useful feedback. So here we go. > > This looks interesting and useful. > > What I'd like to check before a detailed review is that this has > sufficient applicability to be useful. > > My understanding is that Q9 and Q18 of TPC-H have poor plans as a > result of multi-column stats errors. > > Could you look at those queries and confirm that this patch can > produce better plans for them? Tomas, did you ever do any testing in this area? One of my colleagues, Rafia Sabih, recently did some testing of TPC-H queries @ 20 GB. Q18 actually doesn't complete at all right now because of an issue with the new simplehash implementation. I reported it to Andres and he tracked it down, but hasn't posted the patch yet - see http://archives.postgresql.org/message-id/20161115192802.jfbec5s6ougxw...@alap3.anarazel.de Of the remaining queries, the slowest are Q9 and Q20, and both of them have serious estimation errors. On Q9, things go wrong here: -> Merge Join (cost=5225092.04..6595105.57 rows=154 width=47) (actual time=103592.821..149335.010 rows=6503988 loops=1) Merge Cond: (partsupp.ps_partkey = lineitem.l_partkey) Join Filter: (lineitem.l_suppkey = partsupp.ps_suppkey) Rows Removed by Join Filter: 19511964 -> Index Scan using idx_partsupp_partkey on partsupp (cost=0.43..781956.32 rows=15999792 width=22) (actual time=0.044..11825.481 rows=15999881 loops=1) -> Sort (cost=5224967.03..5245348.02 rows=8152396 width=45) (actual time=103592.505..112205.444 rows=26015949 loops=1) Sort Key: part.p_partkey Sort Method: quicksort Memory: 704733kB -> Hash Join (cost=127278.36..4289121.18 rows=8152396 width=45) (actual time=1084.370..94732.951 rows=6503988 loops=1) Hash Cond: (lineitem.l_partkey = part.p_partkey) -> Seq Scan on lineitem (cost=0.00..3630339.08 rows=119994608 width=41) (actual time=0.015..33355.637 rows=119994608 loops=1) -> Hash (cost=123743.07..123743.07 rows=282823 width=4) (actual time=1083.686..1083.686 rows=216867 loops=1) Buckets: 524288 Batches: 1 Memory Usage: 11721kB -> Gather (cost=1000.00..123743.07 rows=282823 width=4) (actual time=0.418..926.283 rows=216867 loops=1) Workers Planned: 4 Workers Launched: 4 -> Parallel Seq Scan on part (cost=0.00..94460.77 rows=70706 width=4) (actual time=0.063..962.909 rows=43373 loops=5) Filter: ((p_name)::text ~~ '%grey%'::text) Rows Removed by Filter: 756627 The estimate for the index scan on partsupp is essentially perfect, and the lineitem-part join is off by about 3x. However, the merge join is off by about 4000x, which is real bad. 
On Q20, things go wrong here: -> Merge Join (cost=5928271.92..6411281.44 rows=278 width=16) (actual time=77887.963..136614.284 rows=118124 loops=1) Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey)) Join Filter: ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity Rows Removed by Join Filter: 242 -> GroupAggregate (cost=5363980.40..5691151.45 rows=9681876 width=48) (actual time=76672.726..131482.677 rows=10890067 loops=1) Group Key: lineitem.l_partkey, lineitem.l_suppkey -> Sort (cost=5363980.40..5409466.13 rows=18194291 width=21) (actual time=76672.661..86405.882 rows=18194084 loops=1) Sort Key: lineitem.l_partkey, lineitem.l_suppkey Sort Method: external merge Disk: 551376kB -> Bitmap Heap Scan on lineitem (cost=466716.05..3170023.42 rows=18194291 width=21) (actual time=13735.552..39289.995 rows=18195269 loops=1) Recheck Cond: ((l_shipdate >= '1994-01-01'::date) AND (l_shipdate < '1995-01-01 00:00:00'::timestamp without time zone))
Re: [HACKERS] postgres 9.3 postgres_fdw ::LOG: could not receive data from client: Connection reset by peer
On Mon, Nov 21, 2016 at 8:32 AM, Vladimir Svedov wrote: > I have this question. Looked for a help on http://dba.stackexchange.com/ > No success. A link to the actual question would be appreciated. > "FOREIGN_TABLE" created with postgres_fdw. LOCAL_TABLE is just a local > table... > > Symptoms: > > I run in psql query SELECT * from FOREIGN_TABLE. No log generated > I run in bash psql -c "SELECT * from LOCAL_TABLE". No log generated > I run in bash psql -c "SELECT * from FOREIGN_TABLE". ::LOG: could not receive > data from client: Connection reset by peer generated in postgres log Please provide more information, and preferably a self-contained test case (one that anyone can run on an empty database to see the problem). https://wiki.postgresql.org/wiki/Guide_to_reporting_problems -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
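For anyone who wants to pick this up, a self-contained report would be a script along these lines (all names and options below are placeholders), together with the server version and the exact log lines observed:

    CREATE EXTENSION postgres_fdw;
    CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', dbname 'postgres');
    CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
    CREATE TABLE local_table AS SELECT 1 AS x;
    CREATE FOREIGN TABLE foreign_table (x int) SERVER loopback
        OPTIONS (table_name 'local_table');
    -- then from the shell:
    --   psql -c "SELECT * FROM foreign_table"
    -- and note what, if anything, appears in the server log.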
Re: [HACKERS] Fun fact about autovacuum and orphan temp tables
On Tue, Nov 22, 2016 at 3:06 AM, Robert Haas wrote: > I don't think that you should skip it in the wraparound case, because > it might be one of the temp tables that is threatening wraparound. > > I've committed the patch after doing some work on the comments. OK, thanks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.
> > The dblink docs recommend using dblink_fdw as the FDW for this purpose, > which would only accept legal connstr options. However, I can see the > point of using a postgres_fdw server instead, and considering that > dblink isn't actually enforcing use of any particular FDW type, it seems > like the onus should be on it to be more wary of what the options are. > I have to admit, this was the first I'd heard of dblink_fdw. I'm glad it exists, though. And yes, I'd like to be able to use postgres_fdw entries because there's value in knowing that the connection for your ad-hoc SQL exactly matches the connection used for other FDW tables. > It looks like this might be fairly easy to fix by having > get_connect_string() use is_valid_dblink_option() to check each > option name, and silently ignore options that are inappropriate. > From what I can tell, it is very straightforward, the context oids are set up just a few lines above where the new is_valid_dblink_option() calls would be. I'm happy to write the patch, for both v10 and any back-patches we feel are necessary. However, I suspect with a patch this trivial that reviewing another person's patch might be more work for a committer than just doing it themselves. If that's not the case, let me know and I'll get started.
Re: [HACKERS] condition variables
On Thu, Aug 11, 2016 at 5:47 PM, Robert Haas wrote: > So, in my > implementation, a condition variable wait loop looks like this: > > for (;;) > { > ConditionVariablePrepareToSleep(cv); > if (condition for which we are waiting is satisfied) > break; > ConditionVariableSleep(); > } > ConditionVariableCancelSleep(); I have what I think is a better idea. Let's get rid of ConditionVariablePrepareToSleep(cv) and instead tell users of this facility to write the loop this way: for (;;) { if (condition for which we are waiting is satisfied) break; ConditionVariableSleep(cv); } ConditionVariableCancelSleep(); ConditionVariableSleep(cv) will check whether the current process is already on the condition variable's waitlist. If so, it will sleep; if not, it will add the process and return without sleeping. It may seem odd that ConditionVariableSleep(cv) doesn't necessary sleep, but this design has a significant advantage: we avoid manipulating the wait-list altogether in the case where the condition is already satisfied when we enter the loop. That's more like what we already do in lwlock.c: we try to grab the lock first; if we can't, we add ourselves to the wait-list and retry; if we then get the lock after all we have to recheck whether we can get the lock and remove ourselves from the wait-list if so. Of course, there is some cost: if we do have to wait, we'll end up checking the condition twice before actually going to sleep. However, it's probably smart to bet that actually needing to sleep is fairly infrequent, just as in lwlock.c. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
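In other words, the sleep call itself absorbs the old prepare step. Very roughly, and with entirely made-up helper names (this sketches the control flow being proposed, not any eventual implementation; the real function would block on the process latch where the final comment indicates):

    void
    ConditionVariableSleep(ConditionVariable *cv)
    {
        if (!cv_waitlist_contains(cv, MyProc))
        {
            /* First call from this wait loop: join the wait-list and
             * return, so the caller re-checks its condition before
             * really sleeping. */
            cv_waitlist_add(cv, MyProc);
            return;
        }

        /* Already on the wait-list: block here until signaled,
         * i.e. wait on MyLatch and reset it afterwards. */
    }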
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Mon, Nov 21, 2016 at 9:42 AM, Claudio Freire wrote: >> I realized today, quite suddenly, why I disliked your original patch: >> it didn't go far enough with embracing the idea of >> heap-item-pointer-as-key. In other words, I didn't like that you >> didn't change anything in the internal pages. > > But it did. In fact it only changed internal pages. Oh, okay. >> Maybe you should put >> heap TIDs someplace in new internal pages, so that even there we treat >> them as part of the key. > > That is indeed what's happening (both in the old and new version). Good. > The new version also opens up to the possibility of omitting the heap > TID in inner pages, if they're redundant (ie: if two consecutive keys > are different already, the heap TID part of the key is redundant, > since it's not necessary to make tree descent decisions). Makes sense; this is just another aspect of suffix truncation. >> This may significantly benefit UPDATE-heavy >> workloads where HOT doesn't work out so well. Particularly for >> non-unique indexes, we currently have to wade through a lot of bloat >> from the leaf level, rather than jumping straight to the correct leaf >> page when updating or inserting. > > That is improved in the patch as well. The lookup for an insertion > point for a heavily bloated (unique or not) index is done efficiently, > instead of the O(N^2) way it used before. Cool. > It's done even for unique indexes. Locking in that case becomes > complex, true, but I believe it's not an insurmountable problem. I actually don't think that that would be all that complicated. It's just one case where you have to mostly match the existing behavior. >> Or, you might want to make >> sure that B-Tree tuple insertions still find that "first page" to >> check, despite still generally treating heap item pointer as part of >> the key proper (in unique indexes, it can still behave like NULL, and >> not participate in uniqueness checking, while still essentially being >> part of the key/sort order). > > It behaves like that (even though it's not coded like that) Why not? What do you mean? What I'm interested in evaluating is an implementation where an index on (foo) has a sort order/key space that is precisely the same as an index on (foo, ctid), with zero exceptions. >> It also occurs to me that our performance for updates in the event of >> many NULL values in indexes is probably really bad, and could be >> helped by this. You'll want to test all of this with a workload that >> isn't very sympathetic to HOT, of course -- most of these benefits are >> seen when HOT doesn't help. > > I haven't really tested with an overabundance of NULLs, I'll add that > to the tests. I have tested low-cardinality indexes though, but only > for correctness. I think we'll have to model data distribution to evaluate the idea well. After all, if there is a uniform distribution of values anyway, you're going to end up with many dirty leaf pages. Whereas, in the more realistic case where updates are localized, we can hope to better amortize the cost of inserting index tuples, and recycling index tuples. > What do you mean with "first class part"? > > It's not included in the scankey for a variety of reasons, not the > least of which is not breaking the interface for current uses, and > because the tid type is quite limited in its capabilities ATM. Also, > the heap TID is usually there on most AM calls that care about it (ie: > insertions have it, of course), so adding it to the scankey also > seemed redundant. 
I mean that insertions into indexes that are low cardinality should be largely guided by TID comparisons. > If not adding to the scankey, what do you mean by "first class" then? Just what I said about the sort order of an index on "(foo)" now precisely matching what we'd get for "(foo, ctid)". There are a couple of tricky issues with that that you'd have to look out for, like making sure that the high key continues to hold a real TID, which at a glance looks like something that just happens already. I'm not sure that we preserve the item pointer TID in all cases, since the current assumption is that a high key's TID is just filler -- maybe we can lose that at some point. You should use amcheck to specifically verify that that happens reliably in all cases. Presumably, its use of an insertion scankey would automatically see the use of TID as a tie-breaker with patched Postgres amcheck verification, and so amcheck will work for this purpose unmodified. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
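The key-space property being debated -- an index on (foo) sorting exactly as an index on (foo, ctid) would -- comes down to a heap-TID tie-break in the tuple comparison, conceptually like this (illustration only, not code from the patch; compare_user_attributes is a stand-in for the existing _bt_compare-style work):

    int32       cmp;

    cmp = compare_user_attributes(itup_a, itup_b);
    if (cmp == 0)
        cmp = ItemPointerCompare(&itup_a->t_tid, &itup_b->t_tid);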
Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.
On 11/21/2016 02:16 PM, Tom Lane wrote: > Corey Huinker writes: >> I ran into this today: >> CREATE SERVER bob_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host >> 'localhost', dbname :'current_db' ); >> ... >> ALTER SERVER bob_srv OPTIONS (updatable 'true'); >> SELECT * >> FROM dblink('bob_srv','SELECT 1') as t(x integer); >> psql:bug_example.sql:18: ERROR: could not establish connection >> DETAIL: invalid connection option "updatable" > >> Is this something we want to fix? > > The dblink docs recommend using dblink_fdw as the FDW for this purpose, > which would only accept legal connstr options. However, I can see the > point of using a postgres_fdw server instead, and considering that > dblink isn't actually enforcing use of any particular FDW type, it seems > like the onus should be on it to be more wary of what the options are. > > It looks like this might be fairly easy to fix by having > get_connect_string() use is_valid_dblink_option() to check each > option name, and silently ignore options that are inappropriate. Thanks for the report and analysis. I'll take a look at creating a patch this week. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
Robert Haas writes: > 1. Should we try to avoid having the stats collector write a stats > file during an immediate shutdown? The file will be removed anyway > during crash recovery, so writing it is pointless. The point I was trying to make is that I think the forced-removal behavior is not desirable, and therefore committing a patch that makes it be graven in stone is not desirable either. The larger picture here is that Takayuki-san wants us to commit a patch based on a customer's objection to 9.2's behavior, without any real evidence that the 9.4 change isn't a sufficient solution. I've got absolutely zero sympathy for that "the stats collector might be stuck in an unkillable state" argument --- where's the evidence that the stats collector is any more prone to that than any other postmaster child? And for that matter, if we are stuck because of a nonresponding NFS server, how is a quicker postmaster exit going to help anything? You're not going to be able to start a new postmaster if the data directory is on a nonresponsive server. I'd be willing to entertain a proposal to make the 5-second limit adjustable, but I don't think we need entirely new behavior here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres abort found in 9.3.11
"K S, Sandhya (Nokia - IN/Bangalore)" writes: > As suggested by you, we upgraded the postgres to version 9.3.14. Also we > removed all the patches we had applied before. But the issue is still > observed in the latest version as well. Can you make a test case for other people to try? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Should we fix postmaster to avoid slow shutdown?
On Sun, Nov 20, 2016 at 10:20 PM, Tsunakawa, Takayuki wrote: > The reasons why I proposed this patch are: > > * It happened in a highly mission-critical production system of a customer > who uses 9.2. > > * 9.4's solution is not perfect, because it wastes 5 seconds anyway, which is > unexpected for users. The customer's requirement includes failover within 30 > seconds, so 5 seconds can be seen as a risk. > Plus, I'm worried about the possibility that the SIGKILLed process wouldn't > disappear if it's writing to a network storage like NFS. > > * And first of all, the immediate shutdown should shut the server down > immediately without doing anything heavy, as the name means. So there are two questions here: 1. Should we try to avoid having the stats collector write a stats file during an immediate shutdown? The file will be removed anyway during crash recovery, so writing it is pointless. I think you are right that 9.4's solution here is not perfect, because of the 5 second delay, and also because if the stats collector is stuck inside the kernel trying to write to the OS, it may be in a non-interruptible wait state where even SIGKILL has no immediate effect. Anyway, it's stupid even from a performance point of view to waste time writing a file that we're just going to nuke. 2. Should we close listen sockets sooner during an immediate shutdown? I agree with Tom and Peter that this isn't a good idea. People expect the sockets not to go away until the end - e.g. they use PQping() to test the server status, or they connect just to see what error they get - and the fact that a client application could hypothetically generate such a relentless stream of connection attempts that the dead-end backends thereby created slow down shutdown is not in my mind a sufficient reason to change the behavior. So I think 001 should proceed and 002 should be rejected. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
Alvaro Herrera writes: > Something I just noticed is that transformTableExpr takes a TableExpr > node and returns another TableExpr node. That's unlike what we do in > other places, where the node returned is of a different type than the > input node. I'm not real clear what happens if you try to re-transform > a node that was already transformed, but it seems worth thinking about. We're not 100% consistent on that --- there are cases such as RowExpr and CaseExpr where the same struct type is used for pre-parse-analysis and post-parse-analysis nodes. I think it's okay as long as the information content isn't markedly different, ie the transformation just consists of transforming all the sub-nodes. Being able to behave sanely on a re-transformation used to be an issue, but we no longer expect transformExpr to support that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres abort found in 9.3.11
Hello, As suggested by you, we upgraded the postgres to version 9.3.14. Also we removed all the patches we had applied before. But the issue is still observed in the latest version as well. The issue is seen during normal run and only observed in the standby node. This time as well, the same error log is observed. node-1 postgres[8743]: [18-1] PANIC: btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data Can you please share your inputs which would help us proceed further? Regards, Sandhya -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Friday, September 16, 2016 1:29 AM To: K S, Sandhya (Nokia - IN/Bangalore) Cc: pgsql-hackers@postgresql.org; Itnal, Prakash (Nokia - IN/Bangalore) Subject: Re: [HACKERS] Postgres abort found in 9.3.11 "K S, Sandhya (Nokia - IN/Bangalore)" writes: > We tried to replicate the scenario without our patch(exiting postmaster) and > still we were able to see the issue. > Same error was seen this time as well. > node-0 postgres[8243]: [1-2] HINT: Is another postmaster already running on > port 5433? If not, wait a few seconds and retry. > node-1 postgres[8650]: [18-1] PANIC: btree_xlog_delete_get_latestRemovedXid: > cannot operate with inconsistent data > Crash was not seen in 9.3.9 without the patch but it was reproduced in 9.3.11. > So something specifically changed between 9.3.9 and 9.3.11 is causing the > issue. Well, I looked through the git history from 9.3.9 to 9.3.11 and I don't see anything that seems likely to explain a problem here. If you can reproduce this, which it sounds like you can, maybe you could create a self-contained test case for other people to try? Also worth noting is that the current 9.3.x release is 9.3.14. You might save yourself some time by updating and seeing if it still reproduces in 9.3.14. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
Something I just noticed is that transformTableExpr takes a TableExpr node and returns another TableExpr node. That's unlike what we do in other places, where the node returned is of a different type than the input node. I'm not real clear what happens if you try to re-transform a node that was already transformed, but it seems worth thinking about. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
[ apologies for not responding sooner ] Etsuro Fujita writes: > On 2016/11/10 5:17, Tom Lane wrote: >> I think there's a very good argument that we should just treat any inval >> in pg_foreign_data_wrapper as a reason to blow away the whole plan cache. >> People aren't going to alter FDW-level options often enough to make it >> worth tracking things more finely. Personally I'd put pg_foreign_server >> changes in the same boat, though maybe those are changed slightly more >> often. If it's worth doing anything more than that, it would be to note >> which plans mention foreign tables at all, and not invalidate those that >> don't. We could do that with, say, a hasForeignTables bool in >> PlannedStmt. > I agree on that point. OK, please update the patch to handle those catalogs that way. >> That leaves the question of whether it's worth detecting table-level >> option changes this way, or whether we should just handle those by forcing >> a relcache inval in ATExecGenericOptions, as in Amit's original patch in >> <5702298d.4090...@lab.ntt.co.jp>. I kind of like that approach; that >> patch was short and sweet, and it put the cost on the unusual path (ALTER >> TABLE) not the common path (every time we create a query plan). > I think it'd be better to detect table-level option changes because that > could allow us to do more; we could avoid invalidating cached plans that > need not to be invalidated, by tracking the plan dependencies more > exactly, ie, avoid collecting the dependencies of foreign tables in a > plan that are in dead subqueries or excluded by constraint exclusion. > The proposed patch doesn't do that, though. I think updates on > pg_foreign_table would be more frequent, so it might be worth > complicating the code. But the relcache-inval approach couldn't do > that, IIUC. Why not? But the bigger picture here is that relcache inval is the established practice for forcing replanning due to table-level changes, and I have very little sympathy for inventing a new mechanism for that just for foreign tables. If it were worth bypassing replanning for changes in tables in dead subqueries, for instance, it would surely be worth making that happen for all table types not just foreign tables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
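For reference, the relcache-inval approach mentioned above amounts to something like this at the end of the ALTER FOREIGN TABLE ... OPTIONS code path (a sketch of the idea, not the committed change):

    /*
     * Changing table-level FDW options can change the plans we would
     * build, so force cached plans depending on this foreign table to
     * be rebuilt.
     */
    CacheInvalidateRelcache(rel);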
Re: [HACKERS] Logical decoding on standby
Hi, On 2016-11-21 16:17:58 +0800, Craig Ringer wrote: > I've prepared a working initial, somewhat raw implementation for > logical decoding on physical standbys. Please attach. Otherwise in a year or two it'll be impossible to look this up. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.
Corey Huinker writes: > I ran into this today: > CREATE SERVER bob_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host > 'localhost', dbname :'current_db' ); > ... > ALTER SERVER bob_srv OPTIONS (updatable 'true'); > SELECT * > FROM dblink('bob_srv','SELECT 1') as t(x integer); > psql:bug_example.sql:18: ERROR: could not establish connection > DETAIL: invalid connection option "updatable" > Is this something we want to fix? The dblink docs recommend using dblink_fdw as the FDW for this purpose, which would only accept legal connstr options. However, I can see the point of using a postgres_fdw server instead, and considering that dblink isn't actually enforcing use of any particular FDW type, it seems like the onus should be on it to be more wary of what the options are. It looks like this might be fairly easy to fix by having get_connect_string() use is_valid_dblink_option() to check each option name, and silently ignore options that are inappropriate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
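A sketch of that suggestion, applied to the loop in get_connect_string() that turns foreign-server options into a connection string; it assumes the existing is_valid_dblink_option()/escape_param_str() helpers and the libpq options array already used by dblink's FDW validator, and is only an approximation of whatever fix actually gets committed:

    foreach(cell, foreign_server->options)
    {
        DefElem    *def = lfirst(cell);

        /* silently skip FDW-only options such as "updatable" */
        if (!is_valid_dblink_option(options, def->defname,
                                    ForeignServerRelationId))
            continue;

        appendStringInfo(buf, "%s='%s' ",
                         def->defname, escape_param_str(strVal(def->arg)));
    }

The same filtering would presumably be applied to the user-mapping options with the corresponding context.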
Re: [HACKERS] dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.
On Sun, Nov 20, 2016 at 7:37 PM, Corey Huinker wrote: > I ran into this today: > > select current_database() as current_db \gset > CREATE EXTENSION postgres_fdw; > CREATE EXTENSION > CREATE EXTENSION dblink; > CREATE EXTENSION > CREATE ROLE bob LOGIN PASSWORD 'bob'; > CREATE ROLE > CREATE SERVER bob_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host > 'localhost', dbname :'current_db' ); > CREATE SERVER > CREATE USER MAPPING FOR public SERVER bob_srv OPTIONS ( user 'bob', password > 'bob' ); > CREATE USER MAPPING > SELECT * > FROM dblink('bob_srv','SELECT 1') as t(x integer); > x > --- > 1 > (1 row) > > ALTER SERVER bob_srv OPTIONS (updatable 'true'); > ALTER SERVER > SELECT * > FROM dblink('bob_srv','SELECT 1') as t(x integer); > psql:bug_example.sql:18: ERROR: could not establish connection > DETAIL: invalid connection option "updatable" > > > Is this something we want to fix? Looks like a bug to me. > If so, are there any other fdw/server/user-mapping options that we don't > want to pass along to the connect string? InitPgFdwOptions has an array labeled as /* non-libpq FDW-specific FDW options */. Presumably all of those should be handled in a common way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Tue, Nov 22, 2016 at 7:29 AM, Kevin Grittner wrote: > On Mon, Nov 21, 2016 at 11:29 AM, Alvaro Herrera > wrote: > >>> When I used the word "cache" here, I was thinking more of this >>> English language definition: >>> >>> a : a hiding place especially for concealing and preserving >>>provisions or implements >>> b : a secure place of storage >>> >>> The intent being to emphasize that there is not one public >>> "registry" of such objects, but context-specific collections where >>> references are tucked away when they become available for later use >>> in the only the appropriate context. > >> How about "stash"? According to my reading of Merriam-Webster's >> definition, "stash" mostly appears to be the thing that is stored >> (hidden), rather than the place it's stored in, but one of the >> definitions is "hiding place", and "cache" is listed as a synonym. > > "Stash" seems better that "cache" or "registry", especially since > many programmers these days seem to associate "cache" with > pass-through proxy techniques. I first became familiar with the > term "cache" while reading Jack London, and tend to retain some > association with the more general definition. Clearly I am in the > minority on that here. I was suggesting something like QueryEnvironment because it focuses on its role, not that fact that there are things stored in it. It's conceptually like the environment in an interpreter, which is some kind of namespace into which objects are bound by name. My suggestion "SpiRelation" now seems a bit short sighted in light of your comments, so I retract that bit. How about a QueryEnvironment (as shown in the patch I posted) that contains a list of NamedTuplestore pointers (the SpiRelation struct in the patch I posted, renamed) and in future perhaps lists of other ephemeral/transient objects that we want to expose to SQL? We would possibly have more than one list because SQL is not "Lisp-1" in nature: relations and functions and other kinds of object exist in different namespaces, though there may need to be polymorphism among kinds of named relations in the same list, so perhaps NamedTuplestore should be a node with a tag. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
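To make that concrete, the shape being proposed is roughly the following; the type and field names are illustrative only (taken from the discussion, not from any committed code):

    typedef struct NamedTuplestore
    {
        NodeTag     type;           /* lets other kinds of named relations
                                     * coexist in the same list later */
        char       *name;           /* name visible to SQL, e.g. a
                                     * transition table name */
        TupleDesc   tupdesc;        /* row shape */
        Tuplestorestate *tuplestore;    /* the transient rows themselves */
    } NamedTuplestore;

    typedef struct QueryEnvironment
    {
        List       *namedRelList;   /* list of NamedTuplestore pointers */
        /* room for lists of other ephemeral objects later */
    } QueryEnvironment;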
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Sun, Nov 20, 2016 at 8:48 PM, Tsunakawa, Takayuki wrote: >> True, but raising the bar for this feature so that it doesn't get done is >> also bad. It can be improved in a later patch. > > I thought you are very strict about performance, so I hesitate to believe you > forgive the additional round trip. libpq failover is a new feature in v10, > so I think it should provide the best user experience for v10 client+server > users from the start. If the concern is the time for development, then > support for older releases can be added in a later patch. > > There are still several months left for v10. Why don't we try the best? > Could you share the difficulty? I am very strict about regressing the performance of things that we already have, but I try not to make a policy that a new feature must be as fast as it could ever be. That could result in us having very few new features. Of course, it also matters how frequently the overhead is likely to be incurred. A little overhead on each tuple visibility check is a much bigger problem than a little overhead on each CREATE TABLE statement, which in turn is a much bigger problem than a little overhead on each connection attempt. For good performance, connections must last a while, so a little extra time setting one up doesn't seem like a showstopper to me. Anyway, anybody who finds this mechanism too expensive doesn't have to use it; they can implement something else instead. Also, I am not saying that we should not change this in time for v10. I'm saying that I don't think it should be a requirement for the next patch to be committed in this area to introduce a whole new mechanism for determining whether something is a master or a standby. Love it or hate it, pgsql-jdbc has already implemented something in this area and it does something useful -- without requiring a wire protocol change. Now you and Kevin are trying to say that what they did is all wrong, but I don't agree. There are very many users for whom the pgsql-jdbc approach will do exactly what they need, and no doubt some for whom it won't. Getting a patch that mimics that approach committed is *progress*. Improving it afterwards, whether for v10 or some later release, is also good. There is a saying that one should not let the perfect be the enemy of the good. I believe that saying applies here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
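For context, the pgsql-jdbc-style behaviour under discussion is driven entirely from the connection string, along these general lines; the parameter spelling (target_server_type vs. target_session_attrs) and its accepted values were still being debated at this point, so treat the names below as illustrative:

    psql "postgresql://node1:5432,node2:5432/mydb?target_session_attrs=read-write"

    # equivalent keyword form
    psql "host=node1,node2 dbname=mydb target_session_attrs=read-write"

libpq would try the listed hosts in order and, with a read-write target, skip any server whose session reports itself as read-only.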
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Mon, 21 Nov 2016 13:17:17 -0500 Robert Haas wrote: > > I've a couple of other patches that do > > a little cleanup on master that I'd also like to submit > > along with your patch. > It would really be much better to submit anything that's not > absolutely necessary for the new feature as a separate patch, rather > than bundling things together. My plan is separate patches, one email. Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On Sat, Nov 19, 2016 at 2:54 PM, Tom Lane wrote: > Stephen Frost writes: >> Let's create a new data type for this which supports old and new types, >> add what casts make sense, and call it a day. Changing the data type >> name out from under people is not helping anyone. > > +1. I do not think changing behavior for the existing type name is > going to be a net win. If we'd been smart enough to make the type > varlena from the get-go, maybe we could get away with it, but there > is just way too much risk of trouble with a change in a fixed-length > type's on-the-wire representation. I completely agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
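In other words, something along these lines, where the new type name (say, macaddr8 for EUI-64) and the exact set of casts are still to be decided -- the snippet is purely hypothetical:

    CREATE TABLE nic (addr48 macaddr, addr64 macaddr8);
    INSERT INTO nic VALUES ('08:00:2b:01:02:03', '08:00:2b:01:02:03:04:05');
    SELECT addr48::macaddr8 FROM nic;   -- widen EUI-48 to EUI-64 via a cast

Existing columns and applications keep the 6-byte macaddr behaviour they rely on, and anyone who needs EUI-64 opts into the new type explicitly.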
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Nov 21, 2016 at 11:29 AM, Alvaro Herrera wrote: >> When I used the word "cache" here, I was thinking more of this >> English language definition: >> >> a : a hiding place especially for concealing and preserving >>provisions or implements >> b : a secure place of storage >> >> The intent being to emphasize that there is not one public >> "registry" of such objects, but context-specific collections where >> references are tucked away when they become available for later use >> in the only the appropriate context. > How about "stash"? According to my reading of Merriam-Webster's > definition, "stash" mostly appears to be the thing that is stored > (hidden), rather than the place it's stored in, but one of the > definitions is "hiding place", and "cache" is listed as a synonym. "Stash" seems better that "cache" or "registry", especially since many programmers these days seem to associate "cache" with pass-through proxy techniques. I first became familiar with the term "cache" while reading Jack London, and tend to retain some association with the more general definition. Clearly I am in the minority on that here. http://ereimer.net/20080706/13586_erC720.htm -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
Robert Haas writes: > On Mon, Nov 21, 2016 at 1:00 PM, Tom Lane wrote: >> Hah: not where I thought it was at all. The problem seems to be down to >> the optimization I put into is_parallel_safe() awhile back to skip testing >> anything if we previously found the entire querytree to be parallel-safe. >> Well, the raw query tree *is* parallel-safe. It's only when we inject >> some Params that we have a parallel hazard. So that optimization is too >> optimistic :-( > That sucks. Any idea how we might salvage it? I just disabled it by checking to see if any Params have been created. It might be possible to be more refined, but I dunno that adding more bookkeeping would pay for itself. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
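For the record, the guard Tom describes amounts to roughly this at the top of is_parallel_safe(); the field names are approximate, so consult the actual commit rather than this sketch:

    /*
     * Only trust the cached whole-query verdict if no PARAM_EXEC Params
     * have been created yet, since injected Params are exactly what can
     * make an otherwise-safe tree parallel-restricted.
     */
    if (root->glob->nParamExec == 0 &&
        root->glob->maxParallelHazard == PROPARALLEL_SAFE)
        return true;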
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Sat, Nov 19, 2016 at 8:51 AM, Karl O. Pinc wrote: > On Sat, 19 Nov 2016 12:58:47 +0100 > Gilles Darold wrote: > >> All patches you've submitted on tha v13 patch have been applied and >> are present in attached v14 of the patch. I have not included the >> patches about GUC changes because I'm not sure that adding a new file >> (include/utils/guc_values.h) just for that will be accepted or that it >> will not require a more global work to add other GUC values. However >> perhaps this patch can be submitted separately if the decision is not >> taken here. > > Understood. I've a couple of other patches that do > a little cleanup on master that I'd also like to submit > along with your patch. This on the theory that > the maintainers will be looking at this code anyway > because your patch touches it. All this can be submitted > for their review at once. My approach is to be minimally invasive on > a per-patch basis (i.e. your patch) but add small patches > that make existing code "better" without touching > functionality. (Deleting unnecessary statements, etc.) > The overall goal being a better code base. It would really be much better to submit anything that's not absolutely necessary for the new feature as a separate patch, rather than bundling things together. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
On Mon, Nov 21, 2016 at 1:00 PM, Tom Lane wrote: > I wrote: >> Actually, the Gather path *isn't* parameterized. The problem here is >> that we're planning an unflattened subquery, and the only thing that >> is parallel-unsafe is that there is an outer Param in its toplevel tlist, >> and we're somehow deciding that we can stick that unsafe tlist into (and >> beneath) the Gather node. So something rotten in that area, but I've not >> quite found it yet. > > Hah: not where I thought it was at all. The problem seems to be down to > the optimization I put into is_parallel_safe() awhile back to skip testing > anything if we previously found the entire querytree to be parallel-safe. > Well, the raw query tree *is* parallel-safe. It's only when we inject > some Params that we have a parallel hazard. So that optimization is too > optimistic :-( That sucks. Any idea how we might salvage it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fun fact about autovacuum and orphan temp tables
On Sun, Nov 20, 2016 at 10:42 PM, Michael Paquier wrote: > On Sat, Nov 19, 2016 at 2:16 PM, Michael Paquier > wrote: >> On Fri, Nov 18, 2016 at 1:11 PM, Robert Haas wrote: >>> That might sound adding unnecessary work just for the sake of >>> paranoia, but I don't think it is. Failures here won't be common, but >>> since they are entirely automated there will be no human intelligence >>> available to straighten things out. Barring considerable caution, >>> we'll just enter a flaming death spiral. >> >> Thinking more paranoid, an extra way to enter in this flaming death >> spiral is to not limit the maximum number of failures authorized when >> dropping a set of orphaned tables and transactions fail multiple >> times. This is basically not important as the relation on which the >> drop failed gets dropped from the list but failing on each one of them >> is a good way to slow down autovacuum, so putting a limit of say 10 >> transactions failing is I think really important. > > By the way, when hacking this patch I asked myself three questions: > 1) How many objects should be dropped per transaction? 50 sounds like > a fine number to me so the patch I sent is doing so. This should > definitely not be more than the default for max_locks_per_transaction, > aka 64. Another thing to consider would be to use a number depending > on max_locks_per_transaction, like half of it. > 2) How many transaction failures should autovacuum forgive? The patch > uses 10. Honestly I put that number out of thin air. > 3) Should the drop of orphaned tables be done for a wraparound > autovacuum? Obviously, the answer is no. It is vital not to consume > transaction XIDs in this case. The patch I sent is dropping the > objects even for a wraparound job, that's easily switchable. I don't think that you should skip it in the wraparound case, because it might be one of the temp tables that is threatening wraparound. I've committed the patch after doing some work on the comments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Mon, Nov 21, 2016 at 2:15 PM, Masahiko Sawada wrote: > On Fri, Nov 18, 2016 at 6:54 AM, Claudio Freire > wrote: >> On Thu, Nov 17, 2016 at 6:34 PM, Robert Haas wrote: >>> On Thu, Nov 17, 2016 at 1:42 PM, Claudio Freire >>> wrote: Attached is patch 0002 with pgindent applied over it I don't think there's any other formatting issue, but feel free to point a finger to it if I missed any >>> >>> Hmm, I had imagined making all of the segments the same size rather >>> than having the size grow exponentially. The whole point of this is >>> to save memory, and even in the worst case you don't end up with that >>> many segments as long as you pick a reasonable base size (e.g. 64MB). >> >> Wastage is bound by a fraction of the total required RAM, that is, >> it's proportional to the amount of required RAM, not the amount >> allocated. So it should still be fine, and the exponential strategy >> should improve lookup performance considerably. > > I'm concerned that it could use repalloc for large memory area when > vacrelstats->dead_tuples.dead_tuple is bloated. It would be overhead > and slow. How large? That array cannot be very large. It contains pointers to exponentially-growing arrays, but the repalloc'd array itself is small: one struct per segment, each segment starts at 128MB and grows exponentially. In fact, IIRC, it can be proven that such a repalloc strategy has an amortized cost of O(log log n) per tuple. If it repallocd the whole tid array it would be O(log n), but since it handles only pointers to segments of exponentially growing tuples it becomes O(log log n). Furthermore, n there is still limited to MAX_INT, which means the cost per tuple is bound by O(log log 2^32) = 5. And that's an absolute worst case that's ignoring the 128MB constant factor which is indeed relevant. > What about using semi-fixed memroy space without repalloc; > Allocate the array of ItemPointerData array, and each ItemPointerData > array stores the dead tuple locations. The size of ItemPointerData > array starts with small size (e.g. 16MB or 32MB). After we used an > array up, we then allocate next segment with twice size as previous > segment. That's what the patch does. > But to prevent over allocating memory, it would be better to > set a limit of allocating size of ItemPointerData array to 512MB or > 1GB. There already is a limit to 1GB (the maximum amount palloc can allocate) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
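To illustrate the segment scheme being debated here, the following is a minimal, standalone C sketch (plain malloc/realloc and invented type and field names; it is not the patch's code): only the small segment directory is ever realloc'd, while each new segment doubles the capacity of the previous one.

#include <stdlib.h>
#include <stdint.h>

/* Stand-in for ItemPointerData: block number plus line-pointer offset. */
typedef struct { uint32_t block; uint16_t offset; } TidItem;

typedef struct
{
    size_t   capacity;          /* slots allocated in this segment */
    size_t   used;              /* slots filled so far */
    TidItem *items;             /* the TIDs themselves */
} TidSegment;

typedef struct
{
    size_t      nsegs;          /* segments allocated so far */
    TidSegment *segs;           /* small directory; only this is realloc'd */
} TidStore;

static void
tidstore_init(TidStore *store, size_t first_capacity)
{
    store->nsegs = 1;
    store->segs = malloc(sizeof(TidSegment));
    store->segs[0].capacity = first_capacity;
    store->segs[0].used = 0;
    store->segs[0].items = malloc(first_capacity * sizeof(TidItem));
}

/* Append one TID; allocation-failure checks omitted for brevity. */
static void
tidstore_add(TidStore *store, TidItem tid)
{
    TidSegment *seg = &store->segs[store->nsegs - 1];

    if (seg->used == seg->capacity)
    {
        size_t  newcap = seg->capacity * 2;     /* exponential growth */

        /* grow the small directory by one entry; existing TID arrays stay put */
        store->segs = realloc(store->segs, (store->nsegs + 1) * sizeof(TidSegment));
        seg = &store->segs[store->nsegs++];
        seg->capacity = newcap;
        seg->used = 0;
        seg->items = malloc(newcap * sizeof(TidItem));
    }
    seg->items[seg->used++] = tid;
}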
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
I wrote: > Actually, the Gather path *isn't* parameterized. The problem here is > that we're planning an unflattened subquery, and the only thing that > is parallel-unsafe is that there is an outer Param in its toplevel tlist, > and we're somehow deciding that we can stick that unsafe tlist into (and > beneath) the Gather node. So something rotten in that area, but I've not > quite found it yet. Hah: not where I thought it was at all. The problem seems to be down to the optimization I put into is_parallel_safe() awhile back to skip testing anything if we previously found the entire querytree to be parallel-safe. Well, the raw query tree *is* parallel-safe. It's only when we inject some Params that we have a parallel hazard. So that optimization is too optimistic :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG
> I wanted to say that in order to use the connection pointed > by the DECLARE STATEMENT some functions like ECPGdo() would be > modified or > new function would be added under the directory ecpglib/. > > This modification or new function will be used to get the connection > by statement_name. Ah, now I understand. Thank you for your explanation. I'm looking forward to seeing your patch. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location
On Fri, Nov 18, 2016 at 11:09 PM, Peter Geoghegan wrote: > On Fri, Nov 18, 2016 at 5:27 PM, Claudio Freire > wrote: >> I've been changing the on-disk format considerably, to one that makes >> more sense. > > I can see how that would be necessary. I'm going to suggest some more > things that need a new btree version number now, too. > > I realized today, quite suddenly, why I disliked your original patch: > it didn't go far enough with embracing the idea of > heap-item-pointer-as-key. In other words, I didn't like that you > didn't change anything in the internal pages. But it did. In fact it only changed internal pages. > Maybe you should put > heap TIDs someplace in new internal pages, so that even there we treat > them as part of the key. That is indeed what's happening (both in the old and new version). The new version also opens up to the possibility of omitting the heap TID in inner pages, if they're redundant (ie: if two consecutive keys are different already, the heap TID part of the key is redundant, since it's not necessary to make tree descent decisions). > This may significantly benefit UPDATE-heavy > workloads where HOT doesn't work out so well. Particularly for > non-unique indexes, we currently have to wade through a lot of bloat > from the leaf level, rather than jumping straight to the correct leaf > page when updating or inserting. That is improved in the patch as well. The lookup for an insertion point for a heavily bloated (unique or not) index is done efficiently, instead of the O(N^2) way it used before. > You might not want to do the same with unique indexes, where we must > initially buffer lock "the first leaf page that a duplicate could be > on" while we potentially look in one or more additional sibling pages > (but bloat won't be so bad there, perhaps). It's done even for unique indexes. Locking in that case becomes complex, true, but I believe it's not an insurmountable problem. > Or, you might want to make > sure that B-Tree tuple insertions still find that "first page" to > check, despite still generally treating heap item pointer as part of > the key proper (in unique indexes, it can still behave like NULL, and > not participate in uniqueness checking, while still essentially being > part of the key/sort order). It behaves like that (even though it's not coded like that) > It also occurs to me that our performance for updates in the event of > many NULL values in indexes is probably really bad, and could be > helped by this. You'll want to test all of this with a workload that > isn't very sympathetic to HOT, of course -- most of these benefits are > seen when HOT doesn't help. I haven't really tested with an overabundance of NULLs, I'll add that to the tests. I have tested low-cardinality indexes though, but only for correctness. >> However, I haven't tested the current state of the patch thoroughly. >> >> If a WIP is fine, I can post the what I have, rebased. > > I'm mostly curious about the effects on bloat. I now feel like you > haven't sufficiently examined the potential benefits there, since you > never made heap item pointer a first class part of the key space. > Maybe you'd like to look into that yourself first? What do you mean with "first class part"? It's not included in the scankey for a variety of reasons, not the least of which is not breaking the interface for current uses, and because the tid type is quite limited in its capabilities ATM. 
Also, the heap TID is usually there on most AM calls that care about it (ie: insertions have it, of course), so adding it to the scankey also seemed redundant. If not adding to the scankey, what do you mean by "first class" then? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
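As a conceptual sketch only (not the patch's code), the ordering rule being described, with the heap item pointer acting as a trailing tiebreaker so that entries with equal user keys still have a total order, could be written like this:

#include <stdint.h>

/* Simplified stand-in for a heap item pointer. */
typedef struct { uint32_t block; uint16_t offset; } HeapTid;

static int
compare_tids(HeapTid a, HeapTid b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/*
 * The user-visible key columns decide first; the heap TID breaks ties.
 * In a unique index the TID still orders entries but plays no part in
 * uniqueness checking, as described above.
 */
static int
compare_entries(int user_key_cmp, HeapTid a, HeapTid b)
{
    if (user_key_cmp != 0)
        return user_key_cmp;
    return compare_tids(a, b);
}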
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
Robert Haas writes: > On Mon, Nov 21, 2016 at 12:00 PM, Tom Lane wrote: >> It seems like maybe searching for individual Params is the wrong thing. >> Why are we allowing it to generate a parameterized Gather path at all? >> Given the lack of any way to transmit runtime param values to the worker, >> I can't see how that would ever work. > Hmm, so you're thinking it could be the job of generate_gather_paths() > to make sure we don't do that? Actually, the Gather path *isn't* parameterized. The problem here is that we're planning an unflattened subquery, and the only thing that is parallel-unsafe is that there is an outer Param in its toplevel tlist, and we're somehow deciding that we can stick that unsafe tlist into (and beneath) the Gather node. So something rotten in that area, but I've not quite found it yet. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
Kevin Grittner wrote: > On Mon, Nov 21, 2016 at 1:05 AM, Thomas Munro > wrote: > > Also, Tsrcache is strangely named: it's not exactly a cache, it's > > more of a registry. > > When I used the word "cache" here, I was thinking more of this > English language definition: > > a : a hiding place especially for concealing and preserving >provisions or implements > b : a secure place of storage > > The intent being to emphasize that there is not one public > "registry" of such objects, but context-specific collections where > references are tucked away when they become available for later use > in the only the appropriate context. Eventually, when these are > used for some of the less "eager" timings of materialized view > maintenance, they may be set aside for relatively extended periods > (i.e., minutes or maybe even hours) before being used. Neither > "registry" nor "cache" seems quite right; maybe someone can think > of a word with more accurate semantics. How about "stash"? According to my reading of Merriam-Webster's definition, "stash" mostly appears to be the thing that is stored (hidden), rather than the place it's stored in, but one of the definitions is "hiding place", and "cache" is listed as a synonym. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Nov 21, 2016 at 12:04 PM, Kevin Grittner wrote: >> Also, Tsrcache is strangely named: it's not exactly a cache, it's >> more of a registry. > > When I used the word "cache" here, I was thinking more of this > English language definition: > > a : a hiding place especially for concealing and preserving >provisions or implements > b : a secure place of storage > > The intent being to emphasize that there is not one public > "registry" of such objects, but context-specific collections where > references are tucked away when they become available for later use > in the only the appropriate context. Eventually, when these are > used for some of the less "eager" timings of materialized view > maintenance, they may be set aside for relatively extended periods > (i.e., minutes or maybe even hours) before being used. Neither > "registry" nor "cache" seems quite right; maybe someone can think > of a word with more accurate semantics. I complained about the use of "cache" in this name before, and I still think that it is off-base. I'm not saying there isn't some definition of the word that could cover what you're doing here, but it's not the definition that is going to pop to mind for people reading the code. I think "registry" would be OK; the fact that there is a registry does not mean it is a global registry; it can be a registry of ephemeral relations specific to that query. I'm sure there are other good choices, too, but, please, not cache! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Fri, Nov 18, 2016 at 6:54 AM, Claudio Freire wrote: > On Thu, Nov 17, 2016 at 6:34 PM, Robert Haas wrote: >> On Thu, Nov 17, 2016 at 1:42 PM, Claudio Freire >> wrote: >>> Attached is patch 0002 with pgindent applied over it >>> >>> I don't think there's any other formatting issue, but feel free to >>> point a finger to it if I missed any >> >> Hmm, I had imagined making all of the segments the same size rather >> than having the size grow exponentially. The whole point of this is >> to save memory, and even in the worst case you don't end up with that >> many segments as long as you pick a reasonable base size (e.g. 64MB). > > Wastage is bound by a fraction of the total required RAM, that is, > it's proportional to the amount of required RAM, not the amount > allocated. So it should still be fine, and the exponential strategy > should improve lookup performance considerably. I'm concerned that it could use repalloc for a large memory area when vacrelstats->dead_tuples.dead_tuple is bloated. That would add overhead and be slow. What about using semi-fixed memory space without repalloc: Allocate an array of ItemPointerData arrays, where each ItemPointerData array stores the dead tuple locations. The size of each ItemPointerData array starts small (e.g. 16MB or 32MB). After we use an array up, we allocate the next segment with twice the size of the previous segment. But to prevent over-allocating memory, it would be better to cap the allocated size of an ItemPointerData array at 512MB or 1GB. We could expand the array of arrays using repalloc if needed, but that would not happen often. The rest of the design is similar to your current patch: the offset into the array of arrays and the offset within an ItemPointerData array track the current location, and both are cleared after garbage has been reclaimed from the heap and indexes. The allocated arrays are then re-used by subsequent heap scans. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
On Mon, Nov 21, 2016 at 12:00 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Nov 21, 2016 at 11:43 AM, Tom Lane wrote: >>> so what we've got is a case where a parameter computed by the FunctionScan >>> (in the master) would need to be passed into the parallel workers at >>> runtime. Do we have code for that at all? If so where is it? > >> No, that's not supposed to happen. > > OK, that makes this a planner failure: we should not have allowed this > query to become parallelized. > >> Maybe it's checking the quals but not recursing into the tlist? > > It seems like maybe searching for individual Params is the wrong thing. > Why are we allowing it to generate a parameterized Gather path at all? > Given the lack of any way to transmit runtime param values to the worker, > I can't see how that would ever work. Hmm, so you're thinking it could be the job of generate_gather_paths() to make sure we don't do that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
Thanks for the review! Will respond further after reviewing your suggested patches; this is a quick response just to the contents of the email. On Mon, Nov 21, 2016 at 1:05 AM, Thomas Munro wrote: > Both ways have attracted criticism: the first involves touching > basically every core function that might eventually parse, plan or > execute a query to make it accept a Tsrcache and pass that on, and the > second involves a bunch of Rube Goldberg machine-like > callback/variable/parameter code. Just to quantify "touching basically every core function that might...", the number of functions which have a change in signature (adding one parameter) is seven. No more; no less. > I spent some time investigating whether a third way would be viable: > use ParamListInfo's setupParser hook and add an analogous one for the > executor, so that there is no registry equivalent to Tsrcache, but > also no dealing with param slots or (in plpgsql's case) new kinds of > variables. Instead, there would just be two callbacks: one for asking > the tuplestore provider for metadata (statistics, TupleDesc) at > planning time, and another for asking for the tuplestore by name at > execution time. One problem with this approach is that it requires > using the SPI_*_with_paramlist interfaces, not the more commonly used > arg-based versions, and requires using a ParamListInfo when you > otherwise have no reason to because you have no parameters. Also, > dealing with callbacks to register your tuplestore supplier is a > little clunky. More seriously, requiring providers of those callbacks > to write code that directly manipulates ParserState and EState and > calls addRangeTableEntryXXX seems like a modularity violation -- > especially for PLs that are less integrated with core Postgres code > than plpgsql. I got this more or less working, but didn't like it > much and didn't think it would pass muster. ok > After going through that experience, I now agree with Kevin: an > interface where a new SPI interface lets PLs push a named tuplestore > into the SPI connection to make it available to SQL seems like the > simplest and tidiest way. I do have some feedback and suggestions > though: > > 1. Naming: Using tuplestores in AfterTriggersData make perfect sense > to me but I don't think it follows that the exact thing we should > expose to the executor to allow these to be scanned is a > TuplestoreScan. We have other nodes that essentially scan a > tuplestore like WorkTableScan and Material but they have names that > tell you what they are for. This is for scanning data that has either > been conjured up or passed on by SPI client code and exposed to SQL > queries. How about SpiRelationScan, SpiDataScan, SpiRelVarScan, ? I think an SPI centered approach is the wrong way to go. I feel that an SPI *interface* will be very useful, and probably save thousands of lines of fragile code which would otherwise be blurring the lines of the layering, but I feel there there should be a lower-level interface, and the SPI interface should use that to provide a convenience layer. In particular, I suspect that some uses of these named tuplestore relations will initially use SPI for convenience of development, but may later move some of that code to dealing with parse trees, for performance. Ideally, the execution plan would be identical whether or not SPI was involved, so naming implying the involvement of SPI would be misleading. NamedTuplestoreScan might be an improvement over just TuplestoreScan. 
> Also, Tsrcache is strangely named: it's not exactly a cache, it's > more of a registry. When I used the word "cache" here, I was thinking more of this English language definition: a : a hiding place especially for concealing and preserving provisions or implements b : a secure place of storage The intent being to emphasize that there is not one public "registry" of such objects, but context-specific collections where references are tucked away when they become available for later use in the only the appropriate context. Eventually, when these are used for some of the less "eager" timings of materialized view maintenance, they may be set aside for relatively extended periods (i.e., minutes or maybe even hours) before being used. Neither "registry" nor "cache" seems quite right; maybe someone can think of a word with more accurate semantics. > 2. Scope of changes: If we're going to touch functions all over the > source tree to get the Tsrcache where it needs to be for parsing and > execution, then I wonder if we should consider thinking a bit more > about where this is going. What if we had a thing called a > QueryEnvironment, and we passed a pointer to that into to all those > places, and it could contain the named tuplestore registry? I agree. I had a version building on the Tsrcache approach which differentiated between three levels of generality: Ephemeral Relations, a subclass of that call Named Ephemeral Re
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
Robert Haas writes: > On Mon, Nov 21, 2016 at 11:43 AM, Tom Lane wrote: >> so what we've got is a case where a parameter computed by the FunctionScan >> (in the master) would need to be passed into the parallel workers at >> runtime. Do we have code for that at all? If so where is it? > No, that's not supposed to happen. OK, that makes this a planner failure: we should not have allowed this query to become parallelized. > Maybe it's checking the quals but not recursing into the tlist? It seems like maybe searching for individual Params is the wrong thing. Why are we allowing it to generate a parameterized Gather path at all? Given the lack of any way to transmit runtime param values to the worker, I can't see how that would ever work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
On Mon, Nov 21, 2016 at 11:43 AM, Tom Lane wrote: > I wrote: >> Like Ashutosh, I can't reproduce the crash, so it's hard to speculate much >> further. > > Ah-hah: now I can. The recipe lacks these important steps: > > set parallel_setup_cost TO 0; > set parallel_tuple_cost TO 0; > > That changes the plan to > > Limit (cost=0.00..0.06 rows=1 width=64) >-> Nested Loop (cost=0.00..57.25 rows=1000 width=64) > -> Function Scan on pg_show_all_settings a (cost=0.00..10.00 > rows=1000 width=64) > -> Limit (cost=0.00..0.03 rows=1 width=32) >-> Gather (cost=0.00..3.54 rows=130 width=32) > Workers Planned: 2 > -> Parallel Seq Scan on pg_opclass (cost=0.00..3.54 > rows=54 width=32) > > so what we've got is a case where a parameter computed by the FunctionScan > (in the master) would need to be passed into the parallel workers at > runtime. Do we have code for that at all? If so where is it? No, that's not supposed to happen. That's why we have this: /* * We can't pass Params to workers at the moment either, so they are also * parallel-restricted. */ else if (IsA(node, Param)) { if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) return true; } Maybe it's checking the quals but not recursing into the tlist? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
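The kind of additional test being discussed might look roughly like the fragment below, i.e. checking the target list, and not just the quals, for parallel hazards before letting it be evaluated under a Gather; where such a check actually belongs, and what the eventual fix looked like, is not settled in this exchange:

/* Sketch only: refuse to push a target list below a Gather unless the
 * whole expression list passes the parallel-safety test. */
if (!is_parallel_safe(root, (Node *) path->pathtarget->exprs))
    return;                     /* evaluate the tlist above the Gather instead */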
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
I wrote: > Like Ashutosh, I can't reproduce the crash, so it's hard to speculate much > further. Ah-hah: now I can. The recipe lacks these important steps: set parallel_setup_cost TO 0; set parallel_tuple_cost TO 0; That changes the plan to Limit (cost=0.00..0.06 rows=1 width=64) -> Nested Loop (cost=0.00..57.25 rows=1000 width=64) -> Function Scan on pg_show_all_settings a (cost=0.00..10.00 rows=1000 width=64) -> Limit (cost=0.00..0.03 rows=1 width=32) -> Gather (cost=0.00..3.54 rows=130 width=32) Workers Planned: 2 -> Parallel Seq Scan on pg_opclass (cost=0.00..3.54 rows=54 width=32) so what we've got is a case where a parameter computed by the FunctionScan (in the master) would need to be passed into the parallel workers at runtime. Do we have code for that at all? If so where is it? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvements in psql hooks for variables
Stephen Frost wrote: > Are you working to make those changes? I'd rather we make the changes > to this code once rather than push what you have now only to turn around > and change it significantly again. So I went through the psql commands which don't fail on parse errors for booleans. I'd like to attract attention on \c, because I see several options. \c [-reuse-previous=BOOL] ...other args.. Current: if we can't parse BOOL, assume it's ON, and go on with reconnecting. Option1: if we can't parse BOOL, stop here, don't reconnect, set the command result as "failed", also ignoring the other arguments. Option2: maybe we want to create a difference between interactive and non-interactive use, as there's already one in handling the failure to connect through \c. The manpage says: If the connection attempt failed (wrong user name, access denied, etc.), the previous connection will only be kept if psql is in interactive mode. When executing a non-interactive script, processing will immediately stop with an error. Proposal: if interactive, same as Option1. If not interactive, -reuse-previous=bogus has the same outcome as a failure to connect. Which I think means two subcases: if it's through \i then kill the connection but don't quit psql if it's through -f then quit psql. Option3: leave this command unchanged, avoiding trouble. \timing BOOL Current: non-parseable BOOL interpreted as TRUE. Empty BOOL toggles the state. Proposal: on non-parseable BOOL, command failure and pset.timing is left unchanged. \pset [x | expanded | vertical ] BOOL \pset numericlocale BOOL \pset [tuples_only | t] BOOL \pset footer BOOL \t BOOL (falls into pset_do("tuples_only", ...)) \pset pager BOOL Current: non-parseable non-empty BOOL interpreted as TRUE. Empty BOOL toggles the on/off state. In some cases, BOOL interpretation is attempted only after specific built-in values have been ruled out first. Proposal: non-parseable BOOL implies command failure and unchanged state. About the empty string when a BOOL is expected. Only \c -reuse-previous seems concerned: #= \c -reuse-previous= acts the same as #= \c -reuse-previous=ON Proposal: handle empty as when the value is bogus. The other commands interpret this lack of value specifically to toggle the state, so it's a non-issue for them. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
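A minimal, standalone sketch of the interface idea behind these proposals (not psql's actual ParseVariableBool): report whether the input parsed at all separately from the parsed value, so that a caller such as \timing can reject bogus input and leave its current setting untouched.

#include <stdbool.h>
#include <strings.h>            /* strcasecmp */

/*
 * Returns true and sets *result if "value" is a recognizable boolean;
 * returns false otherwise and leaves *result untouched, so the caller can
 * keep its previous setting and print an error.  Prefix matching such as
 * "tru" -> true, which psql also accepts, is omitted here for brevity.
 */
static bool
parse_bool_strict(const char *value, bool *result)
{
    if (value == NULL || *value == '\0')
        return false;           /* empty input: invalid, not "on" */

    if (strcasecmp(value, "on") == 0 || strcasecmp(value, "true") == 0 ||
        strcasecmp(value, "yes") == 0 || strcasecmp(value, "1") == 0)
    {
        *result = true;
        return true;
    }
    if (strcasecmp(value, "off") == 0 || strcasecmp(value, "false") == 0 ||
        strcasecmp(value, "no") == 0 || strcasecmp(value, "0") == 0)
    {
        *result = false;
        return true;
    }
    return false;
}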
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
Robert Haas writes: > Then again, that might just be a coincidence. The other thing that's > weird here is that the Datum being passed is apparently a NULL > pointer, which neither MakeExpandedObjectReadOnly() nor > MakeExpandedObjectReadOnlyInternal() are prepared to deal with. I > don't know whether it's wrong for a NULL pointer to occur here in the > first place or whether it's wrong for those functions not to be able > to deal with it if it does. The former. MakeExpandedObjectReadOnly does contain a null check, so something has passed it a zero Datum along with a claim that the Datum is not null. Like Ashutosh, I can't reproduce the crash, so it's hard to speculate much further. I am wondering if 13671b4b2 somehow fixed it, though I don't see how. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel execution and prepared statements
On 18.11.2016 at 14:21, Albe Laurenz wrote: > But then the change to use CURSOR_OPT_PARALLEL_OK in tcop/postgres.c should > be reverted as well, because it breaks things just as bad: > > /* creates a parallel-enabled plan */ > PQprepare(conn, "pstmt", "SELECT id FROM l WHERE val = $1", 1, NULL); > /* blows up with "cannot insert tuples during a parallel operation" */ > PQexec(conn, "CREATE TABLE bad(id) AS EXECUTE pstmt('abcd')"); A great example of mixing a v3 prepare with a simple-query execute. I didn't think about that, even though the docs state clearly: "Named prepared statements can also be created and accessed at the SQL command level, using PREPARE and EXECUTE." Sticking with either protocol version does not trigger the error. > I think we should either apply something like that patch or disable parallel > execution for prepared statements altogether and document that. So we likely need to backpatch something more than a doc fix for 9.6. Given the patch proposals around, this would either disable a feature (in the extended query protocol) or add a new one (in the simple query protocol/SQL). Or would it be more appropriate to split the patch and use CURSOR_OPT_PARALLEL_OK in prepare.c on master only? I'm asking in case there is a need to prepare a documentation patch targeting 9.6 specifically. Best regards Tobias -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
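To illustrate the "sticking with either protocol" point, here is a hedged libpq sketch in which the parallel-enabled prepared statement is also executed through the extended protocol (PQexecPrepared) instead of being wrapped in a simple-query CREATE TABLE ... AS EXECUTE; error handling is abbreviated:

#include <libpq-fe.h>

/* Prepare and execute entirely over the v3 extended protocol. */
static void
run_prepared(PGconn *conn)
{
    const char *paramValues[1] = {"abcd"};
    PGresult   *res;

    res = PQprepare(conn, "pstmt", "SELECT id FROM l WHERE val = $1", 1, NULL);
    PQclear(res);

    res = PQexecPrepared(conn, "pstmt", 1, paramValues, NULL, NULL, 0);
    /* ... check that PQresultStatus(res) == PGRES_TUPLES_OK, read rows ... */
    PQclear(res);
}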
[HACKERS] postgres 9.3 postgres_fdw ::LOG: could not receive data from client: Connection reset by peer
Hi, I have a question. I looked for help on http://dba.stackexchange.com/ with no success; maybe you can answer? Thank you in advance. "FOREIGN_TABLE" is created with postgres_fdw; LOCAL_TABLE is just a local table.

Symptoms:
1. I run SELECT * from FOREIGN_TABLE in psql. No log message is generated.
2. I run psql -c "SELECT * from LOCAL_TABLE" in bash. No log message is generated.
3. I run psql -c "SELECT * from FOREIGN_TABLE" in bash. "::LOG: could not receive data from client: Connection reset by peer" is generated in the postgres log.

I can't set logging any lower, and this message is distracting. Please share any ideas on how to set up the environment so that this message is not generated. I feel I'm missing something obvious, but can't see what. PS. I tried running -f file instead of -c "SQL"; of course it did not change a thing. And of course I tried putting \q in the file, with the same result.
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
Amit Kapila writes: > On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila wrote: >> Here instead of checking whether top_plan has initPlan, it should >> check whether initPlan is present anywhere in plan tree. I think one >> simple way could be to check *glob->subplans* instead of >> top_plan->initPlan, > Patch based on above suggestion is attached with this mail. I think this is the right fix for the moment, because the proximate cause of the crash is that ExecSerializePlan doesn't transmit any part of the PlannedStmt.subplans list, which glob->subplans is the preimage of. Now, maybe I'm missing something, but it seems to me that ordinary subplans could be transmitted to workers just fine. The problem with transmitting initplans is that you'd lose the expectation of single evaluation. (Even there, it might be okay if they're far enough down in the plan tree, but I haven't thought about it in detail.) So I'd rather see us fix ExecSerializePlan to transmit the subplans list and have some more-restrictive test here. This code would still be wrong as it stands though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Mon, Nov 21, 2016 at 6:10 PM, Amit Kapila wrote: > On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich > wrote: >> Hi, >> >> the query below triggers a parallel worker assertion for me when run on >> the regression database of master as of 0832f2d. The plan sports a >> couple of InitPlan nodes below Gather. >> > > I think the reason of this issue is that in some cases where subplan > is at some node other than top_plan node, we allow them to be executed > in the worker in force_parallel_mode. It seems to me that the > problematic code is below check in standard_planner() > > if (force_parallel_mode != FORCE_PARALLEL_OFF && > best_path->parallel_safe && > top_plan->initPlan == NIL) > > Here instead of checking whether top_plan has initPlan, it should > check whether initPlan is present anywhere in plan tree. I think one > simple way could be to check *glob->subplans* instead of > top_plan->initPlan, > Patch based on above suggestion is attached with this mail. Thoughts? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com restrict_subplans_parallel_mode_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Parallel worker crash on seqscan
On Sun, Nov 20, 2016 at 8:53 AM, Andreas Seltenreich wrote: > the following query appears to reliably crash parallel workers on master > as of 0832f2d. > > --8<---cut here---start->8--- > set max_parallel_workers_per_gather to 2; > set force_parallel_mode to 1; > > select subq.context from pg_settings, > lateral (select context from pg_opclass limit 1) as subq > limit 1; > --8<---cut here---end--->8--- > > Backtrace of a worker below. Based on the backtrace, I wonder if this might be a regression introduced by Tom's recent commit 9a00f03e479c2d4911eed5b4bd1994315d409938, "Improve speed of aggregates that use array_append as transition function.", which adds additional cases where expanded datums can be used. In theory, a value passed from the leader to the workers ought to have gone through datumSerialize() which contains code to flatten expanded representations, but it's possible that code is broken, or maybe the problematic code path is something else altogether. I'm just suspicious about the fact that the failure is in MakeExpandedObjectReadOnlyInternal(). Then again, that might just be a coincidence. The other thing that's weird here is that the Datum being passed is apparently a NULL pointer, which neither MakeExpandedObjectReadOnly() nor MakeExpandedObjectReadOnlyInternal() are prepared to deal with. I don't know whether it's wrong for a NULL pointer to occur here in the first place or whether it's wrong for those functions not to be able to deal with it if it does. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Better handling of UPDATE multiple-assignment row expressions
While fooling around trying to document the behavior of *-expansion, I got annoyed about the fact that it doesn't work in the context of UPDATE tab SET (c1,c2,c3) = (x,y,z) ... The righthand side of this is a row-expression according to the SQL standard, and "foo.*" would be expanded in any other row expression, but we don't do that here. Another oddity is that this ought to be an abbreviated form of UPDATE tab SET (c1,c2,c3) = ROW(x,y,z) ... but we don't accept that. It seemed like fixing this might be a good lazy-Sunday project ... so here is a patch. The reason this case doesn't work like other row-expressions is that gram.y bursts the construct into UPDATE tab SET c1 = x, c2 = y, c3 = z ... before we ever do parse analysis. gram.y has no idea what "foo.*" might expand to, so there's no way that that approach can work if we want to fix this; the conversion has to be postponed to parse analysis. The reason we didn't do it like that originally is explained in gram.y: * Ideally, we'd accept any row-valued a_expr as RHS of a multiple_set_clause. * However, per SQL spec the row-constructor case must allow DEFAULT as a row * member, and it's pretty unclear how to do that (unless perhaps we allow * DEFAULT in any a_expr and let parse analysis sort it out later?). But it turns out that the idea suggested in the parenthetical comment is pretty easy to do. The attached patch allows DEFAULT in any a_expr and then makes transformExpr() throw an error for it if it wasn't handled by earlier processing. That allows improved error reporting: instead of getting "syntax error" when DEFAULT appears someplace you can't put it, you get a more specific error. For instance, this: regression=# UPDATE update_test SET (a,b) = (default, default+1); ERROR: syntax error at or near "+" LINE 1: UPDATE update_test SET (a,b) = (default, default+1); ^ changes to: regression=# UPDATE update_test SET (a,b) = (default, default+1); ERROR: DEFAULT is not allowed in this context LINE 1: UPDATE update_test SET (a,b) = (default, default+1); ^ As stated in the comment I just quoted, ideally we'd allow any suitable composite-valued a_expr on the RHS of this kind of SET clause. The attached patch doesn't really move the goal posts on that: you can still only use a RowExpr or a sub-SELECT. But the RowExpr is now processed in standard fashion, and we've at least gotten that restriction out of gram.y and pushed it down one more level. Anyway, the user-visible effects of this are: * You can now write ROW explicitly in this construct, which you should be able to per SQL spec. * Expansion of "foo.*" in the row-expression now works as expected (see regression test changes). * Some error conditions now give errors more specific than "syntax error", specifically misplacement of DEFAULT and attempting to use an unsupported kind of expression on the RHS of this kind of SET clause. I couldn't find anything that seemed worth changing in the existing docs for any of these points, although if we don't fix the second point, it'd require adjustments to the new docs I proposed at https://www.postgresql.org/message-id/16288.1479610...@sss.pgh.pa.us Comments, better ideas? Anyone want to do a formal review of this? regards, tom lane diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 6901e08..36f8c54 100644 *** a/src/backend/parser/analyze.c --- b/src/backend/parser/analyze.c *** transformInsertStmt(ParseState *pstate, *** 644,651 { List *sublist = (List *) lfirst(lc); ! 
/* Do basic expression transformation (same as a ROW() expr) */ ! sublist = transformExpressionList(pstate, sublist, EXPR_KIND_VALUES); /* * All the sublists must be the same length, *after* --- 644,655 { List *sublist = (List *) lfirst(lc); ! /* ! * Do basic expression transformation (same as a ROW() expr, but ! * allow SetToDefault at top level) ! */ ! sublist = transformExpressionList(pstate, sublist, ! EXPR_KIND_VALUES, true); /* * All the sublists must be the same length, *after* *** transformInsertStmt(ParseState *pstate, *** 752,761 Assert(list_length(valuesLists) == 1); Assert(selectStmt->intoClause == NULL); ! /* Do basic expression transformation (same as a ROW() expr) */ exprList = transformExpressionList(pstate, (List *) linitial(valuesLists), ! EXPR_KIND_VALUES); /* Prepare row for assignment to target table */ exprList = transformInsertRow(pstate, exprList, --- 756,769 Assert(list_length(valuesLists) == 1); Assert(selectStmt->intoClause == NULL); ! /* ! * Do basic expression transformation (same as a ROW() expr, but allow ! * SetToDefault at top level) ! */ exp
Re: [HACKERS] condition variables
On Mon, Nov 21, 2016 at 7:10 AM, Thomas Munro wrote: >> I wonder if we shouldn't try to create the invariant that when the CV >> mutex is not help, the state of cvSleeping has to be true if we're in >> the proclist and false if we're not. So ConditionVariableSignal() >> would clear the flag before releasing the spinlock, for example. Then >> I think we wouldn't need proclist_contains(). > > Yeah, that'd work. ConditionVariableSleep would need to hold the > spinlock while checking cvSleeping (otherwise there is race case where > another process sets it to false but this process doesn't see that yet > and then waits for a latch-set which never arrives). It's not the end > of the world but it seems unfortunate to have to acquire and release > the spinlock in ConditionVariablePrepareToSleep and then immediately > again in ConditionVariableSleep. > > I was going to code that up but I read this and had another idea: > > http://stackoverflow.com/questions/8594591/why-does-pthread-cond-wait-have-spurious-wakeups > > I realise that you didn't actually say you wanted to guarantee no > spurious wakeups. But since the client code already has to check its > own exit condition, why bother adding a another layer of looping? > Sprurious latch sets must be super rare, but even if they aren't, what > do you save by filtering them here? In this situation you already got > woken from a deep slumbler and scheduled back onto the CPU, so it > hardly matters whether you go around again in that loop or the > client's loop. We could make things really simple instead: get rid of > cvSleeping, have ConditionVariablePrepareToSleep reset the latch, then > have ConditionVariableSleep wait for the latch to be set just once (no > loop). Then we'd document that spurious wakeups are possible so the > caller must write a robust predicate loop, exactly as you already > showed in your first message. We'd need to keep that > proclist_contains stuff to avoid corrupting the list. > proclist_contains would be the one source of truth for whether you're > in the waitlist, and the client code's predicate loop would contain > the one source of truth for whether the condition it is waiting for > has been reached. I don't think we can rely on spurious latch set events being rare. There are an increasing number of things that set the process latch, and there will very likely be more in the future. For instance, the arrival of tuples from a parallel worker associated with our session will set the process latch. Workers starting or dying will set the process latch. So my inclination was to try to guarantee no spurious wakeups at all, but possibly a softer guarantee that makes them unlikely would be sufficient. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
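The caller-side pattern implied by this exchange, using the function names under discussion (the exact API was still in flux at this point), is a sketch like the following: the caller re-checks its own exit condition in a loop, so a spurious latch set or wakeup only costs one extra trip around the loop.

/* Wait until some shared-state condition becomes true. */
ConditionVariablePrepareToSleep(cv);
for (;;)
{
    if (our_condition_is_satisfied())   /* caller-provided predicate */
        break;
    ConditionVariableSleep(cv);         /* may return spuriously */
}
/* ...plus whatever "done sleeping" cleanup the final API settles on... */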
Re: [HACKERS] Parallel execution and prepared statements
> True, but we also try to avoid it whenever possible, because it's > likely to lead to poor performance. This non-read-only case should be hit far less often than other uses of prepared statements. But sure, it depends on the individual use case, and a likely performance regression in these edge cases is nothing to decide on lightly. > I think it would be a good idea to come up with a way for a query to > produce both a parallel and a non-parallel plan and pick between them > at execution time. However, that's more work than I've been willing > to undertake. Wouldn't the precautionary generation of two plans always increase the planning overhead, which is precisely what one wants to reduce by using prepared statements? Best regards Tobias -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvements in psql hooks for variables
Daniel, * Daniel Verite (dan...@manitou-mail.org) wrote: > Stephen Frost wrote: > > > Are you working to make those changes? I'd rather we make the changes > > to this code once rather than push what you have now only to turn around > > and change it significantly again. > > If, as a maintainer, you prefer this together in one patch, I'm fine > with that. So I'll update it shortly with changes in \timing and > a few other callers of ParseVariableBool(). Did you get a chance to review and consider the other comments from my initial review about how we might use a different approach for bools, et al? Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Improvements in psql hooks for variables
Daniel, * Daniel Verite (dan...@manitou-mail.org) wrote: > "make check" seems OK with that, I hope it doesn't cause any regression > elsewhere. You can see what the code coverage of psql is in our current regression tests by going here: http://coverage.postgresql.org/src/bin/psql/index.html It's not exactly a pretty sight and certainly not all callers of ParseVariableBool() are covered. I'd strongly suggest you either do sufficient manual testing, or add regression tests, most likely using the tap test system (you can see an example of that in src/bin/pg_dump/t and in other 't' directories). You can generate that report after you make changes yourself using 'make coverage-html'. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Improvements in psql hooks for variables
Stephen Frost wrote: > Are you working to make those changes? I'd rather we make the changes > to this code once rather than push what you have now only to turn around > and change it significantly again. If, as a maintainer, you prefer this together in one patch, I'm fine with that. So I'll update it shortly with changes in \timing and a few other callers of ParseVariableBool(). Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvements in psql hooks for variables
Stephen Frost wrote: > Just fyi, there seems to be some issues with this patch because setting > my PROMPT1 and PROMPT2 variables result in rather odd behavior. Good catch! The issue is that the patch broke the assumption of prompt hooks that their argument points to a long-lived buffer. The attached v4 fixes the bug by passing to hooks a pointer to the final strdup'ed value in VariableSpace rather than temp space, as commented in SetVariable(). Also I've changed something else in ParseVariableBool(). The code assumes "false" when value==NULL, but when value is an empty string, the result was true and considered valid, due to the following test being positive when len==0 (both with HEAD or the v3 patch from this thread): if (pg_strncasecmp(value, "true", len) == 0) return true; It happens that "" as a value yields the same result than "true" for this test so it passes, but it seems rather unintentional. The resulting problem from the POV of the user is that we can do that, for instance: test=> \set AUTOCOMMIT without error message or feedback, and that leaves us without much clue about autocommit being enabled: test=> \echo :AUTOCOMMIT test=> So I've changed ParseVariableBool() in v4 to reject empty string as an invalid boolean (but not NULL since the startup logic requires NULL meaning false in the early initialization of these variables). "make check" seems OK with that, I hope it doesn't cause any regression elsewhere. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index a9a2fdb..61b2304 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -254,7 +254,7 @@ exec_command(const char *cmd, if (opt1 != NULL && strncmp(opt1, prefix, sizeof(prefix) - 1) == 0) { reuse_previous = - ParseVariableBool(opt1 + sizeof(prefix) - 1, prefix) ? + ParseVariableBool(opt1 + sizeof(prefix) - 1, prefix, NULL) ? 
TRI_YES : TRI_NO; free(opt1); @@ -1548,7 +1548,7 @@ exec_command(const char *cmd, OT_NORMAL, NULL, false); if (opt) - pset.timing = ParseVariableBool(opt, "\\timing"); + pset.timing = ParseVariableBool(opt, "\\timing", NULL); else pset.timing = !pset.timing; if (!pset.quiet) @@ -2660,7 +2660,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) if (value && pg_strcasecmp(value, "auto") == 0) popt->topt.expanded = 2; else if (value) - popt->topt.expanded = ParseVariableBool(value, param); + popt->topt.expanded = ParseVariableBool(value, param, NULL); else popt->topt.expanded = !popt->topt.expanded; } @@ -2669,7 +2669,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) else if (strcmp(param, "numericlocale") == 0) { if (value) - popt->topt.numericLocale = ParseVariableBool(value, param); + popt->topt.numericLocale = ParseVariableBool(value, param, NULL); else popt->topt.numericLocale = !popt->topt.numericLocale; } @@ -2724,7 +2724,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) else if (strcmp(param, "t") == 0 || strcmp(param, "tuples_only") == 0) { if (value) - popt->topt.tuples_only = ParseVariableBool(value, param); + popt->topt.tuples_only = ParseVariableBool(value, param, NULL); else popt->topt.tuples_only = !popt->topt.tuples_only; } @@ -2756,7 +2756,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) popt->topt.pager = 2; else if (value) { - if (ParseVariableBool(value, param)) + if (ParseVariableBool(value, param, NULL)) popt->topt.pager = 1; else popt->topt.pager = 0; @@ -2778,7 +2778,7 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) else if (strcmp(param, "footer") == 0) { if (value) - popt->topt.default_footer = ParseVariableBool(value, param); + popt->topt.default_footer = ParseVariableBool(value, param, NULL);
Re: [HACKERS] Improvements in psql hooks for variables
Daniel, * Daniel Verite (dan...@manitou-mail.org) wrote: > Tom Lane wrote: > > > Stephen Frost writes: > > > In reviewing this patch, I also noticed that it's set up to assume a > > > 'true' result when a variable can't be parsed by ParseVariableBool. > > > > I suppose that's meant to be backwards-compatible with the current > > behavior: > > > > regression=# \timing foo > > unrecognized value "foo" for "\timing"; assuming "on" > > Timing is on. > > Exactly. The scope of the patch is limited to the effect > of \set assignments to built-in variables. > > > but I agree that if we're changing things in this area, that would > > be high on my list of things to change. I think what we want going > > forward is to disallow setting "special" variables to invalid values, > > and that should hold both for regular variables that have special > > behaviors, and for the special-syntax cases like \timing. > > +1 Not sure I follow your reply here. There seems to be broad agreement to improve how we handle both \set and "special" variables and the code paths are related and this patch is touching them, so it seems like the correct next step here is to adjust the patch to update the code based on that agreement. Are you working to make those changes? I'd rather we make the changes to this code once rather than push what you have now only to turn around and change it significantly again. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Improvements in psql hooks for variables
Tom Lane wrote: > Stephen Frost writes: > > In reviewing this patch, I also noticed that it's set up to assume a > > 'true' result when a variable can't be parsed by ParseVariableBool. > > I suppose that's meant to be backwards-compatible with the current > behavior: > > regression=# \timing foo > unrecognized value "foo" for "\timing"; assuming "on" > Timing is on. Exactly. The scope of the patch is limited to the effect of \set assignments to built-in variables. > but I agree that if we're changing things in this area, that would > be high on my list of things to change. I think what we want going > forward is to disallow setting "special" variables to invalid values, > and that should hold both for regular variables that have special > behaviors, and for the special-syntax cases like \timing. +1 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to change order sort of table in HashJoin
Thanks for the reply, sir. On 11/21/2016 1:39 AM, Tom Lane wrote: Man writes: Additional information. In 9.6 the second table (lesser tuple) was choosen (the same testdata). There are something (cost estimation?) different in previous versions. I'd bet on different statistics in the two installations (either you forgot to ANALYZE, or the random sample came up quite a bit different). And I'm a little suspicious that these tests weren't all done with the same work_mem setting. I dumped the two tables in pg9.4 and restored them to pg9.6, sir. I also set default_statistics_target to 1000 and ANALYZEd the two tables in both installations. And those were the results. Anyway, I know that the order cannot be changed by tuning parameters because it depends on the stored data. Thanks. regards, tom lane Thanks and best regards, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Push down more full joins in postgres_fdw
> > Yeah, I modified the patch so, as I thought that would be consistent with > the aggregate pushdown patch. Aggregate pushdown needs the tlist to judge the shippability of the targetlist. For joins that's not required, so we should defer, if we can. > >>> Instead we should calculate tlist, the >>> first time we need it and then add it to the fpinfo. > > > Having said that, I agree on that point. I'd like to propose (1) adding a > new member to fpinfo, to store a list of output Vars from the subquery, and > (2) creating a tlist from it in deparseRangeTblRef, then, which would allow > us to calculate the tlist only when we need it. The member added to fpinfo > would be also useful to address the comments on the DML/UPDATE pushdown > patch. See the attached patch in [1]. I named the member a bit differently > in that patch, though. Again, the list of Vars will be wasted if we don't choose that path for final planning. So I don't see the point of adding a list of Vars there. It looks like we are replacing one list with the other when neither is useful if the path doesn't get chosen for the final plan. If you think that the member is useful for DML/UPDATE pushdown, you may want to add it in the other patch. > > You modified the comments I added to deparseLockingClause into this: > > /* > +* Ignore relation if it appears in a lower subquery. Locking clause > +* for such a relation is included in the subquery. > +*/ > > I don't think the second sentence is 100% correct because a locking clause > isn't always required for such a relation, so I modified the sentence a bit. > I guess the "if required" part was implicit in the construct "such a relation". Your version seems to make it explicit. I am fine with it. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
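The "build it the first time we need it" idea from this exchange could be a fragment as small as the following (the fpinfo field name is the one being discussed; build_tlist_to_deparse() is postgres_fdw's existing helper):

/* Build the deparse target list lazily and keep it in fpinfo for reuse. */
if (fpinfo->tlist == NIL)
    fpinfo->tlist = build_tlist_to_deparse(foreignrel);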
Re: [HACKERS] Contains and is contained by operators of inet datatypes
On 11/17/2016 11:14 PM, Tom Lane wrote: The original post proposed that we'd eventually get some benefit by being able to repurpose << and >> to mean something else, but the time scale over which that could happen is so long as to make it unlikely to ever happen. I think we'd need to deprecate these names for several years, then actually remove them and have nothing there for a few years more, before we could safely install new operators that take the same arguments but do something different. (For comparison's sake, it took us five years to go from deprecating => as a user operator to starting to use it as parameter naming syntax ... and that was a case where conflicting use could be expected to throw an error, not silently misbehave, so we could force it with little risk of silently breaking peoples' applications. To repurpose << and >> in this way we would need to move much slower.) I agree. The value in re-purposing them is pretty low given the long time scales needed before that can be done. I'm inclined to think we should just reject this patch. I'm certainly not going to commit it without seeing positive votes from multiple people. Given that I reviewed it I think you already have my vote on this. I like the patch because it means less operators to remember for me as a PostgreSQL user. And at least for me inet is a rarely used type compared to hstore, json and range types which all use @> and <@. Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove the comment on the countereffectiveness of large shared_buffers on Windows
On Mon, Nov 21, 2016 at 7:46 AM, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
>>> shared_buffers   tps
>>> 256MB             990
>>> 512MB             813
>>> 1GB              1189
>>> 2GB              2258
>>> 4GB              5003
>>> 8GB              5062
>>>
>>> "512MB is the largest effective size" seems to be a superstition,
>>> although I don't know the reason for the drop at 512MB.
>
> Okay, not an issue.

I think the above data indicates that we can remove the 512MB limit for
Windows from the docs.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in parallel worker in ExecInitSubPlan
On Sun, Nov 20, 2016 at 10:43 PM, Andreas Seltenreich wrote:
> Hi,
>
> the query below triggers a parallel worker assertion for me when run on
> the regression database of master as of 0832f2d. The plan sports a
> couple of InitPlan nodes below Gather.
>

I think the reason for this issue is that in some cases where a subplan is
attached to a node other than the top_plan node, we allow it to be executed
in the worker under force_parallel_mode. It seems to me that the
problematic code is the check below in standard_planner():

    if (force_parallel_mode != FORCE_PARALLEL_OFF &&
        best_path->parallel_safe &&
        top_plan->initPlan == NIL)

Here, instead of checking whether top_plan has an initPlan, it should check
whether an initPlan is present anywhere in the plan tree. I think one
simple way could be to check *glob->subplans* instead of
top_plan->initPlan; another possibility is to traverse the whole tree to
see if an initPlan is present.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
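For illustration, the first option mentioned above would amount to roughly
the following change to that check in standard_planner(). This is only a
sketch; whether testing glob->subplans is sufficient, or whether a full
traversal of the plan tree is needed instead, is exactly the open question.

    if (force_parallel_mode != FORCE_PARALLEL_OFF &&
        best_path->parallel_safe &&
        glob->subplans == NIL)      /* instead of top_plan->initPlan == NIL */
    {
        /* ... existing code that pushes top_plan below a Gather node ... */
    }
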
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/11/19 0:16, Ashutosh Bapat wrote:
> Also, another point: I guess this note doesn't add much value in the
> given context. Probably we should remove it.
>
> +* Note: the tlist would have one-to-one correspondence with the
> +* joining relation's reltarget->exprs because (1) the above test
> +* on PHVs guarantees that the reltarget->exprs doesn't contain
> +* any PHVs and (2) the joining relation's local_conds is NIL.
> +* This allows us to search the targetlist entry matching a given
> +* Var node from the tlist in get_subselect_alias_id.

OK, I removed it.

On Fri, Nov 18, 2016 at 5:10 PM, Ashutosh Bapat wrote:

I wrote:
>> /*
>>  * For a join relation or an upper relation, use deparseExplicitTargetList.
>>  * Likewise, for a base relation that is being deparsed as a subquery, in
>>  * which case the caller would have passed tlist that is non-NIL, use that
>>  * function. Otherwise, use deparseTargetList.
>>  */

> This looks correct. I have modified it to make it simpler in the given
> patch. But I think we shouldn't refer to a function, e.g.
> deparseExplicitTargetlist(), in the comment. Instead we should describe
> the intent, e.g. "deparse SELECT clause from the given targetlist" or
> "deparse SELECT clause from attr_needed".

My taste would be to refer to those functions, because ISTM that makes the
explanation simpler and more direct. So I'd like to leave that for the
committer's judgment.

I wrote:
>> Done. I modified the patch as proposed; create the tlist by
>> build_tlist_to_deparse in foreign_join_ok, if needed, and search the
>> tlist by tlist_member. I also added a new member "tlist" to
>> PgFdwRelationInfo to save the tlist created in foreign_join_ok.

You wrote:
> Instead of adding a new member, you might want to reuse grouped_tlist
> by renaming it.

Done.

> Right now, we are calculating tlist whether or not the ForeignPath
> emerges as the cheapest path.

Yeah, I modified the patch so, as I thought that would be consistent with
the aggregate pushdown patch.

> Instead we should calculate tlist, the first time we need it and then
> add it to the fpinfo.

Having said that, I agree on that point. I'd like to propose (1) adding a
new member to fpinfo, to store a list of output Vars from the subquery,
and (2) creating a tlist from it in deparseRangeTblRef, which would allow
us to calculate the tlist only when we need it (see the sketch below). The
member added to fpinfo would also be useful to address the comments on the
DML/UPDATE pushdown patch. See the attached patch in [1]. I named the
member a bit differently in that patch, though.

You modified the comments I added to deparseLockingClause into this:

/*
+* Ignore relation if it appears in a lower subquery. Locking clause
+* for such a relation is included in the subquery.
+*/

I don't think the second sentence is 100% correct because a locking clause
isn't always required for such a relation, so I modified the sentence a
bit.
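To make (1) and (2) a bit more concrete, here is a rough sketch only. The
member name below is made up for illustration (as noted, the attached
patch names it differently), and add_to_flat_tlist() is just one
convenient way to wrap the stored Vars in TargetEntrys.

    /* (1) hypothetical extra member in PgFdwRelationInfo */
    List	   *subquery_vars;	/* output Vars when deparsed as a subquery */

    /* (2) in deparseRangeTblRef(), build the tlist only when it is needed */
    if (make_subquery)
    {
    	List	   *tlist = add_to_flat_tlist(NIL, fpinfo->subquery_vars);

    	/* ... deparse the subquery's SELECT list from tlist ... */
    }
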
Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/38245b84-fabf-0899-1b24-8f94cdc5900c%40lab.ntt.co.jp

*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 109,114 **** typedef struct deparse_expr_cxt
--- 109,116 ----
  /* Handy macro to add relation name qualification */
  #define ADD_REL_QUALIFIER(buf, varno) \
  	appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+ #define SS_TAB_ALIAS_PREFIX "s"
+ #define SS_COL_ALIAS_PREFIX "c"
  
  /*
   * Functions to determine whether an expression can be evaluated safely on
***************
*** 167,172 **** static void appendConditions(List *exprs, deparse_expr_cxt *context);
--- 169,182 ----
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
  				RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
+ static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
+ 				RelOptInfo *foreignrel, bool make_subquery,
+ 				List **params_list);
+ static void appendSubselectAlias(StringInfo buf, int tabno, int ncols);
+ static void get_subselect_alias_id(Var *node, RelOptInfo *foreignrel,
+ 				int *tabno, int *colno);
+ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *tabno,
+ 				int *colno);
  static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
  static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
  static void appendAggOrderBy(List *orderList, List *targetList,
***************
*** 990,1001 **** deparseSelectSql(List *tlist, List **retrieved_attrs, deparse_expr_cxt *context)
  	 */
  	appendStringInfoString(buf, "SELECT ");
  
  	if (foreignrel->reloptkind == RELOPT_JOINREL ||
! 		foreignrel->reloptkind == RELOPT_UPPER_REL)
! 	{
! 		/* For a join relation use the input tlist */
  		deparseExplicitTargetList(tlist, retrieved_attrs, context);
- 	}
  	else
  	{