Re: [HACKERS] delta relations in AFTER triggers
Alvaro Herrera alvhe...@2ndquadrant.com wrote: Kevin Grittner wrote: TRUNCATE triggers shouldn't have delta relations, that's for sure, or it'd completely negate the point of TRUNCATE triggers. I don't think we have any other event, do we? People who get delta relations for deleting all rows should be using DELETE, I think. That sounds reasonable. The only other issue was INSTEAD OF triggers, but I don't think we need them there, either. I am not sure about providing delta relations in the FOR EACH ROW case. For what cases are they useful? In an accounting application for courts I have seen a case where each receivable (asset) account had a contra (liability) account of uncollected receivables, because until and unless the Clerk of Courts collected the judgment there was no obligation to pay the money back out. Any financial transaction had to be in a database transaction, and not only did all the transaction detail need to balance to zero, but any receivable detail row needed to be immediately followed by a balancing contra account row (and vice versa). A FOR EACH ROW trigger, on seeing one of these rows, could check for the required balancing row. Now this would only be solved by the standard feature if both rows were always inserted by a single statement, which might not always have been the case; so even with this example I am stretching a bit. But it's close enough to suggest that there might be legitimate uses. And there is the fact that the standard seems to want this to be supported. In all cases I think NEW and OLD are sufficient. I didn't read the standard, but if it's saying that in FOR EACH ROW the new/deleted/changed row should be accessible by way of a delta relation, [...] No, it says that you can specify *both* the row variables for the row under consideration and the tables for the full set of rows affected by the statement *for the same FOR EACH ROW trigger*. 
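Kevin's courts example could be sketched with the standard-syntax transition tables discussed later in this thread. This is a hypothetical sketch only -- the REFERENCING syntax was not implemented in PostgreSQL at the time of this exchange, and every table, column, and trigger name here is invented for illustration:

```sql
-- A FOR EACH ROW AFTER trigger that also sees the statement's full delta,
-- as the standard permits: both the NEW row variable and a NEW TABLE.
CREATE TRIGGER check_balancing_row
    AFTER INSERT ON transaction_detail
    REFERENCING NEW TABLE AS txn_rows
    FOR EACH ROW
    EXECUTE PROCEDURE check_balancing_row_fn();

CREATE FUNCTION check_balancing_row_fn() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- For a receivable detail row, require a balancing contra-account row
    -- somewhere in the same statement's inserted set.
    IF NEW.account_type = 'receivable' AND NOT EXISTS (
           SELECT 1 FROM txn_rows c
           WHERE c.account_type = 'contra'
             AND c.case_id = NEW.case_id
             AND c.amount = -NEW.amount) THEN
        RAISE EXCEPTION 'receivable row has no balancing contra row';
    END IF;
    RETURN NULL;
END;
$$;
```

As Kevin notes, this only catches the violation when both rows are inserted by a single statement; rows inserted by separate statements would need a deferred constraint-style check instead.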
Now, if you only have delta relations in FOR EACH STATEMENT, then it seems to me you can optimize obtaining them only when such triggers are defined; this lets you do away with the reloption entirely, doesn't it? That was intended to minimize the situations under which there was a performance hit where the new delta relations were not needed in an AFTER trigger. If we only generate these for FOR EACH STATEMENT triggers, that certainly reduces the need for that, but I'm not sure it eliminates it -- especially if we're generating tuplestores for the full rows rather than their TIDs. It is already generating the tuplestores only if there is an AFTER trigger for the type of statement being executed, but I agree that it would be a very rare FOR EACH ROW trigger that actually used it. Of course, that is one argument for the standard syntax -- we could test whether any of the trigger definitions for that operation on that table specified each transition table, and only generate them if needed based on that. I noticed that GetCurrentFDWTuplestore() changed its identity without getting its comment updated. The new name seems awfully generic, and I don't think it really describes what the function does. I think it needs more renaming. Good point. Will fix that in the next version. (1) My first impulse was to capture this delta data in the form of tuplestores of just TIDs, and fetching the tuples themselves from the heap on reference. In earlier discussions others have argued for using tuplestores of the actual rows themselves. Can you please supply pointers to such discussion? That was in a matview-related discussion, so perhaps we should ignore that for now and discuss what's best for triggers here. If matviews need something different, the rows could always be materialized from the TIDs at a later point. I don't see any point in not just storing TIDs, but perhaps I'm missing something. 
I think there was some question about performance, but I really have a hard time seeing the performance being significantly worse with a TID tuplestore for most reasonable uses; and I think the TID tuplestore could be a lot faster if (for example) you have a table with a large, TOASTed text or bytea column which is not referenced (in selection criteria or the SET clause) and is not used in the FOR EACH STATEMENT trigger. I think there might also have been some question about visibility. A TID tuplestore would need to use a different snapshot (like maybe SnapshotAny) in the same query where it joined to other tables with a normal MVCC snapshot. (2) Do we want to just pick names for these in the PLs rather than using the standard syntax? Implementing the standard syntax seemed to require three new (unreserved) keywords, changes to the catalogs to store the chosen relations names, and some way to tie the specified relation names in to the various PLs. I think the only one for which we have a compulsion to follow someone in this area would be PL/pgSQL, ... which currently hard-codes the row
Re: [HACKERS] idle_in_transaction_timeout
On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote: On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus j...@agliodbs.com wrote: On 06/18/2014 12:32 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: There are plenty of badly-written applications which auto-begin, that is, they issue a BEGIN; immediately after every COMMIT; whether or not there's any additional work to do. This is a major source of IIT and the timeout should not ignore it. Nonsense. We explicitly don't do anything useful until the first actual command arrives, precisely to avoid that problem. Oh, we don't allocate a snapshot? If not, then no objection here. The only problem I see is that it makes the semantics kind of weird and confusing. Kill connections that are idle in transaction for too long is a pretty clear spec; kill connections that are idle in transaction except if they haven't executed any commands yet because we think you don't care about that case is not quite as clear, and not really what the GUC name says, and maybe not what everybody wants, and maybe masterminding. Kill connections that are idle in non-empty transaction block for too long -- Bruce Momjian br...@momjian.us  http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
Robert Haas robertmh...@gmail.com wrote: Kevin Grittner kgri...@ymail.com wrote: Can you describe what the standard syntax for the row variables is, as opposed to our syntax? Also, what's the standard syntax for the transition relations? If you want either (or both), the standard has you using a REFERENCING clause right before the FOR EACH part of the statement. You can specify one or more sections in the format: { OLD | NEW } [ ROW | TABLE ] [ AS ] name If you have more than one, they are separated by whitespace. (No commas or other visible delimiters.) If you omit ROW/TABLE it defaults to ROW. You are only allowed to specify TABLE on an AFTER trigger. You are only allowed to specify ROW on a FOR EACH ROW trigger. (There is no prohibition against specifying TABLE on a FOR EACH ROW trigger.) You are only allowed to specify OLD for a DELETE or UPDATE trigger. (The ability for one trigger definition to specify multiple operations is a PostgreSQL extension.) You are only allowed to specify NEW for an UPDATE or INSERT trigger. You may not repeat an entry. Essentially, we have an implied clause on every FOR EACH ROW trigger like: REFERENCING OLD ROW AS OLD NEW ROW AS NEW Some things which I *did* follow from the standard: these new relations are only allowed within AFTER triggers, but are available in both AFTER STATEMENT and AFTER ROW triggers. That is, an AFTER UPDATE ... FOR EACH ROW trigger could use both the OLD and NEW row variables as well as the delta relations (under whatever names we pick). That probably won't be used very often, but I can imagine some cases where it might be useful. I expect that these will normally be used in FOR EACH STATEMENT triggers. I'm concerned about the performance implications of capturing the delta relations unconditionally. If a particular trigger actually needs the delta relations, then the time spent capturing that information is well spent; but if it doesn't, then it's a waste. 
There are lots of people using FOR EACH STATEMENT triggers right now who won't be happy if those start requiring O(n) additional temporary storage, where n is the number of rows modified by the statement. This is likely an even bigger issue for per-row triggers, of course, where as you say, it probably won't be used often. That is why I added a reloption which must be specifically enabled for a table in order to generate these deltas. That would be an inconvenience for those wanting to use the new feature, but should prevent a performance regression for any tables where it is not specifically turned on. That's not perfect, of course, because if you turn it on for an UPDATE ... AFTER EACH STATEMENT trigger where you want it, you do suffer the overhead on every AFTER trigger on that table. Perhaps this is sufficient reason to use the standard syntax for the new delta tables -- they would then be generated only in the specific cases where they were needed. And I think I could lose the reloption. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
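Under the REFERENCING grammar Kevin describes, a trigger definition would name only the transition tables it actually needs, which is what makes the pay-only-for-what-you-use optimization possible. A hypothetical sketch (this syntax did not exist in PostgreSQL at the time of this thread; all names are invented):

```sql
-- Statement-level audit trigger that asks for both transition tables.
CREATE TRIGGER accounts_audit
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_accounts NEW TABLE AS new_accounts
    FOR EACH STATEMENT
    EXECUTE PROCEDURE accounts_audit_fn();
```

Because the definition names both transition tables, the executor would know at fire time to capture both tuplestores; a FOR EACH STATEMENT trigger defined without a REFERENCING clause would incur no capture overhead at all, which addresses Robert's O(n) storage concern without any reloption.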
Re: [HACKERS] btreecheck extension
On Wed, Jun 18, 2014 at 4:48 AM, Stephen Frost sfr...@snowman.net wrote: I'm fine with having these start out as external tools which are doing checks, but I've been specifically asked about (and have desired myself from time-to-time...) an in-core capability to check index/heap/etc validity. Folks coming from certain other RDBMS's find it amazing that we don't have any support for that when what they really want is a background worker which is just automatically going around doing these checks. Yes, I think that being able to verify the integrity of index and heap relations is an important feature, and I think it's notable that we're the only major RDBMS that lacks this support. I'm not quite sure what that should entail just yet, but clearly it should be somewhat like what I have here. I think the big open questions to make something like this work are mostly around the cost/benefit ratio of each of the checks I've outlined. Certainly, for that use case minimizing disruption on a live system becomes even more important. I'll probably look at a buffer access strategy, just to give an example of that off the top of my head. -- Peter Geoghegan
Re: [HACKERS] delta relations in AFTER triggers
David Fetter da...@fetter.org wrote: Robert Haas wrote: Kevin Grittner kgri...@ymail.com wrote: The good: - Generating the tuplestores. Yay! Thanks for that. ;-) The bad: - Generating them exactly and only for AFTER triggers The standard only allows them for AFTER triggers, and I'm not sure what the semantics would be for any others. - Requiring that the tuplestores both be generated or not at all. There are real use cases described below where only one would be relevant. Yeah. - Generating the tuplestores unconditionally. Well, there are conditions. Only when the reloption allows and only if there is an AFTER trigger for the type of operation in progress. The ugly: - Attaching tuplestore generation to tables rather than callers (triggers, DML, etc.) I'm not sure what you're getting at here. This patch is specifically only concerned with generating delta relations for DML AFTER triggers, although my hope is that it will be a basis for delta relations used for other purposes. This seems to me like the right place to initially capture the data for incremental maintenance of materialized views, and might be of value for other purposes, too. [formal definition of standard CREATE TRIGGER statement] Sorry that was a little verbose, but what it does do is give us what we need at trigger definition time. I'd say it's pilot error if a trigger definition says make these tuplestores and the trigger body then does nothing with them, which goes to Robert's point below re: unconditional overhead. Yeah, the more I think about it (and discuss it) the more I'm inclined to suffer the additional complexity of the standard syntax for specifying transition relations in order to avoid unnecessary overhead creating them when not needed. I'm also leaning toward just storing TIDs in the tuplestores, even though it requires mixed snapshots in executing queries in the triggers. 
It just seems like there will otherwise be too much overhead in copying around big, unreferenced columns for some situations. Along that same line, we don't always need to capture both the before tuplestores and the after ones. Two examples of this come to mind: - BEFORE STATEMENT triggers accessing rows, where there is no after part to use, Are you talking about an UPDATE for which the AFTER trigger(s) only reference the before transition table, and don't look at AFTER? If so, using the standard syntax would cover that just fine. If not, can you elaborate? and - DML (RETURNING BEFORE, e.g.) which only touches one of them. This applies both to extant use cases of RETURNING and to planned ones. I think that can be sorted out by a patch which implements that, if these deltas even turn out to be the appropriate way to get that data (which is not clear to me at this time). Assuming standard syntax, the first thing would be for the statement to somehow communicate to the trigger layer the need to capture a tuplestore it might otherwise not generate, and there would need to be a way for the statement to access the needed tuplestore(s). The statement would also need to project the right set of columns. None of that seems to me to be relevant to this patch. If this patch turns out to provide infrastructure that helps, great. If you have a specific suggestion about how to make the tuplestores more accessible to other layers, I'm listening. In summary, I'd like to propose that the tuplestores be generated separately in general and attached to callers. We can optimize this by not generating redundant tuplestores. Well, if we use the standard syntax for CREATE TRIGGER and store the transition table names (if any) in pg_trigger, the code can generate one relation if any AFTER triggers which are going to fire need it. I don't see any point in generating exactly the same tuplestore contents for each trigger. 
And suspect that we can wire in any other uses later when we have something to connect them to. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Jsonb: jbvBinary usage in the convertJsonbValue?
On 06/02/2014 11:38 AM, Andrew Dunstan wrote: On 06/02/2014 10:22 AM, Andrew Dunstan wrote: On 06/02/2014 10:02 AM, Robert Haas wrote: On Thu, May 29, 2014 at 7:34 AM, Dmitry Dolgov 9erthali...@gmail.com wrote: I'm a little confused by the convertJsonbValue function at jsonb_utils.c Maybe I misunderstood something, so I need help =) if (IsAJsonbScalar(val) || val->type == jbvBinary) convertJsonbScalar(buffer, header, val); As I can see, the convertJsonbScalar function is used for scalar and binary jsonb values. But this function doesn't handle the jbvBinary type. There definitely seems to be an oversight here of some kind. Either convertJsonbValue() shouldn't be calling convertJsonbScalar() with an object of type jbvBinary, or convertJsonbScalar() should know how to handle that case. Yes, I've just been looking at that. I think this is probably a hangover from when these routines were recast to some extent. Given that we're not seeing any errors from it, I'd be inclined to remove the || val->type == jbvBinary part. One of the three call sites to convertJsonbValue asserts that this can't be true, and doing so doesn't result in a regression failure. Peter and Teodor, comments? Thinking about it a bit more, ISTM this should be ok, since we convert a JsonbValue to Jsonb in a depth-first recursive pattern. We should certainly add some comments along these lines to explain why the jbvBinary case shouldn't arise. Nobody has commented further that I have noticed, so I have committed this. cheers andrew
Re: [HACKERS] idle_in_transaction_timeout
On 06/18/2014 02:52 PM, Bruce Momjian wrote: On Wed, Jun 18, 2014 at 04:41:30PM -0400, Robert Haas wrote: The only problem I see is that it makes the semantics kind of weird and confusing. Kill connections that are idle in transaction for too long is a pretty clear spec; kill connections that are idle in transaction except if they haven't executed any commands yet because we think you don't care about that case is not quite as clear, and not really what the GUC name says, and maybe not what everybody wants, and maybe masterminding. Kill connections that are idle in non-empty transaction block for too long Here's the POLS violation in this: I have idle_in_transaction_timeout set to 10min, but according to pg_stat_activity I have six connections which are IIT for over an hour. What gives? Robert's right, not killing the BEGIN; only transactions is liable to result in user confusion unless we label those sessions differently in pg_stat_activity. Tom is right in that killing them will cause some users to not use IIT_timeout when they should, or will set the timeout too high to be useful. So it seems like what we should do is NOT call sessions IIT if they've merely executed a BEGIN; and not done anything else. Then the timeout and pg_stat_activity would be consistent. Counter-argument: most app frameworks which do an automatic BEGIN; also do other stuff like SET TIMEZONE each time as well. Is this really a case worth worrying about? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
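For reference, the comparison Josh is drawing can be sketched as follows (the GUC name is the one proposed in this thread and may change; pg_stat_activity's state and state_change columns already existed at the time):

```sql
-- Proposed setting under discussion:
SET idle_in_transaction_timeout = '10min';

-- What a user would check it against, and where the confusion arises if
-- BEGIN-only sessions are reported as "idle in transaction" but never killed:
SELECT pid, state, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```

The POLS complaint is precisely that these two views of "idle in transaction" would disagree unless BEGIN-only sessions are either killed by the timeout or reported under a different state.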
Re: [HACKERS] idle_in_transaction_timeout
On 2014-06-19 1:46 AM, Josh Berkus wrote: Robert's right, not killing the BEGIN; only transactions is liable to result in user confusion unless we label those sessions differently in pg_stat_activity. Wouldn't they be labeled differently already? i.e. the last query would be BEGIN. Unless your app tries to unsuccessfully use nested transactions, you would know why it hasn't been killed. .marko
Re: [HACKERS] idle_in_transaction_timeout
On 06/18/2014 04:54 PM, Marko Tiikkaja wrote: On 2014-06-19 1:46 AM, Josh Berkus wrote: Robert's right, not killing the BEGIN; only transactions is liable to result in user confusion unless we label those sessions differently in pg_stat_activity. Wouldn't they be labeled differently already? i.e. the last query would be BEGIN. Unless your app tries to unsuccessfully use nested transactions, you would know why it hasn't been killed. That's pretty darned obscure for a casual user. *you* would know, and *I* would know, but 99.5% of our users would be very confused. Plus, if a session which has only issued a BEGIN; doesn't have a snapshot and isn't holding any locks, then I'd argue we shouldn't label it IIT in the first place because it's not doing any of the bad stuff we want to resolve by killing IITs. Effectively, it's just idle. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
Gurjeet Singh gurj...@singh.im writes: On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we're going to do this, I would say that we should also take the opportunity to get out from under the question of which kernel API we're dealing with. So perhaps a design like this: 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's the name of a file to write something into after forking. The typical value would be /proc/self/oom_score_adj or /proc/self/oom_adj. If not set, nothing happens. 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's the string we write, otherwise we write 0. Please find attached the patch. It includes the doc changes as well. Applied with some editorialization. regards, tom lane
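The fork-time behavior Tom describes can be modeled directly (a sketch, not the committed C implementation; the temp file here stands in for /proc/self/oom_score_adj):

```python
import os
import tempfile

def oom_adjust():
    """Mimic the proposed hook run after forking: if PG_OOM_ADJUST_FILE is
    set, write PG_OOM_ADJUST_VALUE (default "0") into it; otherwise no-op."""
    path = os.environ.get("PG_OOM_ADJUST_FILE")
    if path:
        with open(path, "w") as f:
            f.write(os.environ.get("PG_OOM_ADJUST_VALUE", "0"))

# Demo against a temp file standing in for /proc/self/oom_score_adj.
fd, path = tempfile.mkstemp()
os.close(fd)
os.environ["PG_OOM_ADJUST_FILE"] = path
os.environ["PG_OOM_ADJUST_VALUE"] = "-17"
oom_adjust()
with open(path) as f:
    print(f.read())  # prints -17
os.unlink(path)
```

The point of the design is that the postmaster itself needn't know which kernel API is in use: the service script chooses the file name, so the same binary works with both the old oom_adj and the newer oom_score_adj interface.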
Re: [HACKERS] idle_in_transaction_timeout
Josh Berkus j...@agliodbs.com writes: Counter-argument: most app frameworks which do an automatic BEGIN; also do other stuff like SET TIMEZONE each time as well. Is this really a case worth worrying about? SET TIMEZONE doesn't result in taking a snapshot either. regards, tom lane
Re: [HACKERS] idle_in_transaction_timeout
On Wed, Jun 18, 2014 at 8:01 PM, Josh Berkus [via PostgreSQL] ml-node+s1045698n5807868...@n5.nabble.com wrote: On 06/18/2014 04:54 PM, Marko Tiikkaja wrote: On 2014-06-19 1:46 AM, Josh Berkus wrote: Robert's right, not killing the BEGIN; only transactions is liable to result in user confusion unless we label those sessions differently in pg_stat_activity. Wouldn't they be labeled differently already? i.e. the last query would be BEGIN. Unless your app tries to unsuccessfully use nested transactions, you would know why it hasn't been killed. That's pretty darned obscure for a casual user. *you* would know, and *I* would know, but 99.5% of our users would be very confused. Plus, if a session which has only issued a BEGIN; doesn't have a snapshot and isn't holding any locks, then I'd argue we shouldn't lable it IIT in the first place because it's not doing any of the bad stuff we want to resolve by killing IITs. Effectively, it's just idle. +1 Since the BEGIN is not meaningfully interpreted until the first subsequent command (SET excepting apparently - are there others?) is issued no transaction has begun at this point so idle and not IIT should be the reported state on pg_stat_activity. Though I can understand some level of surprise if someone sees idle with a BEGIN (or SET TIMEZONE) as the last executed command - so maybe idle before transaction instead of idle in transaction - which hopefully will not be assumed to be controlled by the idle_in_transaction_timeout GUC. I presume that we have some way to distinguish this particular case and can report it accordingly. Even if that state endures after a SET TIMEZONE or similar session configuration command explaining that all those are pre-transaction shouldn't be too hard - though as a third option idle initialized transaction could be the state indicator. 
Depending on how idle in transaction (aborted) gets resolved we could also consider idle in transaction (initializing) - but since the former, IMO, should be dropped (and thus matches the in specification of the GUC) naming the latter - which should not be dropped (I'll go with the stated opinion here for now) - with in is not advisable. The current behavior is transparent to the casual user so sticking with idle does seem like the best choice to maintain the undocumented nature of the behavior. David J.
Re: [HACKERS] How to change the pgsql source code and build it??
On 06/19/2014 01:50 AM, Shreesha wrote: However in my case, I don't know why there wasn't a default database with name 'root' got created or why the server rejects the client when it tries to connect to the default database. Take a look at the initdb manual, the tutorial, and the installation/system administration documentation. Can anyone shed some light on 1) when the default database gets created By initdb - see the initdb manual. 2) how is this database 'name' is decided? Or what is the name of the default database name? See the initdb manual. 3) Is there any other places in the database server code where this check is applied? Wherever you connect, libpq picks the default database = the current username. Upon looking at the error I got, I believe the code is searching for the database name == user name. If anyone can give some input on the code, it would be helpful! libpq I think you probably need to move this to pgsql-general, Stack Overflow, etc. It's not really on topic for pgsql-hackers. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
I was going to answer each message, but Fabrízio summarized everything so far really nicely. Thanks Fabrízio. On Wed, Jun 18, 2014 at 5:25 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Then, to summarize Matheus must do: * use an option instead of change the syntax and catalog to indicate that a tablespace is used to store temp objects Yes. I myself wasn't sure TEMPORARY syntax would be good, but I would just like to hear more about. Storing as an options seems nice to me. * throw an exception if we try ALTER the option only_temp_relations I think we should postpone the ALTER option, perhaps as another patch. Wasn't that what was decided? I'll investigate the options better before going this route. Let me work on CREATE first. * add regression tests * add documentation And, of course, register to the next open commitfest [1] to get detailed feedback and review. Noted. It will be on the next commitfest. Just knowing some people do want this make me more comfortable on working better. Best regards, -- Matheus de Oliveira Analista de Banco de Dados Dextra Sistemas - MPS.Br nível F! www.dextra.com.br/postgres
[HACKERS] Possible index issue on 9.5 slave
Hi

I've just run into an index issue on 9.5 HEAD on a slave (master and slave both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details below (I have only found one index on the slave where the issue occurs so far). The setup is admittedly slightly unusual; master is OS X 10.7.5, slave is CentOS on a Virtualbox guest VM on the same system. The issue only occurs with this combination of master and slave; I haven't been able to reproduce it with master and slave running natively on OS X, or with a Linux guest VM on a Linux machine. I have reproduced it several times on the OS X/Linux guest VM combination. I can't dig any further into this at the moment but can happily provide further details etc.

Master
======

$ uname -a
Darwin nara.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

tgg_current=> SELECT version();
                                                version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5devel on x86_64-apple-darwin11.4.2, compiled by gcc (MacPorts gcc48 4.8.2_2) 4.8.2, 64-bit
(1 row)

tgg_current=> select user_id, login from tgg_user where login ='admin';
 user_id | login
---------+-------
       1 | admin
(1 row)

Slave
=====

$ uname -a
Linux localhost.localdomain 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

tgg_current=> select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.5devel on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)

tgg_current=> select user_id,login from tgg_user where login ='admin';
 user_id | login
---------+-------
(0 rows)

tgg_current=> explain select user_id,login from tgg_user where login ='admin';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Index Scan using tgg_user_login_key on tgg_user  (cost=0.28..8.30 rows=1 width=15)
   Index Cond: ((login)::text = 'admin'::text)
 Planning time: 0.105 ms
(3 rows)

tgg_current=> set enable_bitmapscan=off;
SET
tgg_current=> set enable_indexscan =off;
SET
tgg_current=> select user_id,login from tgg_user where login ='admin';
 user_id | login
---------+-------
       1 | admin
(1 row)

tgg_current=> \d tgg_user_login_key
Index "epp.tgg_user_login_key"
 Column |         Type          | Definition
--------+-----------------------+------------
 login  | character varying(32) | login
unique, btree, for table "epp.tgg_user"

Regards

Ian Barwick

-- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [bug fix] Memory leak in dblink
I wrote: I do see growth in the per-query context as well. I'm not sure where that's coming from, but we probably should try to find out. A couple hundred bytes per iteration is going to add up, even if it's not as fast as 8K per iteration. I'm not sure it's dblink's fault, because I don't see anything in dblink.c that is allocating anything in the per-query context, except for the returned tuplestores, which somebody else should clean up.

I poked at this example some more, and found that the additional memory leak is occurring during evaluation of the arguments to be passed to dblink(). There's been a comment there for a very long time suggesting that we might need to do something about that ... With the attached patch on top of yours, I see no leak anymore.

regards, tom lane

diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index f162e92..cf8cbb1 100644
*** a/src/backend/executor/execQual.c
--- b/src/backend/executor/execQual.c
*************** ExecMakeTableFunctionResult(ExprState *f
*** 2014,2019 ****
--- 2014,2020 ----
  	PgStat_FunctionCallUsage fcusage;
  	ReturnSetInfo rsinfo;
  	HeapTupleData tmptup;
+ 	MemoryContext argContext = NULL;
  	MemoryContext callerContext;
  	MemoryContext oldcontext;
  	bool		direct_function_call;
*************** ExecMakeTableFunctionResult(ExprState *f
*** 2083,2094 ****
  	/*
  	 * Evaluate the function's argument list.
  	 *
! 	 * Note: ideally, we'd do this in the per-tuple context, but then the
! 	 * argument values would disappear when we reset the context in the
! 	 * inner loop.  So do it in caller context.  Perhaps we should make a
! 	 * separate context just to hold the evaluated arguments?
  	 */
  	argDone = ExecEvalFuncArgs(fcinfo, fcache->args, econtext);
  	/* We don't allow sets in the arguments of the table function */
  	if (argDone != ExprSingleResult)
  		ereport(ERROR,
--- 2084,2104 ----
  	/*
  	 * Evaluate the function's argument list.
  	 *
! 	 * We can't do this in the per-tuple context: the argument values
! 	 * would disappear when we reset that context in the inner loop.  And
! 	 * the caller's context is typically a query-lifespan context, so we
! 	 * don't want to leak memory there.  So make a separate context just
! 	 * to hold the evaluated arguments.
  	 */
+ 	argContext = AllocSetContextCreate(callerContext,
+ 									   "Table function arguments",
+ 									   ALLOCSET_DEFAULT_MINSIZE,
+ 									   ALLOCSET_DEFAULT_INITSIZE,
+ 									   ALLOCSET_DEFAULT_MAXSIZE);
+ 	oldcontext = MemoryContextSwitchTo(argContext);
  	argDone = ExecEvalFuncArgs(fcinfo, fcache->args, econtext);
+ 	MemoryContextSwitchTo(oldcontext);
+ 
  	/* We don't allow sets in the arguments of the table function */
  	if (argDone != ExprSingleResult)
  		ereport(ERROR,
*************** no_function_result:
*** 2321,2328 ****
--- 2331,2342 ----
  		FreeTupleDesc(rsinfo.setDesc);
  	}
  
+ 	/* Clean up */
  	MemoryContextSwitchTo(callerContext);
+ 	if (argContext)
+ 		MemoryContextDelete(argContext);
+ 
  	/* All done, pass back the tuplestore */
  	return rsinfo.setResult;
  }
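The idea behind the patch -- evaluate per-call scratch data in its own dedicated context and discard it after each call, instead of letting it accumulate in the caller's query-lifespan context -- can be modeled with a toy allocator (illustration only; this is not PostgreSQL's actual MemoryContext API):

```python
class Context:
    """Stand-in for a PostgreSQL MemoryContext: a bag of allocations
    that can all be released at once."""
    def __init__(self, name):
        self.name = name
        self.chunks = []

    def alloc(self, data):
        self.chunks.append(data)
        return data

    def reset(self):
        self.chunks.clear()

def eval_args(ctx, arg):
    # Scratch copy of an argument, as ExecEvalFuncArgs would produce.
    return ctx.alloc(list(arg))

query_context = Context("per-query")                  # lives for the whole statement
arg_context = Context("Table function arguments")     # per-call scratch context

# Leaky pattern (pre-patch): 1000 calls leave 1000 chunks in the
# query-lifespan context.
for _ in range(1000):
    eval_args(query_context, "abc")

# Fixed pattern (post-patch): scratch data is dropped after every call.
for _ in range(1000):
    eval_args(arg_context, "abc")
    arg_context.reset()

print(len(query_context.chunks), len(arg_context.chunks))  # prints 1000 0
```

This is why the leak in the dblink example grew with each iteration: the arguments were being evaluated in a context that was only cleaned up at end of query.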
Re: [HACKERS] delta relations in AFTER triggers
On Wed, Jun 18, 2014 at 03:30:34PM -0700, Kevin Grittner wrote: David Fetter da...@fetter.org wrote: Robert Haas wrote: Kevin Grittner kgri...@ymail.com wrote: The good: - Generating the tuplestores. Yay! Thanks for that. ;-) Sorry, I just can't resist references to Spaghetti Westerns. https://en.wikipedia.org/wiki/The_Good,_the_Bad_and_the_Ugly The bad: - Generating them exactly and only for AFTER triggers The standard only allows them for AFTER triggers, and I'm not sure what the semantics would be for any others. Ah, so here's where we differ. You're looking at deltas, a very nice capability to have. I'm looking at the before and after tuplestores as components of which deltas, among many other things, could be composed. - Requiring that the tuplestores both be generated or not at all. There are real use cases described below where only one would be relevant. Yeah. - Generating the tuplestores unconditionally. Well, there are conditions. Only when the reloption allows and only if there is an AFTER trigger for the type of operation in progress. For deltas, this is just the thing. I'm vaguely picturing the following as infrastructure: - Instead of modifying Rel, we modify Query to contain two more bools, defaulting to false: hasBeforeTuplestore and hasAfterTuplestore - Each use case we implement would set 0 or more of these to true. For the delta use case, appropriate trigger definitions would set both. This is vague because I haven't really gotten hacking on it, just exploring what I hope are the relevant parts of the code. The ugly: - Attaching tuplestore generation to tables rather than callers (triggers, DML, etc.) I'm not sure what you're getting at here. This patch is specifically only concerned with generating delta relations for DML AFTER triggers, although my hope is that it will be a basis for delta relations used for other purposes. 
This seems to me like the right place to initially capture the data for incremental maintenance of materialized views, and might be of value for other purposes, too. Hrm. I don't really see this stuff as table properties. The materialized view case is an obvious example where the matview, not the relations underneath, wants this information. The relations underneath may have their own concerns, but it's the matview whose existence should ensure that the tuplestores are being generated. Once the last depending-on-one-of-the-tuplestores thing is gone, and this could simply be the end of a RETURNING query, the tuplestores go away. [formal definition of standard CREATE TRIGGER statement] Sorry that was a little verbose, but what it does do is give us what we need at trigger definition time. I'd say it's pilot error if a trigger definition says make these tuplestores and the trigger body then does nothing with them, which goes to Robert's point below re: unconditional overhead. Yeah, the more I think about it (and discuss it) the more I'm inclined to suffer the additional complexity of the standard syntax for specifying transition relations in order to avoid unnecessary overhead creating them when not needed. I'm also leaning toward just storing TIDs in the tuplestores, even though it requires mixed snapshots in executing queries in the triggers. So in this case one tuplestore with two TIDs, either of which might be NULL? It just seems like there will otherwise be too much overhead in copying around big, unreferenced columns for some situations. Yeah, it'd be nice to have the minimal part be as slim as possible. Along that same line, we don't always need to capture both the before tuplestores and the after ones. 
Two examples of this come to mind: - BEFORE STATEMENT triggers accessing rows, where there is no after part to use, Are you talking about an UPDATE for which the AFTER trigger(s) only reference the before transition table, and don't look at AFTER? If so, using the standard syntax would cover that just fine. If not, can you elaborate? Sorry I was unclear. I was looking at one of the many things having these tuplestores around could enable. As things stand now, there is no access of any kind to rows with any per-statement trigger, modulo user-space hacks like this one: http://people.planetpostgresql.org/dfetter/index.php?/archives/71-Querying-Rows-in-Statement-Triggers.html Having the before tuplestore available to a BEFORE STATEMENT trigger would make it possible to do things with the before transition table that are fragile and hacky now. and - DML (RETURNING BEFORE, e.g.) which only touches one of them. This applies both to extant use cases of RETURNING and to planned ones. I think that can be sorted out by a patch which implements that, if these deltas even turn out to be the appropriate way to get that data (which is not clear to me at this time). Again, I see the tuplestores as
Re: [HACKERS] Possible index issue on 9.5 slave
On Wed, Jun 18, 2014 at 6:54 PM, Ian Barwick i...@2ndquadrant.com wrote: I've just run into an index issue on 9.5 HEAD on a slave (master and slave both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details below (I have only found one index on the slave where the issue occurs so far). Would you mind running my btreecheck tool on both systems? That might shed some light on this. You can get it from: http://www.postgresql.org/message-id/cam3swzrtv+xmrwlwq6c-x7czvwavfdwfi4st1zz4ddgfh4y...@mail.gmail.com . I suggest running bt_parent_index_verify() and bt_leftright_verify() on all indexes on both systems. It shouldn't take too long. Thanks -- Peter Geoghegan
Re: [HACKERS] [bug fix] Memory leak in dblink
On 06/18/2014 07:29 PM, Tom Lane wrote: I wrote: I do see growth in the per-query context as well. I'm not sure where that's coming from, but we probably should try to find out. A couple hundred bytes per iteration is going to add up, even if it's not as fast as 8K per iteration. I'm not sure it's dblink's fault, because I don't see anything in dblink.c that is allocating anything in the per-query context, except for the returned tuplestores, which somebody else should clean up. I poked at this example some more, and found that the additional memory leak is occurring during evaluation of the arguments to be passed to dblink(). There's been a comment there for a very long time suggesting that we might need to do something about that ... With the attached patch on top of yours, I see no leak anymore.

I can confirm that -- rock solid with 1 million iterations. I assume that should not be back-patched though?

On a side note, while perusing this section of code:

8<-- at dblink.c:1176 --
	/* make sure we have a persistent copy of the tupdesc */
	tupdesc = CreateTupleDescCopy(tupdesc);

	/* check result and tuple descriptor have the same number of columns */
	if (nfields != tupdesc->natts)
		ereport(ERROR,
				(errcode(ERRCODE_DATATYPE_MISMATCH),
				 errmsg("remote query result rowtype does not match "
						"the specified FROM clause rowtype")));

	/* Prepare attinmeta for later data conversions */
	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);

	/* Create a new, empty tuplestore */
	oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
	rsinfo->setResult = sinfo->tuplestore;
	rsinfo->setDesc = tupdesc;
	MemoryContextSwitchTo(oldcontext);
8<-- at dblink.c:1194 --

Shouldn't that CreateTupleDescCopy() happen in ecxt_per_query_memory? 
Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [HACKERS] Possible index issue on 9.5 slave
On 19/06/14 11:58, Peter Geoghegan wrote: On Wed, Jun 18, 2014 at 6:54 PM, Ian Barwick i...@2ndquadrant.com wrote: I've just run into an index issue on 9.5 HEAD on a slave (master and slave both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details below (I have only found one index on the slave where the issue occurs so far). Would you mind running my btreecheck tool on both systems? That might shed some light on this. You can get it from: http://www.postgresql.org/message-id/cam3swzrtv+xmrwlwq6c-x7czvwavfdwfi4st1zz4ddgfh4y...@mail.gmail.com . I suggest running bt_parent_index_verify() and bt_leftright_verify() on all indexes on both systems. It shouldn't take too long. Interesting, I'll take a look later. Thanks Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [bug fix] Memory leak in dblink
Joe Conway m...@joeconway.com writes: On a side note, while perusing this section of code: 8<-- at dblink.c:1176 -- /* make sure we have a persistent copy of the tupdesc */ tupdesc = CreateTupleDescCopy(tupdesc); Shouldn't that CreateTupleDescCopy() happen in ecxt_per_query_memory? Not necessary (we'd have seen crashes long since if it was). ExecMakeTableFunctionResult doesn't need the tupdesc to persist past return. Actually, I was wondering whether we couldn't remove that CreateTupleDescCopy call entirely. The risk would be if get_call_result_type returned a pointer into relcache or some other cached tuple descriptor, which might be subject to a cache flush --- but AFAICS it always returns a freshly created or copied tupdesc. (This might not have been true originally, which could explain why dblink thinks it needs to copy.) regards, tom lane
Re: [HACKERS] Possible index issue on 9.5 slave
On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick i...@2ndquadrant.com wrote: Interesting, I'll take a look later. I'm pretty suspicious of incompatibilities that may exist between the two sets of OS collations involved here. We aren't very clear on the extent to which what you're doing is supported, but it's certainly the case that bttextcmp()/varstr_cmp()/strcoll() return values must be immutable between the two systems. Still, it should be possible to determine if that's the problem using btreecheck. Do you get perfectly consistent answers between the two when you ORDER BY login? -- Peter Geoghegan
Re: [HACKERS] [bug fix] Memory leak in dblink
On 06/18/2014 08:19 PM, Tom Lane wrote: Actually, I was wondering whether we couldn't remove that CreateTupleDescCopy call entirely. Apparently not, at least without some additional surgery. ExecMakeTableFunctionResult() tries to free the tupledesc and segfaults. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [HACKERS] Possible index issue on 9.5 slave
On 19/06/14 12:30, Peter Geoghegan wrote: On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick i...@2ndquadrant.com wrote: Interesting, I'll take a look later. I'm pretty suspicious of incompatibilities that may exist between the two sets of OS collations involved here. We aren't very clear on the extent to which what you're doing is supported, but it's certainly the case that bttextcmp()/varstr_cmp()/strcoll() return values must be immutable between the two systems. Still, it should be possible to determine if that's the problem using btreecheck. Do you get perfectly consistent answers between the two when you ORDER BY login? Hmm, nope, different sort order. Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Possible index issue on 9.5 slave
Peter Geoghegan p...@heroku.com writes: On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick i...@2ndquadrant.com wrote: Interesting, I'll take a look later. I'm pretty suspicious of incompatibilities that may exist between the two sets of OS collations involved here. We aren't very clear on the extent to which what you're doing is supported, but it's certainly the case that bttextcmp()/varstr_cmp()/strcoll() return values must be immutable between the two systems. Oooh, I'll bet that's exactly it. Is the database using UTF8 encoding and a non-C locale? It's well known that OS X's UTF8 locales sort nothing at all like the supposedly equivalent locales on other systems. Still, it should be possible to determine if that's the problem using btreecheck. Does btreecheck attempt to verify that the sort ordering of the index matches the comparison behavior of the datatype? That would (in general) require calling user-defined code, which seems like probably a pretty bad idea for the purposes btreecheck is being advertised for. regards, tom lane
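The kind of check being debated has a simple core shape: walk the keys in index order and ask the comparator the server would use *now* whether every adjacent pair is still ordered. The sketch below is illustrative only, not btreecheck's code: an array stands in for the chain of index keys, and index_order_is_consistent is an invented name. It also illustrates Tom's caveat, since the verdict depends entirely on which comparator is passed in.

```c
#include <assert.h>
#include <string.h>

/*
 * Walk keys in "index order" and verify each adjacent pair agrees with
 * the supplied comparator.  If the ordering was produced under a
 * different collation (say, another OS's strcoll()), some pair will
 * compare out of order here.
 */
static int
index_order_is_consistent(const char **keys, int nkeys,
						  int (*cmp) (const char *, const char *))
{
	for (int i = 1; i < nkeys; i++)
		if (cmp(keys[i - 1], keys[i]) > 0)
			return 0;			/* pair out of order under this comparator */
	return 1;
}
```

A real check over a B-tree would also have to follow the leaf-page right-links and call the index's own support function, which is where the "calling user-defined code" concern comes in.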
Re: [HACKERS] [bug fix] Memory leak in dblink
Joe Conway m...@joeconway.com writes: On 06/18/2014 08:19 PM, Tom Lane wrote: Actually, I was wondering whether we couldn't remove that CreateTupleDescCopy call entirely. Apparently not, at least without some additional surgery. ExecMakeTableFunctionResult() tries to free the tupledesc and segfaults. Hmm ... oh, I missed this bit:

	/* We must get the tupledesc from call context */
	if (rsinfo && IsA(rsinfo, ReturnSetInfo) &&
		rsinfo->expectedDesc != NULL)
	{
		result = TYPEFUNC_COMPOSITE;
		if (resultTupleDesc)
			*resultTupleDesc = rsinfo->expectedDesc;
		/* Assume no polymorphic columns here, either */
	}

So it's passing back the same tupdesc passed in by the caller of ExecMakeTableFunctionResult. We can free that all right, but the caller won't be happy :-( regards, tom lane
Re: [HACKERS] Possible index issue on 9.5 slave
On Wed, Jun 18, 2014 at 8:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Still, it should be possible to determine if that's the problem using btreecheck. Does btreecheck attempt to verify that the sort ordering of the index matches the comparison behavior of the datatype? That would (in general) require calling user-defined code, which seems like probably a pretty bad idea for the purposes btreecheck is being advertised for. Yes, it does, but I see no alternative for a general-purpose tool, and the fact that it is general purpose is of considerable value. I have more or less invented my own weird index scans. I assume you're referring to the field-verification of indexes use case, which is not an immediate goal of btreecheck, even though it's an important goal that there has already been some discussion of. -- Peter Geoghegan
Re: [HACKERS] [bug fix] Memory leak in dblink
Joe Conway m...@joeconway.com writes: On 06/18/2014 07:29 PM, Tom Lane wrote: With the attached patch on top of yours, I see no leak anymore. I can confirm that -- rock solid with 1 million iterations. I assume that should not be back-patched though? Well, we usually think memory leaks are back-patchable bugs. I'm a bit worried about the potential performance impact of an extra memory context creation/deletion though. It's probably not noticeable in this test case, but that's just because dblink() is such a spectacularly expensive function. regards, tom lane
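A standard way to sidestep the per-call creation/deletion cost Tom worries about is to create the context once and reset it between calls, which is what the executor's per-tuple context itself does. The toy below is not PostgreSQL code: Region and its helpers are invented stand-ins, with region_reset playing the role of MemoryContextReset.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy region supporting reset: free the contents but keep the region
 * itself, like MemoryContextReset().  Invented for this sketch only.
 */
typedef struct Region
{
	void	   *chunks[256];
	int			nchunks;
} Region;

static void
region_reset(Region *r)
{
	for (int i = 0; i < r->nchunks; i++)
		free(r->chunks[i]);
	r->nchunks = 0;
}

static void *
region_alloc(Region *r, size_t size)
{
	return r->chunks[r->nchunks++] = malloc(size);
}

/*
 * Create-once, reset-per-call: the setup cost is paid a single time
 * rather than on every call, while memory is still bounded to one
 * call's worth of allocations.
 */
static int
run_many_calls(int ncalls)
{
	Region		r = {{NULL}, 0};	/* "context" created once, up front */
	int			live_chunks = 0;

	for (int i = 0; i < ncalls; i++)
	{
		region_alloc(&r, 64);		/* per-call argument evaluation */
		live_chunks += r.nchunks;	/* stays at 1 per call, never grows */
		region_reset(&r);			/* MemoryContextReset analogue */
	}
	return live_chunks;
}
```

The trade-off is keeping an empty context around between calls; whether that beats create/delete per call depends on how expensive the function being called is, which is Tom's point about dblink() masking the difference.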
Re: [HACKERS] [bug fix] Memory leak in dblink
On 06/18/2014 08:45 PM, Tom Lane wrote: Joe Conway m...@joeconway.com writes: On 06/18/2014 07:29 PM, Tom Lane wrote: With the attached patch on top of yours, I see no leak anymore. I can confirm that -- rock solid with 1 million iterations. I assume that should not be back-patched though? Well, we usually think memory leaks are back-patchable bugs. I'm a bit worried about the potential performance impact of an extra memory context creation/deletion though. It's probably not noticeable in this test case, but that's just because dblink() is such a spectacularly expensive function. Probably so. I'll try to scrounge up some time to test the performance impact of your patch. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [HACKERS] Possible index issue on 9.5 slave
On 19/06/14 12:35, Tom Lane wrote: Peter Geoghegan p...@heroku.com writes: On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick i...@2ndquadrant.com wrote: Interesting, I'll take a look later. I'm pretty suspicious of incompatibilities that may exist between the two sets of OS collations involved here. We aren't very clear on the extent to which what you're doing is supported, but it's certainly the case that bttextcmp()/varstr_cmp()/strcoll() return values must be immutable between the two systems. Oooh, I'll bet that's exactly it. Is the database using UTF8 encoding and a non-C locale? Yup, that is indeed the case. It's well known that OS X's UTF8 locales sort nothing at all like the supposedly equivalent locales on other systems. True, that. A different sort order wouldn't have surprised me, but the failure to return an extant row had me thinking there was something awry with the laptop causing file corruption (it's getting on in years and has been bashed about a bit). Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] delta relations in AFTER triggers
David Fetter da...@fetter.org wrote: On Wed, Jun 18, 2014 at 03:30:34PM -0700, Kevin Grittner wrote: the more I think about it (and discuss it) the more I'm inclined to suffer the additional complexity of the standard syntax for specifying transition relations in order to avoid unnecessary overhead creating them when not needed. I'm also leaning toward just storing TIDs in the tuplestores, even though it requires mixed snapshots in executing queries in the triggers. So in this case one tuplestore with two TIDs, either of which might be NULL? No, one or two tuplestores, depending on need, each with TIDs of either the before set or the after set of all tuples affected by the DML statement, however many that may be. More or less like this first draft patch, except with TIDs instead of copies of the rows, and with better selectivity about when the tuplestores are generated. Having the before tuplestore available to a BEFORE STATEMENT trigger would make it possible to do things with the before transition table that are fragile and hacky now. How do you propose to have an accurate before tuplestore of affected rows before the scan runs and before the BEFORE ... FOR EACH ROW triggers fire? That would be particularly interesting to try to generate if the scan involves evaluating any VOLATILE functions. Again, I see the tuplestores as infrastructure for both deltas and many other things, so long as they're attached to the right objects. In my opinion, the right objects would include materialized views, triggers, and certain very specific kinds of DML of which all the RETURNING ones are one example. They would not include the underlying tables. Right now I've presented something for capturing the information and allowing it to be accessed from triggers. I don't think the means of capturing it precludes passing it along to other consumers. I would like to get this part working before trying to wire it up to anything other than triggers. 
The best way to kill an effort like this is to allow scope creep. Now, if you think that something fundamentally belongs at another level, that's something to address -- but the point where we capture data to pass to triggers seems like a natural and sound place to capture it for other purposes. And since almost all the code for this patch is in trigger.c, this seems like I'm in the right place for a trigger feature. standard syntax, the first thing would be for the statement to somehow communicate to the trigger layer the need to capture a tuplestore it might otherwise not generate, and there would need to be a way for the statement to access the needed tuplestore(s). Right. Hence my proposal to make the existence of the tuplestores part of Query, writeable by the types of triggers which specify that they'll be needed. I just don't think that Rel is the place to connect them. I don't know what you mean by that. I've already said that I now think we should use the standard CREATE TRIGGER syntax to specify the transition tables, and that if we do that we don't need the reloption that's in the patch. If you ignore the 19 lines of new code to add that reloption, absolutely 100% of the code changes in the patch so far are in trigger.c and trigger.h. That reloption was never anything I would consider as *connecting* the tuplestores to the Rel anyway -- it was simply an attempt to minimize unnecessary work. No matter how I try, I'm not seeing what you mean by references to connecting to Rel. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [bug fix] Memory leak in dblink
On 06/18/2014 12:14 PM, Joe Conway wrote: On 06/18/2014 12:09 PM, Tom Lane wrote: Joe Conway m...@joeconway.com writes: I think the context deletion was missed in the first place because storeRow() is the wrong place to create the context. Rather than creating the context in storeRow() and deleting it two levels up in materializeQueryResult(), I think it should be created and deleted in the interim layer, storeQueryResult(). Patch along those lines attached. Since the storeInfo struct is longer-lived than storeQueryResult(), it'd probably be better if the cleanup looked like

+	if (sinfo->tmpcontext != NULL)
+		MemoryContextDelete(sinfo->tmpcontext);
+	sinfo->tmpcontext = NULL;

good point

If there is no further discussion, I will commit this version of the patch back to 9.2 where this leak was introduced.

Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index a81853f..2e43c8f 100644
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1076,1081 ****
--- 1076,1089 ----
  	if (!PQsetSingleRowMode(conn))		/* shouldn't fail */
  		elog(ERROR, "failed to set single-row mode for dblink query");
  
+ 	/* Create short-lived memory context for data conversions */
+ 	if (!sinfo->tmpcontext)
+ 		sinfo->tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ 												  "dblink temporary context",
+ 												  ALLOCSET_DEFAULT_MINSIZE,
+ 												  ALLOCSET_DEFAULT_INITSIZE,
+ 												  ALLOCSET_DEFAULT_MAXSIZE);
+ 
  	for (;;)
  	{
  		CHECK_FOR_INTERRUPTS();
*** storeQueryResult(storeInfo *sinfo, PGcon
*** 1119,1124 ****
--- 1127,1137 ----
  	/* clean up GUC settings, if we changed any */
  	restoreLocalGucs(nestlevel);
  
+ 	/* clean up data conversion short-lived memory context */
+ 	if (sinfo->tmpcontext != NULL)
+ 		MemoryContextDelete(sinfo->tmpcontext);
+ 	sinfo->tmpcontext = NULL;
+ 
  	/* return last_res */
  	res = sinfo->last_res;
  	sinfo->last_res = NULL;
*** storeRow(storeInfo *sinfo, PGresult *res
*** 1204,1218 ****
  		if (sinfo->cstrs)
  			pfree(sinfo->cstrs);
  		sinfo->cstrs = (char **) palloc(nfields * sizeof(char *));
- 
- 		/* Create short-lived memory context for data conversions */
- 		if (!sinfo->tmpcontext)
- 			sinfo->tmpcontext =
- 				AllocSetContextCreate(CurrentMemoryContext,
- 									  "dblink temporary context",
- 									  ALLOCSET_DEFAULT_MINSIZE,
- 									  ALLOCSET_DEFAULT_INITSIZE,
- 									  ALLOCSET_DEFAULT_MAXSIZE);
  	}
  
  	/* Should have a single-row result if we get here */
--- 1217,1222 ----
Re: GIN pending list pages not recycled promptly (was Re: [HACKERS] GIN improvements part 1: additional information)
On Wed, Jan 22, 2014 at 9:12 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/22/2014 03:39 AM, Tomas Vondra wrote: What annoys me a bit is the huge size difference between the index updated incrementally (by a sequence of INSERT commands), and the index rebuilt from scratch using VACUUM FULL. It's a bit better with the patch (2288 vs. 2035 MB), but is there a chance to improve this? Hmm. What seems to be happening is that pending item list pages that the fast update mechanism uses are not getting recycled. When enough list pages are filled up, they are flushed into the main index and the list pages are marked as deleted. But they are not recorded in the FSM, so they won't be recycled until the index is vacuumed. Almost all of the difference can be attributed to deleted pages left behind like that. So this isn't actually related to the packed postinglists patch at all. It just makes the bloat more obvious, because it makes the actual size of the index, excluding deleted pages, smaller. But it can be observed on git master as well. I created a simple test table and index like this:

create table foo (intarr int[]);
create index i_foo on foo using gin(intarr) with (fastupdate=on);

I filled the table like this:

insert into foo select array[-1] from generate_series(1, 1000) g;

postgres=# \d+
                    List of relations
 Schema | Name | Type  | Owner  |  Size  | Description
--------+------+-------+--------+--------+-------------
 public | foo  | table | heikki | 575 MB |
(1 row)

postgres=# \di+
                        List of relations
 Schema | Name  | Type  | Owner  | Table |  Size  | Description
--------+-------+-------+--------+-------+--------+-------------
 public | i_foo | index | heikki | foo   | 251 MB |
(1 row)

I wrote a little utility that scans all pages in a gin index, and prints out the flags indicating what kind of a page it is. The distribution looks like this:

   19 DATA
 7420 DATA LEAF
24701 DELETED
    1 LEAF
    1 META

I think we need to add the deleted pages to the FSM more aggressively. 
I tried simply adding calls to RecordFreeIndexPage, after the list pages have been marked as deleted, but unfortunately that didn't help. The problem is that the FSM is organized into a three-level tree, and RecordFreeIndexPage only updates the bottom level. The upper levels are not updated until the FSM is vacuumed, so the pages are still not visible to GetFreeIndexPage calls until the next vacuum. The simplest fix would be to add a call to IndexFreeSpaceMapVacuum after flushing the pending list, per attached patch. I'm slightly worried about the performance impact of the IndexFreeSpaceMapVacuum() call. It scans the whole FSM of the index, which isn't exactly free. So perhaps we should teach RecordFreeIndexPage to update the upper levels of the FSM in a retail fashion instead. I wonder if you pursued this further? You recently added a number of TODO items related to GIN indexes; is it worth adding this to the list? -- Amit
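The leaf-update-invisible-until-vacuum behavior Heikki describes, and the retail alternative he suggests, can be modeled with a toy. This is not the real FSM code: PostgreSQL's FSM is a multi-level tree of per-page free-space bytes with its own page-based addressing, while the sketch below is just a complete binary tree in an array where each inner node holds the maximum of its children; record_leaf, vacuum_fsm, and record_leaf_retail are invented names echoing RecordFreeIndexPage, FreeSpaceMapVacuum, and the proposed retail update.

```c
#include <assert.h>

/*
 * Toy FSM: a complete binary tree in an array.  Leaves (one per index
 * page) live at [NLEAF .. 2*NLEAF-1]; inner node i holds the maximum of
 * its children 2*i and 2*i+1, and the root is fsm[1].
 */
#define NLEAF 8
static unsigned char fsm[2 * NLEAF];	/* zero-initialized */

/*
 * Like RecordFreeIndexPage today: update the leaf only.  A search that
 * descends from the root won't see the new value until the upper
 * levels are rebuilt.
 */
static void
record_leaf(int page, unsigned char avail)
{
	fsm[NLEAF + page] = avail;
}

/* Like an FSM vacuum: rebuild every inner node bottom-up. */
static void
vacuum_fsm(void)
{
	for (int i = NLEAF - 1; i >= 1; i--)
		fsm[i] = fsm[2 * i] > fsm[2 * i + 1] ? fsm[2 * i] : fsm[2 * i + 1];
}

/*
 * The suggested retail alternative: after updating a leaf, bubble the
 * value up only while it raises a parent, touching O(log N) nodes
 * instead of rescanning the whole tree.
 */
static void
record_leaf_retail(int page, unsigned char avail)
{
	int			i = NLEAF + page;

	fsm[i] = avail;
	for (i /= 2; i >= 1 && avail > fsm[i]; i /= 2)
		fsm[i] = avail;
}

/* A root-first search for the best free page reads just the root here. */
static unsigned char
best_available(void)
{
	return fsm[1];
}
```

In this model, record_leaf followed by best_available shows exactly the symptom from the thread (the freed page is invisible until vacuum_fsm runs), while record_leaf_retail makes it visible immediately at logarithmic cost.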