Re: [HACKERS] Performance problem in PLPgSQL
On Wednesday, July 24, 2013 4:23 AM Tom Lane wrote: > I wrote: > > Marc Cousin writes: > >> The example below is of course extremely simplified, and obviously > not > >> what we are really doing in the database, but it exhibits the > slowdown > >> between 9.1.9 and 9.2.4. > > > Hm. Some experimentation shows that the plan cache is failing to > switch > > to a generic plan, but I'm not sure why the cast would have anything > to > > do with that ... > > Hah, I see why: > > (gdb) s > 1009if (plansource->generic_cost < avg_custom_cost * 1.1) > (gdb) p plansource->generic_cost > $18 = 0.012501 > (gdb) p avg_custom_cost > $19 = 0.01 > (gdb) p avg_custom_cost * 1.1 > $20 = 0.011001 > > That is, the estimated cost of the custom plan is just the evaluation > time for a simple constant, while the estimated cost of the generic > plan > includes a charge for evaluation of int4_numeric(). That makes the > latter more than ten percent more expensive, and since this logic isn't > considering anything else at all (particularly not the cost of > planning), it thinks that makes the custom plan worth picking. > > We've been around on this before, but not yet thought of a reasonable > way to estimate planning cost, let alone compare it to estimated > execution costs. Up to now I hadn't thought that was a particularly > urgent issue, but this example shows it's worth worrying about. > > One thing that was suggested in the previous discussion is to base the > planning cost estimate on the length of the rangetable. We could > do something trivial like add "10 * (length(plan->rangetable) + 1)" > in this comparison. > > Another thing that would fix this particular issue, though perhaps not > related ones, is to start charging something nonzero for ModifyTable > nodes, say along the lines of one seq_page_cost per inserted/modified > row. That would knock the total estimated cost for an INSERT up enough > that the ten percent threshold wouldn't be exceeded. Shouldn't it consider new value of boundparam to decide whether a new custom plan is needed, as that can be one of the main reason why it would need different plan. Current behavior is either it will choose generic plan or build a new custom plan with new parameters based on Choose_custom_plan(). Shouldn't the behavior of this be as below: a. choose generic plan b. choose one of existing custom plan c. create new custom plan The choice can be made based on the new value of bound parameter. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
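For illustration, a minimal sketch of the kind of statement that hits this (the table and statement are invented, not Marc's actual test case): an INSERT whose int4 parameter must be cast to numeric, so the generic plan carries an int4_numeric() evaluation charge that the constant-folded custom plans do not.

CREATE TABLE t (val numeric);

PREPARE ins(int) AS INSERT INTO t VALUES ($1);  -- $1 is coerced via int4_numeric()

-- After the first few executions the plan cache starts comparing
--   generic_cost < avg_custom_cost * 1.1
-- and, because of the cast charge on the generic side, it keeps choosing
-- (and re-planning) a custom plan on every subsequent EXECUTE.
EXECUTE ins(1);
EXECUTE ins(2);
EXECUTE ins(3);
EXECUTE ins(4);
EXECUTE ins(5);
EXECUTE ins(6);  -- in the problem scenario, still a custom plan

The PL/pgSQL case behaves the same way, since its SPI plans go through the same plan-cache logic.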
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 3:20 AM, Jeff Janes wrote: > On Mon, Jul 22, 2013 at 9:11 PM, Amit Langote wrote: >> Hello, >> >> While understanding the effect of maintenance_work_mem on time taken >> by CREATE INDEX, I observed that for the values of >> maintenance_work_mem less than the value for which an internal sort is >> performed, the time taken by CREATE INDEX increases as >> maintenance_work_increases. >> >> My guess is that for all those values an external sort is chosen at >> some point and larger the value of maintenance_work_mem, later the >> switch to external sort would be made causing CREATE INDEX to take >> longer. That is a smaller value of maintenance_work_mem would be >> preferred for when external sort is performed anyway. Does that make >> sense? > > The heap structure used in external sorts is cache-unfriendly. The > bigger the heap used, the more this unfriendliness becomes apparent. > And the bigger maintenance_work_mem, the bigger the heap used. > > The bigger heap also means you have fewer "runs" to merge in the > external sort. However, as long as the number of runs still fits in > the same number of merge passes, this is generally not a meaningful > difference. Does fewer runs mean more time (in whichever phase of external sort)? -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
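For anyone who wants to reproduce the effect being discussed, a sketch of the comparison (table, column and index names are invented; trace_sort is a developer option that needs a build with TRACE_SORT, which is defined by default):

SET trace_sort = on;

SET maintenance_work_mem = '16MB';
CREATE INDEX t_a_idx ON t (a);  -- external sort: many small runs, small heap
DROP INDEX t_a_idx;

SET maintenance_work_mem = '256MB';
CREATE INDEX t_a_idx ON t (a);  -- still external on a big table: fewer runs,
                                -- but a much larger, less cache-friendly heap

The trace_sort log lines show when the switch to external sort happens and how the runs are written and merged, which is where the timing difference described above becomes visible.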
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On Tuesday, July 23, 2013 5:06 AM Josh Berkus wrote: > All, > > Christophe just discovered something with include files which is going > to cause issues with ALTER SYSTEM SET. > > So, take as a hypothetical that you use the default postgresql.conf > file, which sets shared_buffers = 32MB. > > Instead of editing this file, you do ALTER SYSTEM SET shared_buffers = > '1GB', which updates config/postgresql.auto.conf. > > Then you restart PostgreSQL. > > Everything is hunky-dory, until a later occasion where you *reload* > postgresql. Then postgres startup hits the first "shared_buffers=32MB" > (in postgresql.conf), says "I can't change shared buffers on a reload", > and aborts before ever getting to postgresql.auto.conf. It doesn't abort after showing initial message "parameter %s cannot be changed without restart", rather it processes all remaining parameters. We can even test it by setting 1 postmaster and 1 sighup variable, after reload, even though it log message for postmaster variable, but it will set the sighup variable. So in this the real problem is that there is some message in log which can create confusion, otherwise the behavior will be exactly what user wants as per Postgres specs. To avoid the message, following can be done: 1. We need to make sure that in function ProcessConfigFile before calling set_config_option(), list of parameters is unique. a. it can be done while storing the elements in list as suggested by me in my yesterday's mail. http://www.postgresql.org/message-id/004c01ce8761$b8a4ab20$29ee0160$@kap...@huawei.com b. after parsing all the config files, make sure the list contain one entry for any variable (giving priority to elements that come later) 2. As this is just a confusing message issue, we can even update the docs to explain this behavior. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
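To make the scenario concrete, a sketch of the sequence being described (assuming the proposed ALTER SYSTEM SET syntax, which writes to config/postgresql.auto.conf):

-- postgresql.conf still says shared_buffers = 32MB
ALTER SYSTEM SET shared_buffers = '1GB';  -- recorded in postgresql.auto.conf

-- a restart picks up 1GB; a mere reload:
SELECT pg_reload_conf();
-- logs something like
--   LOG:  parameter "shared_buffers" cannot be changed without restart
-- for the postmaster-level entry, but keeps processing the remaining
-- parameters, so sighup-level settings later in the files still take effect.
-- Only the log message itself is potentially confusing.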
Re: [HACKERS] Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Noah Misch said: > Other aggregates based on this syntax might not desire such type unification. Then there would have to be some way to distinguish that. Maybe those could have -1 and the standard hypothetical set functions -2, with some flag in CREATE AGGREGATE to sort it out. > Having parse analysis do that distorts the character of an "any" argument. I > think the proper place for such processing is the first call to a transition > function. Except there isn't one. > But let's not make the > parser presume that an aggordnargs=-1 aggregate always wants its "any" > arguments handled in the manner of the standard hypothetical set functions. This has to happen in the parser because these are errors that should be caught before execution: rank(foo) within group (order by bar,baz) rank(integercol) within group (order by textcol) And collations have to be resolved (pairwise) before sorting can happen: rank(textval COLLATE "C") within group (order by foo) -- sorts in "C" rank(textval COLLATE "C") within group (order by bar COLLATE "en_US") -- error (basically, in rank(x) within group (order by y) where x and y are collatable, the collation rules apply exactly as though you were doing (x < y), with all the implicit vs. explicit stuff included) And this: rank(1.1) within group (order by intcol) should become rank(1.1) within group (order by intcol::numeric) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shorter iterations of join_info_list
[ sorry for slow response, this month has been mostly crazy ] Antonin Houska writes: > As far as I understand, deconstruct_recurse() ensures that > SpecialJoinInfo of a new join always gets added to higher position in > join_info_list than SJ infos of all joins located below the new join in > the tree. I wonder if we can rely on that fact sometimes. FWIW, I think of most of those planner lists as being unordered sets. Depending on a particular ordering definitely adds fragility; so I'd not want to introduce such a dependency without solid evidence of a substantial speed improvement. The cases you mention don't seem very likely to offer any noticeable gain at all --- at least, I don't recall seeing any of those functions show up high in profiles. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Suggestion for concurrent index creation using a single full scan operation
On Tue, Jul 23, 2013 at 01:06:26PM +0100, Tim Kane wrote: > I haven't given this a lot of thought, but it struck me that when > rebuilding tables (be it for a restore process, or some other operational > activity) - there is more often than not a need to build an index or two, > sometimes many indexes, against the same relation. > > It strikes me that in order to build just one index, we probably need to > perform a full table scan (in a lot of cases). If we are building > multiple indexes sequentially against that same table, then we're probably > performing multiple sequential scans in succession, once for each index. Check. > Could we architect a mechanism that allowed multiple index creation > statements to execute concurrently, with all of their inputs fed directly > from a single sequential scan against the full relation? > > From a language construct point of view, this may not be trivial to > implement for raw/interactive SQL - but possibly this is a candidate for > the custom format restore? As Greg Stark mentioned, pg_restore can already issue index build commands in parallel. Where applicable, that's probably superior to having one backend build multiple indexes during a single heap scan. Index builds are CPU-intensive, and the pg_restore approach takes advantage of additional CPU cores in addition to possibly saving I/O. However, the pg_restore method is not applicable if you want CREATE INDEX CONCURRENTLY, and it's not applicable for implicit index building such as happens for ALTER TABLE rewrites and for VACUUM FULL. Backend-managed concurrent index builds could shine there. > I presume this would substantially increase the memory overhead required to > build those indexes, though the performance gains may be advantageous. The multi-index-build should respect maintenance_work_mem overall. Avoiding cases where that makes concurrent builds slower than sequential builds is a key challenge for such a project: - If the index builds each fit in maintenance_work_mem when run sequentially and some spill to disk when run concurrently, expect concurrency to lose. - If the heap is small enough to stay in cache from one index build to the next, performing the builds concurrently is probably a wash or a loss. - Concurrency should help when a wide-row table large enough to exhaust OS cache has narrow indexes that all fit in maintenance_work_mem. I don't know whether concurrency would help for a huge-table scenario where the indexes do overspill maintenance_work_mem. You would have N indexes worth of external merge files competing for disk bandwidth; that could cancel out heap I/O savings. Overall, it's easy to end up with a loss. We could punt by having an index_build_concurrency GUC, much like pg_restore relies on the user to discover a good "-j" value. But if finding cases where concurrency helps is too hard, leaving the GUC at one would become the standard advice. > Apologies in advance if this is not the correct forum for suggestions.. It's the right forum. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 11:30 AM, Amit Langote wrote: > On Wed, Jul 24, 2013 at 6:02 AM, Jeff Janes wrote: >> If you are using trace_sort to report that, it reports the switch as >> happening as soon as it runs out of memory. >> >> At point, all we have been doing is reading tuples into memory. The >> time it takes to do that will depend on maintenance_work_mem, because >> that affects how many tuples fit in memory. But all the rest of the >> tuples need to be read sooner or later anyway, so pushing more of them >> to later doesn't improve things overall it just shifts timing around. >> >> After it reports the switch, it still needs to heapify the existing >> in-memory tuples before the tapesort proper can begin. This is where >> the true lost opportunities start to arise, as the large heap starts >> driving cache misses which would not happen at all under different >> settings. >> >> Once the existing tuples are heapified, it then continues to use the >> heap to pop tuples from it to write out to "tape", and to push newly >> read tuples onto it. This also suffers lost opportunities. >> >> Once all the tuples are written out and it starts merging, then the >> large maintenance_work_mem is no longer a penalty as the new heap is >> limited by the number of tapes, which is almost always much smaller. >> In fact this stage will actually be faster, but not by enough to make >> up for the earlier slow down. >> >> So it is not surprising that the time before the switch is reported is >> a small part of the overall time difference. >> > > So, is it the actual sorting (before merging) that suffers with larger > maintenance_work_mem? I am sorry but I can not grasp the complexity of > external sort code at this point, so all I can say is that during an > external sort a smaller value of maintenance_work_mem is beneficial > (based on my observations in tests). But how that follows from what is > going on in the implementation of external sort is still something I > am working on understanding. > Or does the increased create index time follow from something else altogether (not any part of external sort) may be still another question. Since we have to relate that to maintenance_work_mem, the first thing I could think of was to look at sorting part of it. -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 6:02 AM, Jeff Janes wrote: > On Tue, Jul 23, 2013 at 1:23 AM, Amit Langote wrote: >> On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote >> wrote: >>> Hello, >>> >>> While understanding the effect of maintenance_work_mem on time taken >>> by CREATE INDEX, I observed that for the values of >>> maintenance_work_mem less than the value for which an internal sort is >>> performed, the time taken by CREATE INDEX increases as >>> maintenance_work_increases. >>> >>> My guess is that for all those values an external sort is chosen at >>> some point and larger the value of maintenance_work_mem, later the >>> switch to external sort would be made causing CREATE INDEX to take >>> longer. That is a smaller value of maintenance_work_mem would be >>> preferred for when external sort is performed anyway. Does that make >>> sense? >>> >> >> Upon further investigation, it is found that the delay to switch to >> external sort caused by a larger value of maintenance_work_mem is >> small compared to the total time of CREATE INDEX. > > If you are using trace_sort to report that, it reports the switch as > happening as soon as it runs out of memory. > > At point, all we have been doing is reading tuples into memory. The > time it takes to do that will depend on maintenance_work_mem, because > that affects how many tuples fit in memory. But all the rest of the > tuples need to be read sooner or later anyway, so pushing more of them > to later doesn't improve things overall it just shifts timing around. > > After it reports the switch, it still needs to heapify the existing > in-memory tuples before the tapesort proper can begin. This is where > the true lost opportunities start to arise, as the large heap starts > driving cache misses which would not happen at all under different > settings. > > Once the existing tuples are heapified, it then continues to use the > heap to pop tuples from it to write out to "tape", and to push newly > read tuples onto it. This also suffers lost opportunities. > > Once all the tuples are written out and it starts merging, then the > large maintenance_work_mem is no longer a penalty as the new heap is > limited by the number of tapes, which is almost always much smaller. > In fact this stage will actually be faster, but not by enough to make > up for the earlier slow down. > > So it is not surprising that the time before the switch is reported is > a small part of the overall time difference. > So, is it the actual sorting (before merging) that suffers with larger maintenance_work_mem? I am sorry but I can not grasp the complexity of external sort code at this point, so all I can say is that during an external sort a smaller value of maintenance_work_mem is beneficial (based on my observations in tests). But how that follows from what is going on in the implementation of external sort is still something I am working on understanding. -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
On Tue, Jul 23, 2013 at 01:21:52AM +, Andrew Gierth wrote: > For hypothetical set functions we add a special case, aggordnargs=-1, > for which both the aggregate and the finalfn must be defined as > (variadic "any") and parse analysis detects this case and unifies the > types of the normal args with those of the ORDER BY args. Other aggregates based on this syntax might not desire such type unification. Having parse analysis do that distorts the character of an "any" argument. I think the proper place for such processing is the first call to a transition function. The transition functions could certainly call a new API exposed under src/backend/parser to do the heavy lifting. But let's not make the parser presume that an aggordnargs=-1 aggregate always wants its "any" arguments handled in the manner of the standard hypothetical set functions. The rest of the plan looks good so far. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Preventing tuple-table leakage in plpgsql
On Mon, Jul 22, 2013 at 10:02:30PM -0400, Tom Lane wrote: > Noah Misch writes: > > On Sun, Jul 21, 2013 at 12:40:38PM -0400, Tom Lane wrote: > >> Hmm ... good point. The other plan I'd been considering was to add > >> explicit tracking inside spi.c of all tuple tables created within the > >> current procedure, and then have AtEOSubXact_SPI flush any that were > >> created inside a failed subxact. > > > Is there reason to believe we wouldn't eventually find a half dozen other > > allocations calling for similar bespoke treatment? Does something make > > tuple > > tables special among memory allocations, or are they just the garden-variety > > allocation that happens to bother the test case at hand? > > It's hard to speculate about the memory management habits of third-party > SPI-using code. But in plpgsql, the convention is that random bits of > memory should be allocated in a short-term context separate from the SPI > procCxt --- typically, the estate->eval_econtext expression context, > which exec_stmt_block already takes care to clean up when catching an > exception. So the problem is that that doesn't work for tuple tables, > which have procCxt lifespan. The fact that they tend to be big (at > least 8K apiece) compounds the issue. Reasonable to treat them specially, per your plan, then. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Stephen Frost writes: > * Andrew Gierth (and...@tao11.riddles.org.uk) wrote: >> If you have legitimate technical concerns then let's hear them, now. > Fine- I'd have been as happy leaving this be and letting Greg commit it, > but if you'd really like to hear my concerns, I'd start with pointing > out that it's pretty horrid to have to copy every record around like > this and that the hack of CreateTupleDescCopyExtend is pretty terrible > and likely to catch people by surprise. Having FunctionNext() operate > very differently depending on WITH ORDINALITY is ugly and the cause of > the bug that was found. All-in-all, I'm not convinced that this is > really a good approach and feel it'd be better off implemented at a > different level, eg a node type instead of a hack on top of the existing > SRF code. I took the time to read through this patch, finally. i think the $64 question is pretty much what Stephen says above: do we like tying this behavior to FunctionScan, as opposed to having it be some kind of expression node? You could certainly imagine a WithOrdinality expression node that takes in values of a set-returning expression, and returns them with an extra column tacked on. This would resolve the problem that was mentioned awhile back that the current approach can't support SRFs in targetlists. If it weren't that we've been speculating for years about deprecating SRFs-in-tlists once we had LATERAL, I would personally consider this patch DOA in this form. If we do think we'll probably deprecate that feature, then not extending WITH ORDINALITY to such cases is at least defensible. On the other hand, considering that we've yet to ship a release supporting LATERAL, it's probably premature to commit to such deprecation --- we don't really know whether people will find LATERAL to be a convenient and performant substitute. As far as performance goes, despite Stephen's gripe above, I think this way is likely better than any alternative. The reason is that once we've disassembled the function result tuple and tacked on the counter, we have a reasonable shot at things staying like that (in the form of a virtual TupleTableSlot). The next higher level of evaluation can probably use the column Datums as-is. A WithOrdinality expression node would have to disassemble the input tuple and then make a new tuple, which the next higher expression level would likely pull apart again :-(. Also, any other approach would lead to needing to store the ordinality values inside the FunctionScan's tuplestore, bloating that storage with rather-low-value data. The other big technical issue I see is representation of the rowtype of the result. If we did it with a WithOrdinality expression node, the result would always be of type RECORD, and we'd have to use blessed tuple descriptors to keep track of exactly which record type a particular instance emits. This is certainly do-able (see RowExpr for precedent). Attaching the functionality to FunctionScan reduces the need for that because the system mostly avoids trying to associate any type OID with the rowtype of a FROM item. Instead though, we've got a lot of ad-hoc code that deals with RangeTblEntry type information. A big part of the patch is dealing with extending that code, and frankly I've got about zero confidence that the patch has found everything that needs to be found in that line. A patch using an expression node would probably need to touch only a much more localized set of places to handle the type description issue. 
Anyway, on balance I'm satisfied with this overall approach, though it's not without room for debate. I am fairly dissatisfied with the patch at a stylistic level, though. It seems way too short on comments. People griped about the code in nodeFunctionscan in particular, but I think all the changes in ad-hoc type-management code elsewhere are even more deserving of comments, and they mostly didn't get that. Likewise, there's no documentation anywhere that I can see of the new interrelationships of struct fields, such as that if a RangeTblEntry has funcordinality set, then its various column-related fields such as eref->colnames include a trailing INT8 column for the ordinality. Also, maybe I'm misreading the patch (have I mentioned lately that large patches in -u format are practically illegible?), but it sure looks like it flat out removed several existing regression-test cases and a few existing comments as well. How is that considered acceptable? FWIW, I concur with the gripe I remember seeing upthread that the default name of the added column ought not be "?column?". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
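For readers following along, the behaviour the patch implements looks like this (column aliases are illustrative; the ordinality column is a bigint counter starting at 1):

SELECT t.elem, t.ord
FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(elem, ord);

--  elem | ord
-- ------+-----
--  a    |   1
--  b    |   2
--  c    |   3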
Re: [HACKERS] improve Chinese locale performance
On 07/23/2013 09:42 PM, Craig Ringer wrote:
> (Replying on phone, please forgive bad quoting)
> Isn't this pretty much what adopting ICU is supposed to give us?
> OS-independent collations?

Yes, we need OS-independent collations.

> I'd be interested in seeing the test data for this performance report,
> partly as I'd like to see how ICU collations would compare when ICU is
> crudely hacked into place for testing.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Stephen Frost said: > [stuff about foreign tables] I think the issue with foreign tables is probably moot because if you _did_ want to have some types of foreign tables with a fixed ordinality, you'd probably also want qual pushdown to work for it (i.e. so that WHERE rownum=123 doesn't have to filter all the rows), whereas with SRFs this doesn't really apply. For this to work, foreign tables with a fixed ordering would have to provide that in the FDW - which is in any case the only place that knows whether a fixed order would even make any sense. So I see no overlap here with the SRF ordinality case. As for VALUES, the spec regards those as constructing a table and therefore not having any inherent order - the user can add their own ordinal column if need be. Even if you did want to add WITH ORDINALITY for VALUES, though, it would actually make more sense to do it in the Values Scan node since that maintains its own ordinal position already. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Tuesday, July 23, 2013, David Fetter wrote: > > Are you saying that there's stuff that if I don't put it in now will > impede our ability to add this to FTs later? > I'm saying that it'd be a completely different implementation and that this one would get in the way and essentially have to be ripped out. No one is saying that this patch wouldn't work for the specific use-case that it set out to meet, and maybe it's unfair for us to consider possible use-cases beyond the patch's goal and the spec requirement, but that, IMO, is also one of the things that makes PG great. MVCC isn't necessary and isn't required by spec either. Thanks, Stephen
[HACKERS] Failure to use generic plans (was: Re: Performance problem in PLPgSQL)
As failures to use a generic plan goes, that one's fairly tame. I've seen much worse. For example: PREPARE foo(integer[]) AS SELECT * FROM complexview WHERE id = ANY ($1); where the caller typically supplies 1-5 array elements (or any number less than 10, because generic parameter arrays are assumed to have 10 elements). This one can be a massive performance regression between 9.1 and 9.2; the first guy who mentioned this on IRC was getting a 40x slowdown (~20ms planning time vs. 0.5ms execution time). -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
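A sketch of how that shows up in practice (complexview is the placeholder name from the report above; the timings are the ones quoted there): releases before 9.2 plan the prepared statement once and reuse that generic plan, so later calls pay only execution cost, whereas with the 9.2 logic the generic plan assumes ten elements for the unknown array and therefore always looks more than ten percent costlier than the short-array custom estimates.

PREPARE foo(integer[]) AS SELECT * FROM complexview WHERE id = ANY ($1);

-- under the 9.2 behaviour described above, each of these re-plans the whole
-- view, paying ~20ms of planning for ~0.5ms of execution:
EXECUTE foo(ARRAY[1, 2, 3]);
EXECUTE foo(ARRAY[4]);
EXECUTE foo(ARRAY[5, 6]);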
Re: [HACKERS] Performance problem in PLPgSQL
I wrote: > Marc Cousin writes: >> The example below is of course extremely simplified, and obviously not >> what we are really doing in the database, but it exhibits the slowdown >> between 9.1.9 and 9.2.4. > Hm. Some experimentation shows that the plan cache is failing to switch > to a generic plan, but I'm not sure why the cast would have anything to > do with that ... Hah, I see why: (gdb) s 1009if (plansource->generic_cost < avg_custom_cost * 1.1) (gdb) p plansource->generic_cost $18 = 0.012501 (gdb) p avg_custom_cost $19 = 0.01 (gdb) p avg_custom_cost * 1.1 $20 = 0.011001 That is, the estimated cost of the custom plan is just the evaluation time for a simple constant, while the estimated cost of the generic plan includes a charge for evaluation of int4_numeric(). That makes the latter more than ten percent more expensive, and since this logic isn't considering anything else at all (particularly not the cost of planning), it thinks that makes the custom plan worth picking. We've been around on this before, but not yet thought of a reasonable way to estimate planning cost, let alone compare it to estimated execution costs. Up to now I hadn't thought that was a particularly urgent issue, but this example shows it's worth worrying about. One thing that was suggested in the previous discussion is to base the planning cost estimate on the length of the rangetable. We could do something trivial like add "10 * (length(plan->rangetable) + 1)" in this comparison. Another thing that would fix this particular issue, though perhaps not related ones, is to start charging something nonzero for ModifyTable nodes, say along the lines of one seq_page_cost per inserted/modified row. That would knock the total estimated cost for an INSERT up enough that the ten percent threshold wouldn't be exceeded. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On 07/23/2013 03:34 PM, Greg Smith wrote: > I happen to think the review structure is one of the most important > components to PostgreSQL release quality. It used to be a single level > review with just the committers, now it's a two level structure. The > reason the Postgres code is so good isn't that the submitted development > is inherently any better than other projects. There's plenty of bogus > material submitted here. It's the high standards for review and commit > that are the key filter. The importance of the process to the result > isn't weighed as heavily as I think it should be. I think we're in violent agreement here. Now, we just need to convince everyone else on this list. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On 7/23/13 2:30 PM, Josh Berkus wrote: You know as well as me that, as consultants, we can get clients to pay for 10% extra time for review in the course of developing a feature Before this number gets quoted anywhere, I think it's on the low side. I've spent a good bit of time measuring how much time it takes to do a fair offsetting review--one where you put as much time in as it takes to review your submission--and I keep getting numbers more in the 20 to 25% range. The work involved to do a high quality review takes a while. I happen to think the review structure is one of the most important components to PostgreSQL release quality. It used to be a single level review with just the committers, now it's a two level structure. The reason the Postgres code is so good isn't that the submitted development is inherently any better than other projects. There's plenty of bogus material submitted here. It's the high standards for review and commit that are the key filter. The importance of the process to the result isn't weighed as heavily as I think it should be. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Tue, Jul 23, 2013 at 06:09:20PM -0400, Stephen Frost wrote: > David > > On Tuesday, July 23, 2013, David Fetter wrote: > > > > There are a lot of ways foreign tables don't yet act like local > > ones. Much as I'm a booster for fixing that problem, I'm thinking > > improvements in this direction are for a separate patch. > > > > This isn't about making FDWs more like local tables- indeed, quite > the opposite. The question is if we should declare that WITH > ORDINALITY will only ever be for SRFs or if we should consider that > it might be useful with FDWs specifically because they are not > unordered sets as tables are. Claiming that question is unrelated > to the implementation of WITH ORDINALITY is rather... Bizarre. Are you saying that there's stuff that if I don't put it in now will impede our ability to add this to FTs later? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
David On Tuesday, July 23, 2013, David Fetter wrote: > > There are a lot of ways foreign tables don't yet act like local ones. > Much as I'm a booster for fixing that problem, I'm thinking > improvements in this direction are for a separate patch. > This isn't about making FDWs more like local tables- indeed, quite the opposite. The question is if we should declare that WITH ORDINALITY will only ever be for SRFs or if we should consider that it might be useful with FDWs specifically because they are not unordered sets as tables are. Claiming that question is unrelated to the implementation of WITH ORDINALITY is rather... Bizarre. Thanks, Stephen
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Tue, Jul 23, 2013 at 05:23:17PM -0400, Stephen Frost wrote: > * Greg Stark (st...@mit.edu) wrote: > > So given that WITH ORDINALITY is really only useful for UNNEST and we > > can generalize it to all SRFs on the basis that Postgres SRFs do > > produce ordered sets not unordered relations it isn't crazy for the > > work to be in the Function node. > > I agree, it isn't *crazy*. :) > > > Now that I've written that though it occurs to me to wonder whether > > FDW tables might be usefully said to be ordered too though? > > My thought around this was a VALUES() construct, but FDWs are an > interesting case to consider also; with FDWs it's possible that > something is said in SQL/MED regarding this question- perhaps it would > make sense to look there? There are a lot of ways foreign tables don't yet act like local ones. Much as I'm a booster for fixing that problem, I'm thinking improvements in this direction are for a separate patch. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
> Everything was already *not* hunky-dory as soon as you did that, since > a SIGHUP would have had the same problem. > > I'd be inclined to think that ALTER SYSTEM SET should not be allowed to > modify any PGC_POSTMASTER parameters. That makes alter system set a bunch less useful, but might be insurmountable. Anyway, I also regard this as a problem we need to resolve now that we have the config directory if we expect people to build autotuning tools. Once someone is relying on an autotuning tool which drops a file in config/, it becomes a real problem that there are uncommented PGC_POSTMASTER options in the default postgresql.conf. I'm not sure what the solution to that problem should be, but I do know that we're going to hear it a lot from users as includes and the config directory get more common. Certainly any solution which says "first, manually edit postgresql.conf" makes the config directory into a kind of joke. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
* Greg Stark (st...@mit.edu) wrote: > So given that WITH ORDINALITY is really only useful for UNNEST and we > can generalize it to all SRFs on the basis that Postgres SRFs do > produce ordered sets not unordered relations it isn't crazy for the > work to be in the Function node. I agree, it isn't *crazy*. :) > Now that I've written that though it occurs to me to wonder whether > FDW tables might be usefully said to be ordered too though? My thought around this was a VALUES() construct, but FDWs are an interesting case to consider also; with FDWs it's possible that something is said in SQL/MED regarding this question- perhaps it would make sense to look there? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On Tue, Jul 23, 2013 at 9:27 PM, Stephen Frost wrote: > > Fine- I'd have been as happy leaving this be and letting Greg commit it, > but if you'd really like to hear my concerns, I'd start with pointing > out that it's pretty horrid to have to copy every record around like > this and that the hack of CreateTupleDescCopyExtend is pretty terrible > and likely to catch people by surprise. Having FunctionNext() operate > very differently depending on WITH ORDINALITY is ugly and the cause of > the bug that was found. All-in-all, I'm not convinced that this is > really a good approach and feel it'd be better off implemented at a > different level, eg a node type instead of a hack on top of the existing > SRF code. Fwiw I've been mulling over the same questions and came to the conclusion that the existing approach makes the most sense. In an ideal world an extra executor node would be the prettiest, cleanest implementation. But the amount of extra code and busywork that would necessitate just isn't justified for the amount of work it would be doing. It might be different if WITH ORDINALITY made sense for any other types of target tables. But it really only makes sense for SRFs. The whole motivation for having them in the spec is that UNNEST is taking an ordered list and turning it into a relation which is unordered. You can't just do row_number() because there's nothing to make it ordered by. By the same token for any other data source you would just use row_number *precisely because* any other data source is unordered and you should have to specify an order in order to make row_number produce something meaningful. So given that WITH ORDINALITY is really only useful for UNNEST and we can generalize it to all SRFs on the basis that Postgres SRFs do produce ordered sets not unordered relations it isn't crazy for the work to be in the Function node. Now that I've written that though it occurs to me to wonder whether FDW tables might be usefully said to be ordered too though? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
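A small sketch of that point (array contents invented): the only workaround today is row_number(), and because a relation is unordered that is only well defined with an ORDER BY, for which the array elements themselves are the sole candidate.

-- the workaround: numbering is not guaranteed to follow the array order
SELECT x, row_number() OVER () AS ord
FROM unnest(ARRAY[30, 10, 20]) AS x;

-- with the patch: the ordinal reflects the order the SRF emitted its rows
SELECT x, ord
FROM unnest(ARRAY[30, 10, 20]) WITH ORDINALITY AS t(x, ord);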
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Tue, Jul 23, 2013 at 1:23 AM, Amit Langote wrote: > On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote wrote: >> Hello, >> >> While understanding the effect of maintenance_work_mem on time taken >> by CREATE INDEX, I observed that for the values of >> maintenance_work_mem less than the value for which an internal sort is >> performed, the time taken by CREATE INDEX increases as >> maintenance_work_increases. >> >> My guess is that for all those values an external sort is chosen at >> some point and larger the value of maintenance_work_mem, later the >> switch to external sort would be made causing CREATE INDEX to take >> longer. That is a smaller value of maintenance_work_mem would be >> preferred for when external sort is performed anyway. Does that make >> sense? >> > > Upon further investigation, it is found that the delay to switch to > external sort caused by a larger value of maintenance_work_mem is > small compared to the total time of CREATE INDEX. If you are using trace_sort to report that, it reports the switch as happening as soon as it runs out of memory. At point, all we have been doing is reading tuples into memory. The time it takes to do that will depend on maintenance_work_mem, because that affects how many tuples fit in memory. But all the rest of the tuples need to be read sooner or later anyway, so pushing more of them to later doesn't improve things overall it just shifts timing around. After it reports the switch, it still needs to heapify the existing in-memory tuples before the tapesort proper can begin. This is where the true lost opportunities start to arise, as the large heap starts driving cache misses which would not happen at all under different settings. Once the existing tuples are heapified, it then continues to use the heap to pop tuples from it to write out to "tape", and to push newly read tuples onto it. This also suffers lost opportunities. Once all the tuples are written out and it starts merging, then the large maintenance_work_mem is no longer a penalty as the new heap is limited by the number of tapes, which is almost always much smaller. In fact this stage will actually be faster, but not by enough to make up for the earlier slow down. So it is not surprising that the time before the switch is reported is a small part of the overall time difference. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Andrew, * Andrew Gierth (and...@tao11.riddles.org.uk) wrote: > Right, and we all know that all code is perfect when committed. sheesh. That clearly wasn't what was claimed. > (This is actually the first time in six months that I've had occasion > to look at that part of the code; that's how long it's been sitting in > the queue. While such issues are frustrating for all of us, huffing about it here isn't particularly useful. > And yes, it was one of my bits, not David's. Maybe I > should have left the bug in to see how long it took you to spot it?) That attitude is certainly discouraging. > What I'm very notably not seeing from you is any substantive feedback. > You've repeatedly described this patch as broken on the basis of > nothing more than garbled hearsay while loudly proclaiming not to have > actually looked at it; I find this both incomprehensible and insulting. As Greg is the one looking to possibly commit this, I certainly didn't consider his comments on the patch to be garbled hearsay. It would have been great if he had been more clear in his original comments, but I don't feel that you can fault any of us for reading his email and voicing what concerns we had from his review. While you might wish that we all read every patch submitted, none of us has time for that- simply keeping up with this mailing list requires a significant amount of time. > If you have legitimate technical concerns then let's hear them, now. Fine- I'd have been as happy leaving this be and letting Greg commit it, but if you'd really like to hear my concerns, I'd start with pointing out that it's pretty horrid to have to copy every record around like this and that the hack of CreateTupleDescCopyExtend is pretty terrible and likely to catch people by surprise. Having FunctionNext() operate very differently depending on WITH ORDINALITY is ugly and the cause of the bug that was found. All-in-all, I'm not convinced that this is really a good approach and feel it'd be better off implemented at a different level, eg a node type instead of a hack on top of the existing SRF code. Now, what would be great to see would be your response to Dean's comments and suggestions rather than berating someone who's barely written 5 sentences on this whole thread. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
Tom Lane said: > If you yourself are still finding bugs in the code as of this afternoon, > onlookers could be forgiven for doubting that the code is quite as > readable or ready to commit as all that. Right, and we all know that all code is perfect when committed. sheesh. (This is actually the first time in six months that I've had occasion to look at that part of the code; that's how long it's been sitting in the queue. And yes, it was one of my bits, not David's. Maybe I should have left the bug in to see how long it took you to spot it?) What I'm very notably not seeing from you is any substantive feedback. You've repeatedly described this patch as broken on the basis of nothing more than garbled hearsay while loudly proclaiming not to have actually looked at it; I find this both incomprehensible and insulting. If you have legitimate technical concerns then let's hear them, now. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Adding Zigzag Merge Join to Index Nested Loops Join
Hi. I want to add a Zigzag Merge Join to the Index Nested Loops Join algorithm: http://www.cs.berkeley.edu/~fox/summaries/database/query_eval_5-8.html

Which files are responsible for the index nested loops join? (Not the simple nested loops join, which has the double foreach in nodeNestloop.c.) nodeIndexscan.c? nodeIndexonlyscan.c?

Best Regards
Tom
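For reference, the plan shape in question looks like this (table and index names are invented); the inner side is an ordinary Index Scan whose index condition is parameterized by the current outer row, rather than a separate "index nested loop" node:

EXPLAIN SELECT *
FROM a
JOIN b ON b.a_id = a.id
WHERE a.x = 42;

--  Nested Loop
--    ->  Index Scan using a_x_idx on a
--          Index Cond: (x = 42)
--    ->  Index Scan using b_a_id_idx on b
--          Index Cond: (a_id = a.id)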
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas writes: > OK, so GetActiveSnapshot()? Works for me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On Tue, Jul 23, 2013 at 2:24 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Jul 23, 2013 at 12:28 PM, Tom Lane wrote: >>> As far as get_actual_variable_range() is concerned, an MVCC snapshot >>> would probably be the thing to use anyway; > >> That surprises me, though. I really thought the point was to cost the >> index scan, and surely that will be slowed down even by entries we >> can't see. > > No, the usage (or the main usage anyway) is for selectivity estimation, > ie how many rows will the query fetch. OK, so GetActiveSnapshot()? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
Greg, > It's more than the available experienced reviewers are willing to chew > on fully as volunteers. The reward for spending review time is pretty > low right now. Short of paying for review time, I don't think we have another solution for getting the "big patches" reviewed, except to rely on the major contributors who are paid full-time to hack Postgres. You know as well as me that, as consultants, we can get clients to pay for 10% extra time for review in the course of developing a feature, but the kind of time which patches like Row Security, Changesets, or other "big patches" need nobody is going to pay for on a contract basis. And nobody who is doing this in their "spare time" has that kind of block. So I don't think there's any good solution for the "big patches". I do think our project could do much more to recruit reviewers for the small-medium patches, to take workload off the core contributors in general. Historically, however, this project (and the contributors on this list) has made material decisions not to encourage or recruit new people as reviewers, and has repeatedly stated that reviewers are not important. Until that changes, we are not going to get more reviewers (and I'm not going to have much sympathy for existing contributors who say they have no time for review). If we want more reviewers and people spending more time on review, then we need to give reviewers the *same* respect and the *same* rewards that feature contributors get. Not something else, the exact same. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas writes: > On Tue, Jul 23, 2013 at 12:28 PM, Tom Lane wrote: >> As far as get_actual_variable_range() is concerned, an MVCC snapshot >> would probably be the thing to use anyway; > That surprises me, though. I really thought the point was to cost the > index scan, and surely that will be slowed down even by entries we > can't see. No, the usage (or the main usage anyway) is for selectivity estimation, ie how many rows will the query fetch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Mon, Jul 22, 2013 at 9:11 PM, Amit Langote wrote: > Hello, > > While understanding the effect of maintenance_work_mem on time taken > by CREATE INDEX, I observed that for the values of > maintenance_work_mem less than the value for which an internal sort is > performed, the time taken by CREATE INDEX increases as > maintenance_work_increases. > > My guess is that for all those values an external sort is chosen at > some point and larger the value of maintenance_work_mem, later the > switch to external sort would be made causing CREATE INDEX to take > longer. That is a smaller value of maintenance_work_mem would be > preferred for when external sort is performed anyway. Does that make > sense? The heap structure used in external sorts is cache-unfriendly. The bigger the heap used, the more this unfriendliness becomes apparent. And the bigger maintenance_work_mem, the bigger the heap used. The bigger heap also means you have fewer "runs" to merge in the external sort. However, as long as the number of runs still fits in the same number of merge passes, this is generally not a meaningful difference. Ideally the planner (or something) would figure out how much memory would be needed to complete an external sort in just one external pass, and then lower the effective maintenance_work_mem to that amount. But that would be hard to do with complete precision, and the consequences of getting it wrong are asymmetric. (More thoroughly, it would figure out the number of passes needed for the given maintenance_work_mem, and then lower the effective maintenance_work_mem to the smallest value that gives the same number of passes. But for nearly all practical situations, I think the number of passes for an index build is going to be 0 or 1.) Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
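Rough arithmetic for that idea, with invented numbers: replacement selection tends to produce initial runs of roughly twice the sort memory, and a single merge pass can combine far more runs than an index build usually produces, so the interesting threshold is normally just whether one merge pass suffices.

-- e.g. roughly 8GB of sort input with maintenance_work_mem = 256MB:
SELECT ceil(8192 / (2 * 256.0)) AS expected_initial_runs;  -- about 16

-- 16 runs merge comfortably in a single pass, so raising
-- maintenance_work_mem further saves no merge pass here; it only enlarges
-- the cache-unfriendly heap used while the runs are being written.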
Re: [HACKERS] new "row-level lock" error messages
Alvaro Herrera wrote: > Peter Eisentraut wrote: > > > I would suggest that these changes be undone, except that the old > > "SELECT FOR ..." be replaced by a dynamic string that reverse-parses the > > LockingClause to provide the actual clause that was used. > > Here's a patch for this. Pushed to 9.3 and master. Sample output: alvherre=# select * from foo, bar for update of foof for share of bar; ERROR: relation "foof" in FOR UPDATE clause not found in FROM clause alvherre=# select * from foo, bar for update of foo for share of barf; ERROR: relation "barf" in FOR SHARE clause not found in FROM clause Amusingly, the only test in which these error messages appeared, in contrib/file_fdw, was removed after the two commits that changed the wording. So there's not a single test which needed to be tweaked for this change. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On Tue, Jul 23, 2013 at 12:28 PM, Tom Lane wrote: > Robert Haas writes: >>> 2. snapshot-self-not-now-v1.patch changes several uses of SnapshotNow >>> to use SnapshotSelf instead. These include pgrowlocks(), >>> pgstat_heap(), and get_actual_variable_range(). > >> Tom proposed that we use SnapshotDirty for this case; let me just ask >> whether there are any security concerns around that. pgstattuple only >> displays aggregate information so I think that's OK, but I wonder if >> the value found in get_actual_variable_range() can leak out in EXPLAIN >> output or whatever. I can't particularly think of any reason why that >> would actually matter, but I've generally shied away from exposing >> data written by uncommitted transactions, and this would be a step in >> the other direction. Does this worry anyone else or am I being >> paranoid? > > As far as get_actual_variable_range() is concerned, an MVCC snapshot > would probably be the thing to use anyway; I see no need for the planner > to be using estimates that are "more up to date" than that. pgrowlocks > and pgstat_heap() might be in a different category. > >> But thinking about it a little more, I wonder why >> get_actual_variable_range() is using a snapshot at all. Presumably >> what we want there is to find the last index key, regardless of the >> visibility of the heap tuple to which it points. > > No, what we ideally want is to know the current variable range that > would be seen by the query being planned. Oh, really? Well, should we use GetActiveSnapshot() then? That surprises me, though. I really thought the point was to cost the index scan, and surely that will be slowed down even by entries we can't see. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On 7/23/13 12:10 PM, Josh Berkus wrote: Apparently it's a little much for experienced reviewers to chew on, too, since I've been trying to get someone to review it since the beginning of the Commitfest. It's more than the available experienced reviewers are willing to chew on fully as volunteers. The reward for spending review time is pretty low right now. While I understand the call for "resources", this is a bit unfair to KaiGai, who has put in his time reviewing other people's patches. If you read Dean Rasheed's comments, it's obvious he put a useful amount of work into his review suggestions. It is not the case here that KaiGai worked on a review and got nothing in return. Unfortunately that has happened to this particular patch before, but the community did a little better this time. The goal of the CF is usually to reach one of these outcomes for every submission: -The submission is ready for commit -The submission needs improvement in X Review here went far enough to identify an X to be improved. It was a big enough issue that a rewrite is needed, commit at this time isn't possible, and now KaiGai has something we hope is productive he can continue working on. That's all we can really promise here--that if we say something isn't ready for commit yet, that there's some feedback as to why. I would have preferred to see multiple X issues identified here, given that we know there has to be more than just the one in a submission of this size. The rough fairness promises of the CommitFest seem satisfied to me though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
Robert Haas writes: >> 2. snapshot-self-not-now-v1.patch changes several uses of SnapshotNow >> to use SnapshotSelf instead. These include pgrowlocks(), >> pgstat_heap(), and get_actual_variable_range(). > Tom proposed that we use SnapshotDirty for this case; let me just ask > whether there are any security concerns around that. pgstattuple only > displays aggregate information so I think that's OK, but I wonder if > the value found in get_actual_variable_range() can leak out in EXPLAIN > output or whatever. I can't particularly think of any reason why that > would actually matter, but I've generally shied away from exposing > data written by uncommitted transactions, and this would be a step in > the other direction. Does this worry anyone else or am I being > paranoid? As far as get_actual_variable_range() is concerned, an MVCC snapshot would probably be the thing to use anyway; I see no need for the planner to be using estimates that are "more up to date" than that. pgrowlocks and pgstat_heap() might be in a different category. > But thinking about it a little more, I wonder why > get_actual_variable_range() is using a snapshot at all. Presumably > what we want there is to find the last index key, regardless of the > visibility of the heap tuple to which it points. No, what we ideally want is to know the current variable range that would be seen by the query being planned. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY.
Andres Freund wrote: > Hm. There seem to be more things that need some more improvement > from a quick look. > > First, I have my doubts of the general modus operandi of > CONCURRENTLY here. I think in many, many cases it would be faster > to simply swap in the newly built heap + indexes concurrently. > I think I have seen that mentioned in the review while mostly > skipping the discussion, but still. That would be easy enough as > of Robert's mvcc patch. Yeah, in many cases you would not want to choose to specify this technique. The point was to have a technique which could run without blocking while a heavy load of queries (including perhaps a very long-running query) was executing. If subsequent events provide an alternative way to do that, it's worth looking at it; but my hope was to get this out of the way to allow time to develop incremental maintenance of matviews for the next CF. If we spend too much time reworking this to micro-optimize it for new patches, incremental maintenance might not happen for 9.4. The bigger point is perhaps syntax. If MVCC catalogs are really at a point where we can substitute a new heap and new indexes for a matview without taking an AccessExclusiveLock, then we should just make the current code do that. I think there is still a place for a differential update for large matviews which only need small changes on a REFRESH, but perhaps the right term for that is not CONCURRENTLY. > * Shouldn't the GetUserIdAndSecContext(), NewGUCNestLevel() dance > be done a good bit earlier? We're executing queries before > that. This can't be in effect while we are creating temp tables. If we're going to support a differential REFRESH technique, it must be done more "surgically" than to wrap the whole REFRESH statement execution. > * The loop over indexes in refresh_by_match_merge should > index_open(ExclusiveLock) the indexes initially instead of > searching the syscache manually. They are opened inside the > loop in many cases anyway. There probably aren't any hazards > currently, but I am not even sure about that. The index_open() > in the loop at the very least processes the invalidation > messages other backends send... Fair enough. That seems like a simple change. > I'd even suggest using BuildIndexInfo() or such on the indexes, > then you could use ->ii_Expressions et al instead of peeking > into indkeys by hand. Given that the function is in a source file described as containing "code to create and destroy POSTGRES index relations" and the comments for that function say that it "stores the information about the index that's needed by FormIndexDatum, which is used for both index_build() and later insertion of individual index tuples," and we're not going to create or destroy an index or call FormIndexDatum or insert individual index tuples from this code, it would seem to be a significant expansion of the charter of that function. What benefits do you see to using that level? > * All not explicitly looked up operators should be qualified > using OPERATOR(). It's easy to define your own = operator for > tid and set the search_path to public,pg_catalog. Now, the > whole thing is restricted to table owners afaics, making this > decidedly less dicey, but still. But if anybody actually uses a > search_path like the above you can easily hurt them. > * Same seems to hold true for the y = x and y.* IS DISTINCT FROM > x.* operations. > * (y.*) = (x.*) also needs to do proper operator lookups.
I wasn't aware that people could override the equality operators for tid and RECORD, so that seemed like unnecessary overhead. I can pull the operators from the btree AM if people can do that. > * OpenMatViewIncrementalMaintenance() et al seem to be > underdocumented. If the adjacent comment for MatViewIncrementalMaintenanceIsEnabled() left you at all uncertain about what OpenMatViewIncrementalMaintenance() and CloseMatViewIncrementalMaintenance() are for, I can add comments. At the time I wrote it, it seemed to me to be redundant to have such comments, and would cause people to waste more time looking at them than would be saved by anything they could add to what the function names and one-line function bodies conveyed; but if there was doubt in your mind about when they should be called, I'll add something. Would it help to move the static functions underneath the (documented) public function and its comment? > * why is it even allowed that matview_maintenance_depth is > 1? > Unless I miss something there doesn't seem to be support for a > recursive refresh whatever that would mean? I was trying to create the function in a way that would be useful for incremental maintenance, too. Those could conceivably recurse. Also, this way bugs in usage are more easily detected. > * INSERT and DELETE should also alias the table names, to make > sure there cannot be issues with the nested queries. I don't see the hazard -- could you elaborate?
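[Editor's note: to make the operator-qualification point concrete for readers following the thread, here is a minimal sketch of the kind of override Andres is describing. The schema, function, and operator names are purely illustrative and are not part of the patch.]

    -- Illustrative only: an owner-controlled "=" for tid that shadows the built-in one.
    CREATE SCHEMA shadow;
    CREATE FUNCTION shadow.tid_eq(tid, tid) RETURNS boolean
        LANGUAGE sql AS 'SELECT true';   -- deliberately wrong: everything compares "equal"
    CREATE OPERATOR shadow.= (LEFTARG = tid, RIGHTARG = tid, PROCEDURE = shadow.tid_eq);
    SET search_path = shadow, pg_catalog;
    -- Any generated SQL written as "x.ctid = y.ctid" rather than
    -- "x.ctid OPERATOR(pg_catalog.=) y.ctid" now resolves to shadow.= above.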
Re: [HACKERS] Comma Comma Comma 8601
On Jul 23, 2013, at 1:17 AM, Tom Lane wrote: > Does that create any ambiguities against formats we already support? > I'm worried about examples like this one: > > select 'monday, july 22, 22:30 2013'::timestamptz; > timestamptz > > 2013-07-22 22:30:00-04 > (1 row) > > Right now I'm pretty sure that the datetime parser treats comma as a > noise symbol. If that stops being true, we're likely to break some > applications out there (admittedly, possibly rather strange ones, > but still ...) I kind of suspect not, since this fails: david=# select '12:24:53 654'::time; ERROR: invalid input syntax for type time: "12:24:53 654" LINE 1: select '12:24:53 654'::time; ^ I would have guessed that the time parser gets to a state where it knows it is dealing with an ISO-8601-style time. But I have not looked at the code, of course. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
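[Editor's note: for anyone who wants to poke at the ambiguity question in a plain psql session, the two cases side by side. Whether the second statement is accepted depends entirely on the proposed comma handling, so treat it as a sketch rather than documented behavior.]

    -- comma as a noise symbol, which Tom notes already works today:
    SELECT 'monday, july 22, 22:30 2013'::timestamptz;

    -- ISO 8601 comma as the fractional-second separator, the format under discussion;
    -- accepted only if the proposal is adopted (assumption):
    SELECT '2013-07-22 12:24:53,654'::timestamptz;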
Re: [HACKERS] [9.4 CF 1] The Commitfest Slacker List
Greg, > As far as motivating new reviewers goes, let's talk about positive > feedback. Anything that complicates the release notes is a non-starter > because that resource is tightly controlled by a small number of people, > and it's trying to satisfy a lot of purposes. Greg, you're re-arguing something we obtained a consensus on a week ago. Please read the thread "Kudos for reviewers". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On 7/23/13 10:56 AM, Robert Haas wrote: On Mon, Jul 22, 2013 at 11:48 PM, Greg Smith wrote: We know that a 1GB relation segment can take a really long time to write out. That could include up to 128 changed 8K pages, and we allow all of them to get dirty before any are forced to disk with fsync. By my count, it can include up to 131,072 changed 8K pages. Even better! I can pinpoint exactly what time last night I got tired enough to start making trivial mistakes. Everywhere I said 128 it's actually 131,072, which just changes the range of the GUC I proposed. Getting the number right really highlights just how bad the current situation is. Would you expect the database to dump up to 128K writes into a file and then have low latency when it's flushed to disk with fsync? Of course not. But that's the job the checkpointer process is trying to do right now. And it's doing it blind--it has no idea how many dirty pages might have accumulated before it started. I'm not exactly sure how best to use the information collected. fsync every N writes is one approach. Another is to use accumulated writes to predict how long fsync on that relation should take. Whenever I tried to spread fsync calls out before, the scale of the piled up writes from backends was the input I really wanted available. The segment write count gives an alternate way to sort the blocks too; you might start with the heaviest-hit ones. In all these cases, the fundamental issue I keep coming back to is wanting to cue off past write statistics. If you want to predict relative I/O delay times with any hope of accuracy, you have to start the checkpoint knowing something about the backend and background writer activity since the last one. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
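[Editor's note: for anyone double-checking the corrected figure, it is just the segment size divided by the block size; a one-line sanity check, assuming the default 8K block size and 1GB segment size.]

    -- 1GB relation segment / 8KB pages = 131,072 pages per segment
    SELECT (1024 * 1024 * 1024) / 8192 AS pages_per_segment;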
Re: [HACKERS] [v9.4] row level security
On 07/22/2013 01:27 PM, Greg Smith wrote: > > Anyone who would like to see RLS committed should consider how you can > get resources to support a skilled PostgreSQL reviewer spending time on > it. (This is a bit much to expect new reviewers to chew on usefully) > I've been working on that here, but I don't have anything I can publicly > commit to yet. Apparently it's a little much for experienced reviewers to chew on, too, since I've been trying to get someone to review it since the beginning of the Commitfest. While I understand the call for "resources", this is a bit unfair to KaiGai, who has put in his time reviewing other people's patches. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bloom Filter lookup for hash joins
On Tue, Jul 23, 2013 at 7:32 PM, Greg Stark wrote: > This exact idea was discussed a while back. I think it was even implemented. > > The problem Tom raised at the time is that the memory usage of the bloom > filter implies a smaller or less efficient hash table. It's difficult to > determine whether you're coming out ahead or behind. > > I think it should be possible to figure this out though. Bloom filters have > well understood math for the error rate given the size and number of hash > functions (and please read up on it and implement the optimal combination > for the target error rate, not just a wag) and it should be possible to > determine the resulting advantage. > > Measuring the cost of the memory usage is harder but some empirical results > should give some idea. I expect the result to be wildly uneven one way or > the other so hopefully it doesn't matter if it's not perfect. If it's close > then it's probably not worth doing anyway. > > I would suggest looking up the archives of the previous discussion. You might > find the patch still usable. IIRC there have been no major changes to the hash > join code. > Right, I will definitely have a look at the thread. Thanks for the info! Regards, Atri -- Regards, Atri l'apprenant -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] getting rid of SnapshotNow
On Thu, Jul 18, 2013 at 8:46 AM, Robert Haas wrote: > There seems to be a consensus that we should try to get rid of > SnapshotNow entirely now that we have MVCC catalog scans, so I'm > attaching two patches that together come close to achieving that goal: > > 1. snapshot-error-v1.patch introduces a new special snapshot, called > SnapshotError. In the cases where we set SnapshotNow as a sort of > default snapshot, this patch changes the code to use SnapshotError > instead. This affects scan->xs_snapshot in genam.c and > estate->es_snapshot in execUtils.c. This passes make check-world, so > apparently there is no code in the core distribution that does this. > However, this is safer for third-party code, which will ERROR instead > of seg faulting. The alternative approach would be to use > InvalidSnapshot, which I think would be OK too if people dislike this > approach. It seems the consensus was mildly for InvalidSnapshot, so I did it that way. > 2. snapshot-self-not-now-v1.patch changes several uses of SnapshotNow > to use SnapshotSelf instead. These include pgrowlocks(), > pgstat_heap(), and get_actual_variable_range(). In all of those > cases, only an approximately-correct answer is needed, so the change > should be fine. I'd also generally expect that it's very unlikely for > any of these things to get called in a context where the table being > scanned has been updated by the current transaction after the most > recent command-counter increment, so in practice the change in > semantics will probably not be noticeable at all. Tom proposed that we use SnapshotDirty for this case; let me just ask whether there are any security concerns around that. pgstattuple only displays aggregate information so I think that's OK, but I wonder if the value found in get_actual_variable_range() can leak out in EXPLAIN output or whatever. I can't particularly think of any reason why that would actually matter, but I've generally shied away from exposing data written by uncommitted transactions, and this would be a step in the other direction. Does this worry anyone else or am I being paranoid? But thinking about it a little more, I wonder why get_actual_variable_range() is using a snapshot at all. Presumably what we want there is to find the last index key, regardless of the visibility of the heap tuple to which it points. We don't really need to consult the heap at all, one would think; the value we need ought to be present in the index tuple. If we're going to use a snapshot for simplicity of coding, maybe the right thing is SnapshotAny. After all, even if the index tuples are all dead, we still have to scan them, so it's still relevant for costing purposes. Thoughts? > With that done, the only remaining uses of SnapshotNow in our code > base will be in currtid_byreloid() and currtid_byrelname(). So far no > one on this list has been able to understand clearly what the purpose > of those functions is, so I'm copying this email to pgsql-odbc in case > someone there can provide more insight. If I were a betting man, I'd > bet that they are used in contexts where the difference between > SnapshotNow and SnapshotSelf wouldn't matter there, either. For > example, if those functions are always invoked in a query that does > nothing but call those functions, the difference wouldn't be visible. 
> If we don't want to risk any change to the semantics, we can (1) grit > our teeth and keep SnapshotNow around or (2) use an instantaneous MVCC > snapshot there, and accept that people who have very large connection > counts and extremely heavy use of those functions may see a > performance regression. It seems like we're leaning toward a fresh MVCC snapshot for this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On Mon, Jul 22, 2013 at 11:48 PM, Greg Smith wrote: > Recently I've been dismissing a lot of suggested changes to checkpoint fsync > timing without suggesting an alternative. I have a simple one in mind that > captures the biggest problem I see: that the number of backend and > checkpoint writes to a file are not connected at all. > > We know that a 1GB relation segment can take a really long time to write > out. That could include up to 128 changed 8K pages, and we allow all of > them to get dirty before any are forced to disk with fsync. By my count, it can include up to 131,072 changed 8K pages. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
On Tue, Jul 23, 2013 at 9:22 AM, Greg Smith wrote: > On 7/23/13 9:06 AM, Robert Haas wrote: >> >> But just BTW, you kind of messed up that CommitFest entry. The link you >> added as a new version of the >> patch was actually to a different patch from the same flock. > > Fixed now, since I find myself looking back through history on submissions > often enough to care about it being right. My eyes were starting to glaze > over by the end of staring at this "flock". Cool. I'm pretty sure that's the correct collective noun for patches. Right? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
On Tue, Jul 23, 2013 at 9:42 AM, Craig Ringer wrote: > (Replying on phone, please forgive bad quoting) > > Isn't this pretty much what adopting ICU is supposed to give us? > OS-independent collations? Yes. > I'd be interested in seeing the test data for this performance report, partly > as I'd like to see how ICU collations would compare when ICU is crudely > hacked into place for testing. I pretty much lost interest in ICU upon reading that they use UTF-16 as their internal format. http://userguide.icu-project.org/strings#TOC-Strings-in-ICU What that would mean for us is that instead of copying the input strings into a temporary buffer and passing the buffer to strcoll(), we'd need to convert them to ICU's representation (which means writing twice as many bytes as the length of the input string in cases where the input string is mostly single-byte characters) and then call ICU's strcoll() equivalent. I agree that it might be worth testing, but I can't work up much optimism. It seems to me that something that operates directly on the server encoding could run a whole lot faster. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 7:06 PM Andres Freund wrote: > On 2013-07-23 18:59:11 +0530, Amit Kapila wrote: > > > * I'd be very surprised if this doesn't make WAL replay of update > heavy > > > workloads slower by at least a factor of 2. > > > > Yes, if you just consider the cost of replay, but it involves > other > > operations as well > > like for standby case transfer of WAL, Write of WAL, Read from > WAL and > > then apply. > > So among them most operations will benefit from reduced WAL > size, > > except apply where you need to decode. > > I still think it's rather unlikely that they offset those. I've seen > wal > replay be a major bottleneck more than once... > > > > * It makes data recovery from WAL *noticeably* harder since data > > > corruption now is carried forwards and you need the old data to > > > decode > > > new data > > > >This is one of the reasons why this optimization is done only when > the > > new row goes in the same page. > > That doesn't help all that much. It somewhat eases recovering data if > full_page_writes are on, but it's really hard to stitch together all > changes if the corruption occurred within a 1h long checkpoint... I think once a record is corrupted on a page, the user has to reconstruct that page; it might be difficult to just reconstruct a record, and this optimization will not carry any corruption forward from one page to another. > > > * It makes changeset extraction either more expensive or it would > have > > > to be disabled there. > > > I think, if there is any such implication, we can probably have > the > > option of disabling it > > That can just be done on wal_level = logical, that's not the > problem. It's certainly not without precedent that we have wal_level > dependent optimizations. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bloom Filter lookup for hash joins
This exact idea was discussed a while back. I think it was even implemented. The problem Tom raised at the time is that the memory usage of the bloom filter implies a smaller or less efficient hash table. It's difficult to determine whether you're coming out ahead or behind. I think it should be possible to figure this out though. Bloom filters have well understood math for the error rate given the size and number of hash functions (and please read up on it and implement the optimal combination for the target error rate, not just a wag) and it should be possible to determine the resulting advantage. Measuring the cost of the memory usage is harder but some empirical results should give some idea. I expect the result to be wildly uneven one way or the other so hopefully it doesn't matter if it's not perfect. If it's close then it's probably not worth doing anyway. I would suggest looking up the archives of the previous discussion. You might find the patch still usable. IIRC there have been no major changes to the hash join code. -- greg On Jun 26, 2013 7:31 PM, "Jeff Janes" wrote: > On Wed, Jun 26, 2013 at 5:01 AM, Atri Sharma wrote: > >> >> > The problem here is that if the hash table is in memory, doing a hash >> > table lookup directly is likely to be cheaper than a bloom filter >> > lookup, even if the bloom filter fits into the processor cache and the >> > hash table doesn't (10 last level cache hits is slower than one cache >> > miss). Bloom filter will help when it's not feasible to use an actual >> > hash table (lots of large keys), the real lookup is slow (e.g. an >> > index lookup), you are doing a lot of lookups to amortize the >> > construction cost and the condition is expected to be selective (most >> > lookups give a negative). There might be some data warehouse types of >> > queries where it would help, but it seems like an awfully narrow use >> > case with a potential for performance regressions when the planner has >> > a bad estimate. >> >> Ok, sounds good. Can't we use bloom filters for the case where the hash >> table doesn't fit in memory? Specifically, when reading from disk is >> inevitable for accessing the hash table, we can use bloom filters for >> deciding which keys to actually read from disk. > > > I don't think that sounds all that promising. When the hash table does > not fit in memory, it is either partitioned into multiple passes, each of > which do fit in memory, or it chooses a different plan altogether. Do we > know under what conditions a Bloom filter would be superior to those > options, and could we reliably detect those conditions? > > Cheers, > > Jeff >
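[Editor's note: the "well understood math" is easy to sanity-check directly in SQL; a sketch with purely illustrative numbers (one million keys, ten bits per key), not tied to any particular hash join implementation.]

    -- optimal hash-function count k = (m/n) * ln 2, and the resulting
    -- false-positive rate (1 - e^(-k*n/m))^k, for m bits and n keys
    WITH params AS (SELECT 1000000::float8 AS n, 10000000::float8 AS m)
    SELECT (m / n) * ln(2::float8)                         AS optimal_k,
           power(1 - exp(-((m / n) * ln(2::float8)) * n / m),
                 (m / n) * ln(2::float8))                  AS false_positive_rate
    FROM params;
    -- roughly k = 6.9 and a false-positive rate of about 0.8% with these numbers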
Re: [HACKERS] Performance problem in PLPgSQL
Marc Cousin writes: > The example below is of course extremely simplified, and obviously not > what we are really doing in the database, but it exhibits the slowdown > between 9.1.9 and 9.2.4. Hm. Some experimentation shows that the plan cache is failing to switch to a generic plan, but I'm not sure why the cast would have anything to do with that ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
On 2013-07-23 09:27:18 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: > >> In file included from gram.y:13612:0: > >> scan.c: In function ‘yy_try_NUL_trans’: > >> scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] > >> PostgreSQL, contrib, and documentation successfully made. Ready to install. > > > FWIW, I've patched debian's flex just to get rid of this ;) > > I think it's fixed officially as of flex 2.5.35. Unfortunately (because debian has 2.5.35) it seems to be 2.5.36. The former was released 2008 :( It's 62617a3183abb2945beedc0b0ff106182af4f266 in flex's repository if somebody wants to submit it to debian. The repository is at git://git.code.sf.net/p/flex/flex - their website stated a different url the last time I looked. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve Chinese locale performance
(Replying on phone, please forgive bad quoting) Isn't this pretty much what adopting ICU is supposed to give us? OS-independent collations? I'd be interested in seeing the test data for this performance report, partly as I'd like to see how ICU collations would compare when ICU is crudely hacked into place for testing. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2013-07-23 18:59:11 +0530, Amit Kapila wrote: > > * I'd be very surprised if this doesn't make WAL replay of update heavy > > workloads slower by at least a factor of 2. > > Yes, if you just consider the cost of replay, but it involves other > operations as well > like for standby case transfer of WAL, Write of WAL, Read from WAL and > then apply. > So among them most operations will benefit from reduced WAL size, > except apply where you need to decode. I still think it's rather unlikely that they offset those. I've seen wal replay be a major bottleneck more than once... > > * It makes data recovery from WAL *noticeably* harder since data > > corruption now is carried forwards and you need the old data to > > decode > > new data > >This is one of the reasons why this optimization is done only when the > new row goes in the same page. That doesn't help all that much. It somewhat eases recovering data if full_page_writes are on, but it's really hard to stitch together all changes if the corruption occurred within a 1h long checkpoint... > > * It makes changeset extraction either more expensive or it would have > > to be disabled there. > I think, if there is any such implication, we can probably have the > option of disabling it That can just be done on wal_level = logical, that's not the problem. It's certainly not without precedent that we have wal_level dependent optimizations. > > I think my primary issue is that philosophically/architecturally I am > > of > > the opinion that a wal record should make sense on its own without > > depending on heap data. And this patch loses that. > > Is the main worry about corruption getting propagated? Not really. It "feels" wrong to me architecturally. That's subjective, I know. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
On 2013-07-23 09:21:54 -0400, Stephen Frost wrote: > * Andres Freund (and...@2ndquadrant.com) wrote: > > On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: > > > Writing postgres.bki > > > Writing schemapg.h > > > Writing postgres.description > > > Writing postgres.shdescription > > > Writing fmgroids.h > > > Writing fmgrtab.c > > > > I personally don't feel the need for those to go away. They only appear > > when doing a clean rebuild and/or when changing something > > significant. Also it's just a couple of lines. > > For my 2c, I'd be for getting rid of the above. I agree it's not a huge > deal, but I'm also a big fan of "don't spam me when things go right". > Is there any particular reason we need to see them with every (clean) > build..? I don't have a big problem with getting rid of them either. I find them moderately reassuring to see those triggered if I change something relevant because I don't fully trust the dependency mechanism around them, but that's about it. > > > In file included from gram.y:13612:0: > > > scan.c: In function ‘yy_try_NUL_trans’: > > > scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] > > > PostgreSQL, contrib, and documentation successfully made. Ready to > > > install. > > > > FWIW, I've patched debian's flex just to get rid of this ;) > > Any idea if that might be accepted upstream or perhaps by Debian..? It > would surely be nice to make that error go away.. I think Tom ushered a patch upstream, debian just hasn't updated in a good while, even in unstable. I've looked at the packages bug page some time back and it didn't look very active so I didn't think further about proposing a debian-only backport of that patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestion for concurrent index creation using a single full scan operation
On 2013-07-23 14:17:13 +0100, Greg Stark wrote: > We already do this in pg_restore by starting multiple worker processes. > Those processes should get the benefit of synchronised sequential scans. > > The way the API for indexes works, it wouldn't really be hard to start > multiple parallel index builds. I'm not sure how well the pg_restore thing > works and sometimes the goal isn't to maximise the speed so starting > multiple processes isn't always ideal. It might sometimes be interesting > to be able to do it explicitly in a single process. That's true for normal index modifications, but for ambuild (the initial index creation function) much less so. That's pretty much a black box from the outside. It's possible to change that, but it's certainly not trivial. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 12:27 AM Andres Freund wrote: > On 2013-07-19 10:40:01 +0530, Hari Babu wrote: > > > > On Friday, July 19, 2013 4:11 AM Greg Smith wrote: > > >On 7/9/13 12:09 AM, Amit Kapila wrote: > > >>I think the first thing to verify is whether the results posted > can be validated in some other environment setup by another person. > > >>The testcase used is posted at below link: > > >>http://www.postgresql.org/message- > id/51366323.8070...@vmware.com > > > > >That seems easy enough to do here, Heikki's test script is > excellent. > > >The latest patch Hari posted on July 2 has one hunk that doesn't > apply > > >anymore now. > > > > The Head code change from Heikki is correct. > > During the patch rebase to the latest PG LZ optimization code, the above > code change was missed. > > > > Apart from the above change, some more changes are done in the patch, > those are. > > FWIW I don't like this approach very much: > > * I'd be very surprised if this doesn't make WAL replay of update heavy > workloads slower by at least a factor of 2. Yes, if you just consider the cost of replay, but it involves other operations as well like for standby case transfer of WAL, Write of WAL, Read from WAL and then apply. So among them most operations will benefit from reduced WAL size, except apply where you need to decode. > * It makes data recovery from WAL *noticeably* harder since data > corruption now is carried forwards and you need the old data to > decode > new data This is one of the reasons why this optimization is done only when the new row goes in the same page. > * It makes changeset extraction either more expensive or it would have > to be disabled there. I think, if there is any such implication, we can probably have the option of disabling it > I think my primary issue is that philosophically/architecturally I am > of > the opinion that a wal record should make sense on its own without > depending on heap data. And this patch loses that. Is the main worry about corruption getting propagated? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
Andres Freund writes: > On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: >> I have noticed that some people post examples using make --silent (-s). >> I found this actually kind of neat to use from time to time, because >> then you only see output if you have warnings or errors. But we get >> some extra output that doesn't quite fit: >> Writing postgres.bki >> Writing schemapg.h >> Writing postgres.description >> Writing postgres.shdescription >> Writing fmgroids.h >> Writing fmgrtab.c > I personally don't feel the need for those to go away. I agree --- these let you know that something happened that does not happen every build, so they're kind of useful. I concur that the noise from doc building is just noise, though. >> In file included from gram.y:13612:0: >> scan.c: In function ‘yy_try_NUL_trans’: >> scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] >> PostgreSQL, contrib, and documentation successfully made. Ready to install. > FWIW, I've patched debian's flex just to get rid of this ;) I think it's fixed officially as of flex 2.5.35. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
On 7/23/13 9:06 AM, Robert Haas wrote: But just BTW, you kind of messed up that CommitFest entry. The link you added as a new version of the patch was actually to a different patch from the same flock. Fixed now, since I find myself looking back through history on submissions often enough to care about it being right. My eyes were starting to glaze over by the end of staring at this "flock". -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
* Andres Freund (and...@2ndquadrant.com) wrote: > On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: > > Writing postgres.bki > > Writing schemapg.h > > Writing postgres.description > > Writing postgres.shdescription > > Writing fmgroids.h > > Writing fmgrtab.c > > I personally don't feel the need for those to go away. They only appear > when doing a clean rebuild and/or when changing something > significant. Also it's just a couple of lines. For my 2c, I'd be for getting rid of the above. I agree it's not a huge deal, but I'm also a big fan of "don't spam me when things go right". Is there any particular reason we need to see them with every (clean) build..? > > Processing HTML.index... > > 2406 entries loaded... > > 0 entries ignored... > > Done. > > Note: Writing man3/SPI_connect.3 > > Note: Writing man3/SPI_finish.3 > > Note: Writing man3/SPI_push.3 > > ... > > Note: Writing man1/pg_test_timing.1 > > Note: Writing man1/pg_upgrade.1 > > Note: Writing man1/pg_xlogdump.1 > > Those should imo be silenced. It's quite a bit of output, and especially > during a parallel build they tend to hide more important output. Agreed. > > In file included from gram.y:13612:0: > > scan.c: In function ‘yy_try_NUL_trans’: > > scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] > > PostgreSQL, contrib, and documentation successfully made. Ready to install. > > FWIW, I've patched debian's flex just to get rid of this ;) Any idea if that might be accepted upstream or perhaps by Debian..? It would surely be nice to make that error go away.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Suggestion for concurrent index creation using a single full scan operation
We already do this in pg_restore by starting multiple worker processes. Those processes should get the benefit of synchronised sequential scans. The way the API for indexes works, it wouldn't really be hard to start multiple parallel index builds. I'm not sure how well the pg_restore thing works and sometimes the goal isn't to maximise the speed so starting multiple processes isn't always ideal. It might sometimes be interesting to be able to do it explicitly in a single process. -- greg On Jul 23, 2013 1:06 PM, "Tim Kane" wrote: > Hi all, > > I haven't given this a lot of thought, but it struck me that when > rebuilding tables (be it for a restore process, or some other operational > activity) - there is more often than not a need to build an index or two, > sometimes many indexes, against the same relation. > > It strikes me that in order to build just one index, we probably need to > perform a full table scan (in a lot of cases). If we are building > multiple indexes sequentially against that same table, then we're probably > performing multiple sequential scans in succession, once for each index. > > Could we architect a mechanism that allowed multiple index creation > statements to execute concurrently, with all of their inputs fed directly > from a single sequential scan against the full relation? > > From a language construct point of view, this may not be trivial to > implement for raw/interactive SQL - but possibly this is a candidate for > the custom format restore? > > I presume this would substantially increase the memory overhead required > to build those indexes, though the performance gains may be advantageous. > > Feel free to shoot holes through this :) > > Apologies in advance if this is not the correct forum for suggestions.. > >
Re: [HACKERS] [9.4 CF 1] And then there were 5
On Tue, Jul 23, 2013 at 12:04 AM, Greg Smith wrote: > A further look at recently ALTER OPERATOR FAMILY changes shows that Robert > has been working on those quite a bit. I'm putting him down as the > committer on this last one. I missed that one, now committed. But just BTW, you kind of messed up that CommitFest entry. The link you added as a new version of the patch was actually to a different patch from the same flock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade across more than two neighboring major releases
On Tue, Jul 23, 2013 at 09:48:16PM +0900, Michael Paquier wrote: > > > > On Tue, Jul 23, 2013 at 8:53 PM, Pavel Raiskup wrote: > > Is pg_upgrade supposed to work even across multiple major releases? (I > know > that the README.rpm-dist is mentioning just one major release step, but it > could be bound just to actual pg_upgrade which is packaged..) > > Yes, you should be able to do a one-step upgrade as long as you update from at > least an 8.3 server. There are some restrictions depending on the server > version > that is going to be upgraded though, you can have a look at the documentation > for more details here: > http://www.postgresql.org/docs/9.2/static/pgupgrade.html Yes, you can easily skip many major versions in one pg_upgrade run. Please show us the errors you see. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add more regression tests for ALTER OPERATOR FAMILY.. ADD / DROP
On Thu, May 23, 2013 at 6:52 PM, Robins wrote: > Please find attached a patch to take code-coverage of ALTER OPERATOR > FAMILY.. ADD / DROP (src/backend/commands/opclasscmds.c) from 50% to 87%. > > Any and all feedback is welcome. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make --silent
On 2013-07-23 08:38:03 -0400, Peter Eisentraut wrote: > I have noticed that some people post examples using make --silent (-s). > I found this actually kind of neat to use from time to time, because > then you only see output if you have warnings or errors. But we get > some extra output that doesn't quite fit: I am one of the people doing that regularly. I think it's fantastic because it's so much harder to miss errors that way... Also it shortens the buildtime measurably. > Writing postgres.bki > Writing schemapg.h > Writing postgres.description > Writing postgres.shdescription > Writing fmgroids.h > Writing fmgrtab.c I personally don't feel the need for those to go away. They only appear when doing a clean rebuild and/or when changing something significant. Also it's just a couple of lines. > Processing HTML.index... > 2406 entries loaded... > 0 entries ignored... > Done. > Note: Writing man3/SPI_connect.3 > Note: Writing man3/SPI_finish.3 > Note: Writing man3/SPI_push.3 > ... > Note: Writing man1/pg_test_timing.1 > Note: Writing man1/pg_upgrade.1 > Note: Writing man1/pg_xlogdump.1 Those should imo be silenced. It's quite a bit of output, and especially during a parallel build they tend to hide more important output. > In file included from gram.y:13612:0: > scan.c: In function ‘yy_try_NUL_trans’: > scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] > PostgreSQL, contrib, and documentation successfully made. Ready to install. FWIW, I've patched debian's flex just to get rid of this ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY
On 23 July 2013 06:10, Tom Lane wrote: > Andrew Gierth writes: >> I must admit to finding all of this criticism of unread code a bit >> bizarre. > > If you yourself are still finding bugs in the code as of this afternoon, > onlookers could be forgiven for doubting that the code is quite as > readable or ready to commit as all that. > I had another look at this --- the bug fix looks reasonable, and includes a sensible set of additional regression tests. This was not a bug that implies anything fundamentally wrong with the patch. Rather it was just a fairly trivial easy-to-overlook bug of omission --- I certainly overlooked it in my previous reviews (sorry) and at least 3 more experienced hackers than me overlooked it during detailed code inspection. I don't think that really reflects negatively on the quality of the patch, or the approach taken, which I still think is good. That's also backed up by the fact that Greg isn't able to find much he wants to change. I also don't see much that needs changing in the patch, except possibly in the area where Greg expressed concerns over the comments and code clarity. I had similar concerns during my initial review, so I tend to agree that perhaps it's worth adding a separate boolean has_ordinality flag to FunctionScanState just to improve code readability. FWIW, I would probably have extended FunctionScanState like so: /* * FunctionScanState information * * Function nodes are used to scan the results of a * function appearing in FROM (typically a function returning set). * * eflags node's capability flags * tupdesc node's expected return tuple description * tuplestorestate private state of tuplestore.c * funcexpr state for function expression being evaluated * has_ordinality function scan WITH ORDINALITY? * func_tupdesc for WITH ORDINALITY, the raw function tuple * description, without the added ordinality column * func_slot for WITH ORDINALITY, a slot for the raw function * return tuples * ordinal for WITH ORDINALITY, the ordinality of the return * tuple * */ typedef struct FunctionScanState { ScanState ss; /* its first field is NodeTag */ int eflags; TupleDesc tupdesc; Tuplestorestate *tuplestorestate; ExprState *funcexpr; bool has_ordinality; /* these fields are used for a function scan WITH ORDINALITY */ TupleDesc func_tupdesc; TupleTableSlot *func_slot; int64 ordinal; } FunctionScanState; Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade across more than two neighboring major releases
On Tue, Jul 23, 2013 at 8:53 PM, Pavel Raiskup wrote: > Is pg_upgrade supposed to work even across multiple major releases? (I > know > that the README.rpm-dist is mentioning just one major release step, but it > could be bound just to actual pg_upgrade which is packaged..) > Yes, you should be able to do a one-step upgrade as long as you update from at least an 8.3 server. There are some restrictions depending on the server version that is going to be upgraded though, you can have a look at the documentation for more details here: http://www.postgresql.org/docs/9.2/static/pgupgrade.html Thanks, -- Michael
Re: [HACKERS] proposal - psql - show longest tables
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Even if we thought the functionality was worth the trouble, which I > continue to doubt, this particular syntax proposal is a disaster. Agreed. While there might be things worthwhile to add to psql's backslash commands, this isn't one of those. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] make --silent
I have noticed that some people post examples using make --silent (-s). I found this actually kind of neat to use from time to time, because then you only see output if you have warnings or errors. But we get some extra output that doesn't quite fit: Writing postgres.bki Writing schemapg.h Writing postgres.description Writing postgres.shdescription Writing fmgroids.h Writing fmgrtab.c Processing HTML.index... 2406 entries loaded... 0 entries ignored... Done. Note: Writing man3/SPI_connect.3 Note: Writing man3/SPI_finish.3 Note: Writing man3/SPI_push.3 ... Note: Writing man1/pg_test_timing.1 Note: Writing man1/pg_upgrade.1 Note: Writing man1/pg_xlogdump.1 In file included from gram.y:13612:0: scan.c: In function ‘yy_try_NUL_trans’: scan.c:10181:23: warning: unused variable ‘yyg’ [-Wunused-variable] PostgreSQL, contrib, and documentation successfully made. Ready to install. Is there interest in silencing this extra output (not the compiler warning) when make -s is used? The documentation tools have options to do so, the rest are our own scripts. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4 CF 1] And then there were 5
Greg, First, thanks for helping out with the CF. * Greg Smith (g...@2ndquadrant.com) wrote: > That leaves only 1 patch where I have no idea who will commit it: > > access to calls stack from GET DIAGNOSTICS statement I'll take a look at this. Thanks again, Stephen signature.asc Description: Digital signature
Re: [HACKERS] improve Chinese locale performance
On Mon, Jul 22, 2013 at 12:49 PM, Greg Stark wrote: > On Mon, Jul 22, 2013 at 12:50 PM, Peter Eisentraut wrote: >> I think part of the problem is that we call strcoll for each comparison, >> instead of doing strxfrm once for each datum and then just strcmp for >> each comparison. That is effectively equivalent to what the proposal >> implements. > > Fwiw I used to be a big proponent of using strxfrm. But upon further > analysis I realized it was a real difficult tradeoff. strxfrm saves > potentially a lot of cpu cost but at the expense of expanding the size > of the sort key. If the sort spills to disk or even if it's just > memory bandwidth limited it might actually be slower than doing the > additional cpu work of calling strcoll. > > It's hard to see how to decide in advance which way will be faster. I > suspect strxfrm is still the better bet, especially for complex large > character set based locales like Chinese. strcoll might still win by a > large margin on simple mostly-ascii character sets. The storage blow-up on systems I've tested is on the order of 10x. That's possibly fine if the data still fits in memory, but it pretty much sucks if it makes your sort spill to disk, which seems like a likely outcome in many cases. But I don't have much trouble believing the OP's contention that he's coded a locale-specific version that is faster than the version that ships with the OS. On glibc, for example, we copy the strings we want to compare, so that we can add a terminating zero byte. The first thing that glibc does is call strlen(). That's pretty horrible, and I'm not at all sure the horror ends there, either. It would be great to have support for user-defined collations in PostgreSQL. Let the user provide their own comparison function and whatever else is needed and use that instead of the OS-specific support. Aside from the performance advantages, one could even create collations that have the same names and orderings on all platforms we support. Our support team has gotten more than one inquiry of the form "what's the equivalent of Linux collation XYZ on Windows?" - and telling them that there is no exact equivalent is not the answer they want to hear. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Tuesday, July 23, 2013 12:02 AM Greg Smith wrote: > The v3 patch applies perfectly here now. Attached is a spreadsheet > with test results from two platforms, a Mac laptop and a Linux server. > I used systems with high disk speed because that seemed like a worst > case for this improvement. The actual improvement for shrinking WAL > should be even better on a system with slower disks. You are absolutely right. To mimic it on our system, we configured a RAMFS for the database, and it shows similar results. > There are enough problems with the "hundred tiny fields" results that I > think this is not quite ready for commit yet. More comments on that > below. > This seems close though, close enough that I am planning to follow up > to see if the slow disk results are better. Thanks for going the extra mile to try for slower disks. > Reviewing the wal-update-testsuite.sh test program, I think the only > case missing that would be useful to add is "ten tiny fields, one > changed". I think that one is interesting to highlight because it's > what benchmark programs like pgbench do very often. > The GUC added by the program looks like this: > > postgres=# show wal_update_compression_ratio ; > wal_update_compression_ratio > -- > 25 > > Is it possible to add a setting here that disables the feature altogether? Yes, it can be done in the following 2 ways: 1. Provide a new configuration parameter (wal_update_compression) to turn this feature on/off. 2. At the table level, the user can be given an option to configure it. > That always makes it easier to consider a commit, knowing people can > roll back the change if it makes performance worse. That would make > performance testing easier too. wal-update-testsuite.sh takes as long > as > 13 minutes, it's long enough that I'd like the easier to automate > comparison GUC disabling adds. If that's not practical to do given the > intrusiveness of the code, it's not really necessary. I haven't looked > at the change enough to be sure how hard this is. > > On the Mac, the only case that seems to have a slowdown now is "hundred > tiny fields, half nulled". It would be nice to understand just what is > going on with that one. I got some ugly results in "two short fields, > no change" too, along with a couple of other weird results, but I think > those were testing procedure issues that can be ignored. The pgbench > throttle work I did recently highlights that I can't really make a Mac > quiet/consistent for benchmarking very well. Note that I ran all of > the Mac tests with assertions on, to try and catch platform specific > bugs. > The Linux ones used the default build parameters. > > On Linux "hundred tiny fields, half nulled" was also by far the worst > performing one, with a >30% increase in duration despite the 14% drop > in WAL. Exactly what's going on there really needs to be investigated > before this seems safe to commit. All of the "hundred tiny fields" > cases seem pretty bad on Linux, with the rest of them running about an > 11% duration increase. The main benefit of this patch is to reduce WAL to improve I/O time, but I think for faster I/O systems, the calculation to reduce WAL has overhead. I will check how to optimize that calculation, but I think this feature should be considered with a configuration knob as it can improve many cases. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
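[Editor's note: to make the two options concrete, this is roughly what they might look like from SQL. Both knobs are hypothetical; neither the GUC name nor the storage parameter exists unless the patch adds them.]

    -- option 1: a global (or per-session) GUC, name illustrative
    SET wal_update_compression = off;

    -- option 2: a per-table storage parameter, name illustrative
    ALTER TABLE pgbench_accounts SET (wal_compress_update = off);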
[HACKERS] Suggestion for concurrent index creation using a single full scan operation
Hi all, I haven't given this a lot of thought, but it struck me that when rebuilding tables (be it for a restore process, or some other operational activity) - there is more often than not a need to build an index or two, sometimes many indexes, against the same relation. It strikes me that in order to build just one index, we probably need to perform a full table scan (in a lot of cases). If we are building multiple indexes sequentially against that same table, then we're probably performing multiple sequential scans in succession, once for each index. Could we architect a mechanism that allowed multiple index creation statements to execute concurrently, with all of their inputs fed directly from a single sequential scan against the full relation? From a language construct point of view, this may not be trivial to implement for raw/interactive SQL - but possibly this is a candidate for the custom format restore? I presume this would substantially increase the memory overhead required to build those indexes, though the performance gains may be advantageous. Feel free to shoot holes through this :) Apologies in advance if this is not the correct forum for suggestions..
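[Editor's note: as an illustration of the repeated work being described, each statement below typically performs its own full scan of the heap today. Table and column names are illustrative; synchronize_seqscans only helps scans that happen to be running at the same time.]

    CREATE INDEX ON measurements (device_id);
    CREATE INDEX ON measurements (recorded_at);
    CREATE INDEX ON measurements (reading);
    -- the idea in this mail is, roughly, to feed all three index builds
    -- from one pass over "measurements" instead of three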
[HACKERS] pg_upgrade across more than two neighboring major releases
Hello all, I don't know much about the pg_upgrade implementation, so I'd rather ask here before I start trying the process once again with debugging and deeper observation of the code. I tried to set up our postgresql RPM to package pg_upgrade in version 8.4.13 to automate the post-rpm-upgrade update of postgresql's data to 9.2.4. I was unable to reach a successful result (getting weird errors). Is pg_upgrade supposed to work even across multiple major releases? (I know that the README.rpm-dist is mentioning just one major release step, but it could be bound just to actual pg_upgrade which is packaged..) I can imagine that it may be difficult to handle everything even between two consecutive major releases - thus if it is possible, I would consider it heartwarming :). Thanks for your answer, Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read
On 7/23/13 2:16 AM, Satoshi Nagayasu wrote: > I've been working on new pgstattuple function to allow > block sampling [1] in order to reduce block reads while > scanning a table. A PoC patch is attached. Take a look at all of the messages linked in https://commitfest.postgresql.org/action/patch_view?id=778 Jaime and I tried to do what you're working on then, including a random block sampling mechanism modeled on the stats_target mechanism. We didn't do that as part of pgstattuple though, which was a mistake. Noah created some test cases as part of his thorough review that were not computing the correct results. Getting the results correct for all of the various types of PostgreSQL tables and indexes ended up being much harder than the sampling part. See http://www.postgresql.org/message-id/20120222052747.ge8...@tornado.leadboat.com in particular for that. > This new function, pgstattuple2(), samples only 3,000 blocks > (which accounts 24MB) from the table randomly, and estimates > several parameters of the entire table. There should be an input parameter to the function for how much sampling to do, and if it's possible to make the scale for it to look like ANALYZE that's helpful too. I have a project for this summer that includes reviving this topic and making sure it works on some real-world systems. If you want to work on this too, I can easily combine that project into what you're doing. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
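Something along these lines is what the suggestion above amounts to (the exact signature of the posted PoC may differ, and the second, sampling-size argument is purely hypothetical):

SELECT * FROM pgstattuple2('pgbench_accounts');         -- PoC behaviour: sample a fixed 3,000 blocks
SELECT * FROM pgstattuple2('pgbench_accounts', 30000);  -- hypothetical knob: sample ~30,000 blocks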
Re: [HACKERS] Proposal: template-ify (binary) extensions
On 07/16/2013 09:14 PM, I wrote: > But okay, you're saying we *have* and *want* a guarantee that even a > superuser cannot execute arbitrary native code via libpq (at least in > default installs w/o extensions). I stand corrected and have to change my position, again. For the record: We do not have such a guarantee. Nor does it seem reasonable to want one. On a default install, it's well possible for the superuser to run arbitrary code via just libpq. There are various ways to do it, but the simplest one I was shown is: - upload a DSO from the client into a large object - SELECT lo_export() that LO to a file on the server - LOAD it There are a couple other options, so even if we let LOAD perform permission checks (as I proposed before in this thread), the superuser can still fiddle with function definitions. To the point that it doesn't seem reasonable to try to protect against that. Thus, the argument against the original proposal based on security grounds is moot. Put another way: There already are a couple of "backdoors" a superuser can use. By default. Or with plpgsql removed. Thanks to Dimitri and Andres for patiently explaining and providing examples. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
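For reference, a minimal sketch of that sequence as run from psql (the OID and file paths are illustrative only):

\lo_import 'evil.so'                       -- upload the DSO from the client into a large object
SELECT lo_export(16385, '/tmp/evil.so');   -- 16385: whatever OID the import reported; writes a server-side file
LOAD '/tmp/evil.so';                       -- load it into the backend, running its _PG_init()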
[HACKERS] Performance problem in PLPgSQL
Hi, I've been trying to diagnose a severe performance regression we've been having in one of our plpgsql procedures. The example below is of course extremely simplified, and obviously not what we are really doing in the database, but it exhibits the slowdown between 9.1.9 and 9.2.4. So here is the example:

create table table_test_int (col_01 int);
create table table_test_numeric (col_01 numeric);

CREATE OR REPLACE FUNCTION public.test_insert(nb integer)
 RETURNS text
 LANGUAGE plpgsql
AS $function$
DECLARE
  time_start timestamp;
  time_stop timestamp;
  tmp_numeric numeric;
BEGIN
  time_start := clock_timestamp();
  FOR i IN 1..nb LOOP
    INSERT INTO table_test_int(col_01) VALUES (i);
  END LOOP;
  time_stop := clock_timestamp();
  RAISE NOTICE 'time for int:%', time_stop - time_start;

  time_start := clock_timestamp();
  FOR i IN 1..nb LOOP
    INSERT INTO table_test_numeric(col_01) VALUES (i);
  END LOOP;
  time_stop := clock_timestamp();
  RAISE NOTICE 'time for numeric:%', time_stop - time_start;

  time_start := clock_timestamp();
  FOR i IN 1..nb LOOP
    INSERT INTO table_test_numeric(col_01) VALUES (i::numeric);
  END LOOP;
  time_stop := clock_timestamp();
  RAISE NOTICE 'time for numeric, casted:%', time_stop - time_start;

  time_start := clock_timestamp();
  FOR i IN 1..nb LOOP
    tmp_numeric := cast(i as numeric);
    INSERT INTO table_test_numeric(col_01) VALUES (tmp_numeric);
  END LOOP;
  time_stop := clock_timestamp();
  RAISE NOTICE 'time for numeric with tmp variable:%', time_stop - time_start;

  RETURN 1;
END;
$function$;

It just inserts nb records in a loop in 4 different ways:
- Directly into an int field
- Then into a numeric field (that's where we're having problems)
- Then into the same numeric field, but trying a cast (it doesn't change a thing)
- Then with an intermediary temp variable of numeric type (which solves the problem).

Here are the runtimes (tables were truncated beforehand):

9.1.9:
select test_insert(100);
NOTICE: time for int:00:00:09.526009
NOTICE: time for numeric:00:00:10.557126
NOTICE: time for numeric, casted:00:00:10.821369
NOTICE: time for numeric with tmp variable:00:00:10.850847

9.2.4:
select test_insert(100);
NOTICE: time for int:00:00:09.477044
NOTICE: time for numeric:00:00:24.757032 <
NOTICE: time for numeric, casted:00:00:24.791016 <
NOTICE: time for numeric with tmp variable:00:00:10.89332

I really don't know exactly where the problem comes from… but it's been hurting a function very badly (there are several of these static queries with type mismatches). And of course, the problem is not limited to numeric… text has the exact same problem. Regards, Marc -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LDAP: bugfix and deprecated OpenLDAP API
Magnus Hagander wrote: > In that case, doesn't this patch break Windows? We no longer do the > anonymous bind on Windows, since it's now in the #ifdef HAVE_LIBLDAP. > > Don't we need to keep the ldap_simple_bind() call in the Windows case, > or break it up so the call to ldap_sasl_bind_s() is moved outside the > #ifdef? At least I can't find anything in the docs that indicate that > ldap_connect() on Windows would actually call that for us - only the > other way around? This patch works for the Windows case, because ldap_connect performs an anonymous bind, see http://msdn.microsoft.com/en-us/library/windows/desktop/aa366171%28v=vs.85%29.aspx If the call to ldap_connect succeeds, the client is connected to the LDAP server as an anonymous user. The session handle should be freed with a call to ldap_unbind when it is no longer required. > I'm going to set this patch as returned with feedback for now, but > please feel free to comment on above and possibly resubmit if > necessary before the CF and I'll see if I can deal with it before the > next CF anyway, as it's a bug fix. The patch should still be good, but if we keep the deprecated OpenLDAP API, it might be more consistent to use ldap_simple_bind_s instead of ldap_sasl_bind_s. If you agree, I'll change that. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
On 2013-07-22 13:50:08 -0400, Robert Haas wrote: > On Fri, Jun 14, 2013 at 6:51 PM, Andres Freund wrote: > > The git tree is at: > > git://git.postgresql.org/git/users/andresfreund/postgres.git branch > > xlog-decoding-rebasing-cf4 > > http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4 > > > > On 2013-06-15 00:48:17 +0200, Andres Freund wrote: > >> Overview of the attached patches: > >> 0001: indirect toast tuples; required but submitted independently > >> 0002: functions for testing; not required, > >> 0003: (tablespace, filenode) syscache; required > >> 0004: RelationMapFilenodeToOid: required, simple > >> 0005: pg_relation_by_filenode() function; not required but useful > >> 0006: Introduce InvalidCommandId: required, simple > >> 0007: Adjust Satisfies* interface: required, mechanical, > >> 0008: Allow walsender to attach to a database: required, needs review > >> 0009: New GetOldestXmin() parameter; required, pretty boring > >> 0010: Log xl_running_xact regularly in the bgwriter: required > >> 0011: make fsync_fname() public; required, needs to be in a different file > >> 0012: Relcache support for an Relation's primary key: required > >> 0013: Actual changeset extraction; required > >> 0014: Output plugin demo; not required (except for testing) but useful > >> 0015: Add pg_receivellog program: not required but useful > >> 0016: Add test_logical_decoding extension; not required, but contains > >> the tests for the feature. Uses 0014 > >> 0017: Snapshot building docs; not required > > I've now also committed patch #7 from this series. My earlier commit > fulfilled the needs of patches #3, #4, and #5; and somewhat longer ago > I committed #1. Thanks! > I am not entirely convinced of the necessity or > desirability of patch #6, but as of now I haven't studied the issues > closely. Fair enough. It's certainly possible to work around not having it, but it seems cleaner to introduce the notion of an invalid CommandId like we have for transaction ids et al. Allowing 2^32-2 instead of 2^32-1 subtransactions doesn't seem like a problem to me ;) > Patch #2 does not seem useful in isolation; it adds new > regression-testing stuff but doesn't use it anywhere. Yes. I found it useful to test stuff around making replication synchronous or such, but while I think we should have a facility like it in core for both, logical and physical replication, I don't think this patch is ready for prime time due to it's busy looping. I've even marked it as such above ;) My first idea to properly implement that seems to be to reuse the syncrep infrastructure but that doesn't look trivial. > I doubt that any of the remaining patches (#8-#17) can be applied > separately without understanding the shape of the whole patch set, so > I think I, or someone else, will need to set aside more time for > detailed review before proceeding further with this patch set. I > suggest that we close out the CommitFest entry for this patch set one > way or another, as there is no way we're going to get the whole thing > done under the auspices of CF1. Generally agreed. The biggest chunk of the code is in #13 anyway... Some may be applyable independently: > 0010: Log xl_running_xact regularly in the bgwriter: required Should be useful independently since it can significantly speed up startup of physical replicas. 
On many systems checkpoint_timeout will be set to an hour, which can make the time until a standby gets consistent quite high, since that will be the first time it sees an xl_running_xacts record again. > 0011: make fsync_fname() public; required, needs to be in a different file It isn't in shape for that atm, but it could be applied as an independent infrastructure patch. And it should be easy enough to clean it up. > 0012: Relcache support for an Relation's primary key: required Might actually be a good idea independently as well. E.g. the materialized key patch could use the information that there's a candidate key around to avoid a good bit of useless work. > I'll try to find some more time to spend on this relatively soon, but > I think this is about as far as I can take this today. Was pretty helpful already, so ... ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Back branches vs. gcc 4.8.0
On 2013-04-29 23:37:43 -0400, Peter Eisentraut wrote: > On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote: > > The reason I'm thinking it's a good idea is that it would expose any > > remaining places where we have nominally var-length arrays embedded in > > larger structs. Now that I've seen the failures with gcc 4.8.0, I'm > > quite worried that there might be some more declarations like that > > which we've not identified yet, but that by chance aren't causing > > obvious failures today. > > Here is a rough patch that replaces almost all occurrences of > something[1] in a struct with FLEXIBLE_ARRAY_MEMBER. It crashes left > and right (because of sizeof issues, probably), but at least so far the > compiler hasn't complained about any flexible-array members not at the > end of the struct, which is what it did last time. So the answer to > your concern so far is negative. I think this point in the cycle would be a good one to apply something like this. > Completing this patch will be quite a bit more debugging work. Some > kind of electric fence for palloc would be helpful. Noah's recently added valgrind mode should provide this. Do you have an updated version of this patch already? I'd be willing to make a pass over it to check whether I find any missed updates... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote wrote: > Hello, > > While understanding the effect of maintenance_work_mem on time taken > by CREATE INDEX, I observed that for the values of > maintenance_work_mem less than the value for which an internal sort is > performed, the time taken by CREATE INDEX increases as > maintenance_work_increases. > > My guess is that for all those values an external sort is chosen at > some point and larger the value of maintenance_work_mem, later the > switch to external sort would be made causing CREATE INDEX to take > longer. That is a smaller value of maintenance_work_mem would be > preferred for when external sort is performed anyway. Does that make > sense? > Upon further investigation, I found that the delay in switching to external sort caused by a larger value of maintenance_work_mem is small compared to the total CREATE INDEX time. So plotting it for a number of maintenance_work_mem values shows that its effect is negligible. Are there any other parts of the external sort logic that might make it slower with increasing values of maintenance_work_mem? It seems the merge order and the number of tapes are related to state->allowedMem. Does that mean external sort is affected by the value of maintenance_work_mem in a way roughly similar to the above? -- Amit Langote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
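For anyone who wants to reproduce the measurements, something along these lines shows both the timing and, via trace_sort (available when the server is built with TRACE_SORT, which is the default), the point where the sort goes external and how many merge passes occur (table and column names are placeholders):

SET maintenance_work_mem = '16MB';
SET trace_sort = on;            -- tuplesort progress is emitted at LOG level
SET client_min_messages = log;  -- also show those messages in the psql session
\timing on
CREATE INDEX test_idx ON test_table (col);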
Re: [HACKERS] anchovy failing on 9.1 and earlier since using gcc 4.8
On 2013-07-23 09:11:00 +0200, Andres Freund wrote: > Hi, > > Anchovy is failing on 9.1 and earlier for quite some time now: > http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=anchovy&br=REL9_1_STABLE > The commit at that date looks rather unconspicuous. Looking at the > 'config' section reveals the difference between the failing > http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=anchovy&dt=2013-03-31%2023%3A23%3A01 > and > http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=anchovy&dt=2013-03-30%2000%3A23%3A01 > > 2013-03-30: config: > > configure:3244: checking for C compiler version > configure:3252: gcc --version >&5 > gcc (GCC) 4.7.2 > > 2013-03-31: config: > > configure:3244: checking for C compiler version > configure:3252: gcc --version >&5 > gcc (GCC) 4.8.0 > > (note that it's using 4.8.1 these days) > > So, it started failing the day it switched to gcc 4.8. > > Given that anchovy builds 9.2 and master happily and given that it is > the only gcc-4.8 animal in the buildfarm it seems quite possible that > there is some misoptimization issue in 9.1 and earlier with 4.8. > > I seem to recall some discussions about whether to backpatch some > hardening against compiler optimizations, but I don't remember any > specifics. Ah, yes. Exactly this case was discussed a few weeks ago: http://archives.postgresql.org/message-id/14242.1365200084%40sss.pgh.pa.us I'd personally vote for doing both (1) and (2) mentioned in that email in the back branches. I don't think the loop optimizations disabled by -fno-aggressive-loop-optimizations are the only place where the compiler can actually use such knowledge. It very well might optimize entire code blocks away. And I don't think this change is the only one where the aggressive loop optimizations might play a role, especially in the older branches. I naturally only found the mailing list entry after debugging the issue myself... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using ini file to setup replication
On Mon, Jul 22, 2013 at 10:59 PM, Greg Stark wrote: > On Fri, Jul 19, 2013 at 2:20 PM, Samrat Revagade > wrote: > >> for example: >> if i want to configure 2 servers then it will add 6 lines,for 3 -9, for 4-12 >> setting's for particular server will be: >> >> considering the way of setting value to conf parameters : guc . value >> >> standby_name.'AAA' >> synchronous_transfer. commit >> wal_sender_timeout.60 > > > I have a feeling Samrat and Sawada-san have some good use cases where > this extra syntax could be a big step up in expressiveness. But I'm > having a hard time figuring out exactly what they have in mind. If > either of you could explain in more detail how the extra syntax would > apply to your use case and how it would let you do something that you > can't already do it might be helpful. > > I'm assuming the idea is something like having a single config file > which can work for the server regardless of whether it's acting as the > primary or standby and then be able to switch roles by switching a > single flag which would control which parts of the config file were > applied. But I'm having a hard time seeing how exactly that would > work. In this approach, where those GUC parameters are written in postgresql.conf, the user would have to write many extra lines in postgresql.conf per standby server, as Samrat suggested. That would increase the size of postgresql.conf, and I don't think it is good for all those parameters to live there. That is, I think those parameters should be written in a separate file; e.g., we could pull them in using "include" (or "include_if_exists") in postgresql.conf. If the included file doesn't exist, we would apply the default value to each WAL sender. That is, we give up on the ini file and instead provide a mechanism for configuring each WAL sender as an override. Of course, to support this approach we need the patch which Andres suggested, and the server should be able to handle tokens made of two or more parts separated by a dot. So we would like to help with this approach. Regards, Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
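A rough sketch of what that could look like, purely to illustrate the idea (the file name and the dotted per-standby parameters are hypothetical and depend on the parser changes mentioned above; include_if_exists itself already exists):

# in postgresql.conf:
include_if_exists 'standbys.conf'

# in standbys.conf (one group of dotted parameters per standby):
aaa.synchronous_transfer = commit
aaa.wal_sender_timeout = 60
bbb.synchronous_transfer = commit
bbb.wal_sender_timeout = 30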
Re: [HACKERS] GSOC13 proposal - extend RETURNING syntax
On 23.07.2013 06:22, David Fetter wrote: > What problem or problems did you notice, and what did you change to > fix them? "UPDATE ... FROM" generated "ERROR: variable not found in subplan target lists". I've added a workaround in add_vars_to_targetlist: - if it is an "after" - simply change var->varno to the base RTE (it should always exist; the value is temporary and will later change to OUTER_VAR) - if it is a "before" - add the new var to the targetlist regardless of rel->attr_needed[attno]. Additionally I've changed build_joinrel_tlist to ignore any "BEFORE" RTEs. The regression tests are updated. Regards, Karol -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
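For context, the kind of statement this work targets looks roughly like the following (RETURNING BEFORE/AFTER is the syntax proposed in this GSoC project, not something in any released version; table and column names are invented):

UPDATE foo SET val = val + 1
  FROM bar
 WHERE foo.id = bar.foo_id
RETURNING BEFORE.val AS old_val, AFTER.val AS new_val;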
[HACKERS] anchovy failing on 9.1 and earlier since using gcc 4.8
Hi, Anchovy is failing on 9.1 and earlier for quite some time now: http://pgbuildfarm.org/cgi-bin/show_history.pl?nm=anchovy&br=REL9_1_STABLE The commit at that date looks rather unconspicuous. Looking at the 'config' section reveals the difference between the failing http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=anchovy&dt=2013-03-31%2023%3A23%3A01 and http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=anchovy&dt=2013-03-30%2000%3A23%3A01 2013-03-30: config: configure:3244: checking for C compiler version configure:3252: gcc --version >&5 gcc (GCC) 4.7.2 2013-03-31: config: configure:3244: checking for C compiler version configure:3252: gcc --version >&5 gcc (GCC) 4.8.0 (note that it's using 4.8.1 these days) So, it started failing the day it switched to gcc 4.8. Given that anchovy builds 9.2 and master happily and given that it is the only gcc-4.8 animal in the buildfarm it seems quite possible that there is some misoptimization issue in 9.1 and earlier with 4.8. I seem to recall some discussions about whether to backpatch some hardening against compiler optimizations, but I don't remember any specifics. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Design proposal: fsync absorb linear slider
On Mon, Jul 22, 2013 at 8:48 PM, Greg Smith wrote: > And I can't get too excited about making this as my volunteer effort when I > consider what the resulting credit will look like. Coding is by far the > smallest part of work like this, first behind coming up with the design in > the first place. And both of those are way, way behind how long review > benchmarking takes on something like this. The way credit is distributed > for this sort of feature puts coding first, design not credited at all, and > maybe you'll see some small review credit for benchmarks. That's completely > backwards from the actual work ratio. If all I'm getting out of something > is credit, I'd at least like it to be an appropriate amount of it. FWIW, I think that's a reasonable request. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers