Re: [HACKERS] Keywords in pg_hba.conf should be field-specific
Any progress on this?

--- Brendan Jurd wrote:
> On 18 October 2010 01:19, Tom Lane wrote:
> > Brendan Jurd writes:
> >> On 17 October 2010 09:59, Tom Lane wrote:
> >>> Good point.  Maybe the correct fix is to remember whether each token
> >>> was quoted or not, so that keyword detection can be done safely after
> >>> the initial lexing.  I still think that the current method is
> >>> impossibly ugly ...
>
> >> I'm happy to revise the patch on that basis.  Any suggestions about
> >> how to communicate the 'quotedness' of each token?  We could make each
> >> token a struct consisting of the token itself, plus a boolean flag to
> >> indicate whether it had been quoted.  Does that work for you?
>
> > Seems reasonable.  I had the idea of a parallel list of booleans in the
> > back of my mind, but a list of structs is probably easier to
> > understand, and to extend further if necessary.
>
> Okay, I've taken the red pill and I'm finding out how deep the rabbit
> hole goes ...
>
> The logical structure of pg_hba.conf is a set of lines, each line
> containing a set of fields, each field containing a set of tokens.
> The way the existing implementation handles this is to create a list of
> lines containing sublists of fields, containing comma-separated strings
> for the set of tokens, with newlines embedded next to tokens which
> might be keywords.
>
> The tokeniser breaks apart the comma-separated tokens ... and then
> reassembles them into a comma-separated string, which the db/role
> matching functions then have to break apart *again*.
>
> In order to keep track of whether each individual token was quoted, I
> first need to impose some sanity here.  Rather than using a magical
> string for each field, I intend to use a List of HbaToken structs which
> explicitly note whether quoting was used.
>
> Introducing an extra List level does mean a bit more work copying and
> freeing, and it makes the patch really quite intrusive.  I have to
> touch a lot of lines in hba.c, but I think the additional clarity is
> worth it.  If nobody dissuades me from this approach I hope to post a
> patch in a couple of days.
>
> Cheers,
> BJ

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
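Brendan's List-of-structs design can be sketched in miniature. Below is a toy Python model of quoted-token lexing with field-specific keyword detection; the function names and the keyword set are illustrative only, not the actual hba.c implementation:

```python
# Each token keeps a "quoted" flag, so a quoted "all" is treated as a
# literal role/database name while an unquoted all remains a keyword.
# Hypothetical names and a partial keyword set, for illustration only.

KEYWORDS = {"all", "replication", "sameuser", "samerole"}

def lex_field(field):
    """Split a comma-separated pg_hba field into (text, quoted) tokens."""
    tokens = []
    for raw in field.split(","):
        raw = raw.strip()
        if raw.startswith('"') and raw.endswith('"') and len(raw) >= 2:
            tokens.append((raw[1:-1], True))   # quoting stripped, remembered
        else:
            tokens.append((raw, False))
    return tokens

def is_keyword(token):
    """Keyword detection done safely after lexing, as Tom suggested."""
    text, quoted = token
    return not quoted and text in KEYWORDS

tokens = lex_field('alice,"all",all')
# alice: plain name; "all": quoted, so a role literally named all;
# all: unquoted, so the keyword matching every role.
```

The point of carrying the flag in the token itself, rather than a parallel list of booleans, is that the pair travels together through every later matching function.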
Re: [HACKERS] WIP: cross column correlation ...
Robert Haas wrote:
> > Actually, we *do* have some idea which tables are hot.  Or at least,
> > we could.  Currently, pg_stats for tables are "timeless"; they just
> > accumulate from the last reset, which has always been a problem in
> > general for monitoring.  If we could make top-level table and index
> > stats time-based, even in some crude way, we would know which tables
> > were currently hot.  That would also have the benefit of making server
> > performance analysis and autotuning easier.
>
> I think there would be value in giving the DBA an easier way to see
> which tables are hot, but I am really leery about the idea of trying to
> feed that directly into the query planner.  I think this is one of
> those cases where we let people tune it manually for starters, and then
> wait for feedback.  Eventually someone will say "oh, I never tune that
> by hand any more, ever since I wrote this script which does the
> following computation... and I just run it out of cron".  And then we
> will get out the party hats.  But we will never get the experience we
> need to say what that auto-tuning algorithm will be unless we first
> provide the knob for someone to fiddle with manually.

It is also possible we will implement a manual way and never get around
to automating it.  :-(

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] TODO: You can alter it, but you can't view it
Josh Berkus wrote:
> > Right now pg_options_to_table() is not documented.  Should it be?
>
> Yes, I think so.

Done, with the attached, applied patch.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 736eb67..c620142 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT pg_type_is_visible('myschema.widg
*** 13244,13249 ****
--- 13244,13253 ----
+     pg_options_to_table
+
      pg_tablespace_databases
*** SELECT pg_type_is_visible('myschema.widg
*** 13380,13385 ****
--- 13384,13394 ----
      get underlying SELECT command for view
+     pg_options_to_table(reloptions)
+     name, option
+     get the set of option name/value pairs from pg_class.reloptions
+
      pg_tablespace_databases(tablespace_oid)
      setof oid
      get the set of database OIDs that have objects in the tablespace
*** SELECT pg_type_is_visible('myschema.widg
*** 13475,13480 ****
--- 13484,13495 ----
+     pg_options_to_table returns the set of option
+     name/value pairs when passed
+     pg_class.reloptions.
+
      pg_tablespace_databases allows a tablespace to be examined.  It
      returns the set of OIDs of databases that have objects stored in the
      tablespace.  If this function returns any rows, the tablespace is not
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
Marko Tiikkaja writes:
> On 2011-02-25 4:58 PM, Tom Lane wrote:
>> Specifically, I'm imagining getting rid of the patch's additions to
>> InitPlan and ExecutePlan that find all the modifying sub-queries and
>> force them to be cycled to completion before the main plan runs.
>> Just run the main plan and let it pull tuples from the CTEs as needed.
>> Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
>> completion before shutting down the plan.

> This idea has actually been discussed before when we talked about
> optimizing wCTEs, but IIRC you said that doing this in ExecutorEnd is a
> bit ugly.

Further experimentation has reminded me of why I didn't want to put such
processing in ExecutorEnd :-(.  There are some nasty interactions with
EXPLAIN:

1. EXPLAIN ANALYZE fails to include the execution cycles associated with
running the ModifyTable nodes to completion.  In the worst case, such as
"WITH t AS (INSERT ...) SELECT 1", it will claim the INSERT subplan is
never executed, even though rows certainly got inserted.  This is
because EXPLAIN extracts all the counts from the execution state tree
before shutting it down with ExecutorEnd.

2. But it gets worse.  Try the same query *without* ANALYZE.  You'll
find the INSERT executes anyway!  That's because EXPLAIN still calls
ExecutorEnd to clean up the execution state tree, and ExecutorEnd
doesn't realize it's not supposed to run any of the plan.

So we really need some refactoring here.  I dislike adding another
fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
but there may not be a better way.  The only way I see to fix this
without changing that API is to have ExecutorRun do the cleanup
processing just after the top plan node returns a null tuple, and that
seems a bit ugly as well.

Thoughts?

			regards, tom lane
Re: [HACKERS] wCTE: about the name of the feature
On Sat, Feb 26, 2011 at 12:30 AM, Tom Lane wrote:
> Robert Haas writes:
>> Yay!  I'm excited about this, particularly the possible "pipelining"
>> stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ... and
>> have it be like cool and fast and stuff.
>
>> Or at least I hope you can do that.
>
> It's gonna need some work yet.  As things stand, the tuples are indeed
> pipelined through, but the CteScan nodes *also* stash them aside into
> tuplestores, just in case somebody demands a rescan.  Fixing that will
> require revisiting the exec flags (EXEC_FLAG_REWIND etc).  We don't
> currently distinguish "it's unlikely you'll have to rescan" from
> "you're guaranteed not to have to rescan", but a CteScan that's
> covering a ModifyTable has to know the latter to not have to keep hold
> of copies of the RETURNING tuples.
>
> It might be a small enough change to do after alpha starts, but I don't
> have time for it right now.

Well, if nothing else, the potential is there for a future release.  I'm
probably not quite as excited about this feature as David Fetter (and my
100-Watt lightbulb is not quite as bright as the sun at high noon in
midsummer) but I do think it's pretty cool, and I appreciate you getting
it in, even in a somewhat basic form.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] wCTE: about the name of the feature
Robert Haas writes:
> Yay!  I'm excited about this, particularly the possible "pipelining"
> stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ... and
> have it be like cool and fast and stuff.

> Or at least I hope you can do that.

It's gonna need some work yet.  As things stand, the tuples are indeed
pipelined through, but the CteScan nodes *also* stash them aside into
tuplestores, just in case somebody demands a rescan.  Fixing that will
require revisiting the exec flags (EXEC_FLAG_REWIND etc).  We don't
currently distinguish "it's unlikely you'll have to rescan" from
"you're guaranteed not to have to rescan", but a CteScan that's covering
a ModifyTable has to know the latter to not have to keep hold of copies
of the RETURNING tuples.

It might be a small enough change to do after alpha starts, but I don't
have time for it right now.

			regards, tom lane
Re: [HACKERS] WIP: cross column correlation ...
On Fri, Feb 25, 2011 at 6:41 PM, Josh Berkus wrote:
>> One idea Tom and I kicked around previously is to set an assumed
>> caching percentage for each table based on its size relative to
>> effective_cache_size - in other words, assume that the smaller a table
>> is, the more of it will be cached.  Consider a system with 8GB of RAM,
>> and a table which is 64kB.  It is probably unwise to make any plan
>> based on the assumption that that table is less than fully cached.  If
>> it isn't before the query executes, it soon will be.  Going to any
>> amount of work elsewhere in the plan to avoid the work of reading that
>> table in from disk is probably a dumb idea.  Of course, one downside
>> of this approach is that it doesn't know which tables are hot and
>> which tables are cold, but it would probably still be an improvement
>> over the status quo.
>
> Actually, we *do* have some idea which tables are hot.  Or at least, we
> could.  Currently, pg_stats for tables are "timeless"; they just
> accumulate from the last reset, which has always been a problem in
> general for monitoring.  If we could make top-level table and index
> stats time-based, even in some crude way, we would know which tables
> were currently hot.  That would also have the benefit of making server
> performance analysis and autotuning easier.

I think there would be value in giving the DBA an easier way to see
which tables are hot, but I am really leery about the idea of trying to
feed that directly into the query planner.  I think this is one of those
cases where we let people tune it manually for starters, and then wait
for feedback.  Eventually someone will say "oh, I never tune that by
hand any more, ever since I wrote this script which does the following
computation... and I just run it out of cron".  And then we will get out
the party hats.  But we will never get the experience we need to say
what that auto-tuning algorithm will be unless we first provide the knob
for someone to fiddle with manually.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
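The size-relative heuristic Robert sketches can be written down directly. A toy version follows; the linear falloff and the clamping are illustrative guesses, not a proposed planner formula:

```python
# Toy version of "assume smaller tables are better cached": estimate
# the cached fraction of a table from its size relative to
# effective_cache_size.  The shape of the falloff is an illustrative
# guess, not PostgreSQL's actual costing.

def assumed_cached_fraction(table_bytes, effective_cache_size_bytes):
    if table_bytes <= 0:
        return 1.0
    # A table much smaller than the cache is assumed fully resident;
    # the fraction falls off as the table approaches the cache size.
    return max(0.0, min(1.0, 1.0 - table_bytes / effective_cache_size_bytes))

GB = 1024 ** 3
cache = 8 * GB
print(assumed_cached_fraction(64 * 1024, cache))  # 64kB table: ~1.0
print(assumed_cached_fraction(16 * GB, cache))    # table twice the cache: 0.0
```

This captures Robert's 8GB/64kB example: the tiny table is treated as effectively fully cached, and it also shows the stated downside, since two tables of equal size get the same estimate regardless of which one is actually hot.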
Re: [HACKERS] WIP: cross column correlation ...
On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera wrote:
> Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:
>
>> How practical would it be for analyze to keep a record of response
>> times for given sections of a table as it randomly accesses them and
>> generate some kind of a map for expected response times for the
>> pieces of data it is analysing?
>
> I think what you want is random_page_cost that can be tailored per
> tablespace.

We have that.  But it's not the same as tracking *sections of a table*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] wCTE: about the name of the feature
On Fri, Feb 25, 2011 at 7:00 PM, Tom Lane wrote:
> Marko Tiikkaja writes:
>> On 2011-02-24 6:37 PM +0200, Tom Lane wrote:
>>> OK, I will make those adjustments.  Are you going to do more work on
>>> the documentation part of the patch?  I can stick to working on the
>>> code part meanwhile, if you are.
>
>> I am planning on working on the documentation this weekend.
>
> I've gone ahead and applied the code portion of the patch,

Yay!  I'm excited about this, particularly the possible "pipelining"
stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ... and
have it be like cool and fast and stuff.

Or at least I hope you can do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pl/python do not delete function arguments
On Tue, Feb 15, 2011 at 6:04 PM, Jan Urbański wrote:
> On 15/02/11 20:39, Peter Eisentraut wrote:
>> On tis, 2011-02-15 at 09:58 +0100, Jan Urbański wrote:
>>> Because the invocation that actually recurses sets up the scene for
>>> failure.
>>
>> That's what we're observing, but I can't figure out why it is.  If you
>> can, could you explain it?
>>
>> It actually makes sense to me that the arguments should be deleted at
>> the end of the call.  The data belongs to that call only, and
>> PLy_procedure_delete() that would otherwise clean it up is only called
>> rarely.
>>
>> Apparently, the recursive call ends up deleting the wrong arguments,
>> but it's not clear to me why that would affect the next top-level
>> call, because that would set up its own arguments again anyway.  In
>> any case, perhaps the right fix is to fix PLy_function_delete_args()
>> to delete the args correctly.
>
> Aaah, ok, I got it (again).  Let me write this in full before I forget
> and spend another hour chasing that bug (and boy, bugs that disappear
> because you're doing things in the debugger are so annoying).  And
> actually, my patch doesn't fix it fully :|  Let me demonstrate:
>
> CREATE FUNCTION rec(n integer) RETURNS integer AS $$
> if n == 0:
>     return
> plpy.notice("before n is %d" % n)
> plpy.execute("select rec(0)")
> plpy.notice("after n is %d" % n)
> $$ LANGUAGE plpythonu;
>
> Without the patch the second plpy.notice raises a NameError.  With the
> patch the output is:
>
> NOTICE:  before n is 4
> CONTEXT:  PL/Python function "rec"
> NOTICE:  after n is 0
> CONTEXT:  PL/Python function "rec"
>
> What happens?  In PLy_function_handler, PLy_function_build_args is
> called, and proc->globals is set.  After that PLy_procedure_call is
> called, which starts executing Python code.  The Python code does a
> call into C with plpy.execute, and PLy_function_handler gets called (a
> reentrant call).
>
> Then PLy_function_build_args is called again.  It overwrites the "n"
> entry in proc->globals and then PLy_procedure_call gets called, which
> drops us back into Python (on the stack there's now C, Python, C,
> Python).  This second invocation exits quickly because n == 0, and
> we're back in C.
>
> Now without my patch, the next thing to happen was deleting the
> arguments, which removed "n" from the proc->globals dict.  The rest of
> the C code runs and finally plpy.execute returns and we're back in
> Python (the stack is C, Python).
>
> The second plpy.notice is run, which fetches "n" from the globals, and
> not finding it, raises a NameError.  With the patch it simply fetches
> the overwritten value, namely 0.
>
> The KeyError was a red herring - that's how Python reacted when
> evaluating "n in (0, 1)", and if you look in the server log you'll see
> a RuntimeWarning complaining about something internal, that doesn't
> matter.  The bottom line is that PLy_procedure_call is not reentrant
> because of proc->globals, and it has to be.
>
> Now when fixing this bug I tried copying the globals dict and
> restoring it, but ran into issues (I think the problem was that the
> function didn't like running with different globals than the ones it
> had been compiled with).  Not sure what to do with this :(  Document
> it as a caveat (with or without my patch) and carry on?  That sucks
> quite badly...

From this discussion I gather that we have a problem here that we don't
exactly know how to fix, so I'm inclined to suggest that we mark this
Returned with Feedback in the CommitFest and instead add it to the TODO.
Since this is a pre-existing bug and not a new regression, it should not
be something we hold up beta for.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
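The failure mode Jan describes can be reproduced in plain Python. The toy model below uses a single shared dict standing in for proc->globals; it illustrates the reentrancy hazard, not PL/Python's actual C structures:

```python
# Toy model of the PL/Python bug: one globals dict is shared by every
# invocation of a function, so a reentrant call that rebuilds the
# arguments clobbers the outer call's "n".  Illustrative only.

proc_globals = {}  # stands in for proc->globals

def build_args(n):
    proc_globals["n"] = n  # PLy_function_build_args overwrites the entry

def rec(n):
    build_args(n)
    if proc_globals["n"] == 0:
        return None
    before = proc_globals["n"]   # 4 on the outer call
    rec(0)                       # reentrant call rebuilds args with n=0
    after = proc_globals["n"]    # now 0: the outer value was clobbered
    return before, after

print(rec(4))  # -> (4, 0), matching the NOTICE output in the report
```

With per-call argument deletion (the unpatched behavior) the inner call would instead remove "n" entirely, which is the NameError case; either way, the shared dict is the root cause.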
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
Robert Haas writes:
> Tom/Alvaro, have the two of you hammered out who is going to finish
> this one off?  I *believe* Alvaro told me on IM that he was leaving
> this one for Tom.

Last I heard, the ball was in my court.  I'll try to get it done over
the weekend.

			regards, tom lane
Re: [HACKERS] pl/python explicit subtransactions
On Wed, Feb 9, 2011 at 5:22 PM, Peter Eisentraut wrote:
> On tis, 2011-02-08 at 00:32 -0500, Steve Singer wrote:
>> On 11-02-06 11:40 AM, Jan Urbański wrote:
>>
>> > PFA an updated patch with documentation.
>>
>> > Yeah, changed them.
>>
>> Those changes look fine.  The tests now pass.
>>
>> I've attached a new version of the patch that fixes a few
>> typos/wording issues I saw in the documentation.  I also changed the
>> link to the python reference manual section on context managers.  I
>> think it is better to link to that versus the original PEP.
>>
>> The documentation could probably still use more word-smithing but
>> that can happen later.  I'm marking this as ready for a committer.
>
> Is it necessarily a good idea that an explicit subtransaction disables
> the implicit sub-subtransactions?  It might be conceivable that you'd
> still want to do some try/catch within explicit subtransactions.

Is this still an open question, or what is the remaining issue that
needs to be addressed with regards to this patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pl/python tracebacks
On Thu, Feb 24, 2011 at 9:03 AM, Jan Urbański wrote:
> On 24/02/11 14:10, Peter Eisentraut wrote:
>> On tor, 2010-12-23 at 14:56 +0100, Jan Urbański wrote:
>>> For errors originating from Python exceptions add the traceback as
>>> the message detail.  The patch tries to mimic Python's traceback.py
>>> module behaviour as closely as possible, including interleaving
>>> stack frames with source code lines in the detail message.  Any
>>> Python developer should instantly recognize this kind of error
>>> reporting; it looks almost the same as an error in the interactive
>>> Python shell.
>>
>> I think the traceback should go into the CONTEXT part of the error.
>> The context message that's already there is now redundant with the
>> traceback.
>>
>> You could even call errcontext() multiple times to build up the
>> traceback, but maybe that's not necessary.
>
> Hm, perhaps.  I put it in the details because it sounded like the
> place to put information that is not that important, but still
> helpful.  It's kind of natural to think of the traceback as the detail
> of the error message.  But if you prefer context, I'm fine with that.
> You want me to update the patch to put the traceback in the context?

I don't see a response to this question from Peter, but I read his email
to indicate that he was hoping you'd rework along these lines.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
On Thu, Feb 24, 2011 at 11:02 AM, Tom Lane wrote:
> Marko Tiikkaja writes:
>> On 2011-02-24 5:21 PM, Tom Lane wrote:
>>> Oh, did we decide to do it that way?  OK with me, but the submitted
>>> docs are woefully inadequate on the point.  This behavior is going
>>> to have to be explained extremely clearly (and even so, I bet we'll
>>> get bug reports about it :-().
>
>> I'm ready to put more effort into the documentation if the patch is
>> going in, but I really don't want to waste my time just to hear that
>> the patch is not going to be in 9.1.  Does this sound acceptable?
>
> I've found some things I don't like about it, but the only part that
> seems far short of being committable is the documentation.

Tom/Alvaro, have the two of you hammered out who is going to finish this
one off?  I *believe* Alvaro told me on IM that he was leaving this one
for Tom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] PostgreSQL FDW update
On Thu, Feb 24, 2011 at 9:06 AM, Robert Haas wrote:
> On Thu, Feb 24, 2011 at 8:13 AM, Shigeru HANADA wrote:
>>
>> On Wed, 23 Feb 2011 20:30:05 +0900 Shigeru HANADA wrote:
>>
>>> On Tue, 22 Feb 2011 11:33:25 -0500 Robert Haas wrote:
>>> > Is anyone actually working on a new version of this patch
>>> > sufficiently rapidly that we can expect a new version in the next
>>> > day or two?
>>> >
>>> > If not, I think we mark this one Returned with Feedback and
>>> > revisit it for 9.2.
>>>
>>> I'm working on it.
>>>
>>> Fixes for new FDW API have been done, but there are some problems in
>>> SQL generation codes, such as SELECT clause optimization (omitting
>>> unused column from SELECT clause).  It would take a while, but I'll
>>> post revised version of the patch tomorrow.
>>
>> Attached is a revised version of postgresql_fdw patch.  I started
>> from Heikki's latest patch, and modified some points:
>>
>> 1) use new FDW API
>> 2) use EXTENSION framework
>> 3) SELECT clause optimization (use NULL for unused columns)
>> 4) show remote query in EXPLAIN output
>>
>> WHERE clause pushdown was implemented in Heikki's version, so I
>> didn't touch around it.  Now I'm working on cost estimation and
>> connection management, but they would need some more work.
>
> So this is still work-in-progress?  When do you expect a final version?

Since it sounds like this still needs more work, I'm going to mark it
Returned with Feedback.  I hope we integrate this during the 9.2 cycle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
I wrote:
> The first solution that comes to mind is to pay attention to the
> interdependencies of the CTEs, and perform the cleanup in an
> appropriate order (here, the ModifyTable for y needs to be cycled
> first).

Doh ... actually, we already *are* ordering the CTEs in dependency
order, so it's a one-liner fix to do the shutdowns in reverse order.

			regards, tom lane
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
David Fetter writes:
> Sorry that was unclear.  Let's imagine there's a DELETE ... RETURNING
> in one WITH, and an UPDATE in another that depends on that one.  Is
> that still allowed?

Yeah it is, although I just noticed that there's a bug in the new
implementation:

with t1 as (insert into x select ... returning *),
     t2 as (insert into y select * from t1 returning *)
select 1;

This should result in the same rows inserted into both x and y, but in
git HEAD it fails to insert anything into y.  The reason is that the
ExecutorEnd scan first processes the ModifyTable node for x, and cycles
it to completion, discarding the results --- but we needed the CteScan
in t2 to see those rows.

There's a related case in the regression tests, but it works because the
outer query does fetch from both WITH clauses, so there's no need to do
anything at ExecutorEnd time.

The first solution that comes to mind is to pay attention to the
interdependencies of the CTEs, and perform the cleanup in an appropriate
order (here, the ModifyTable for y needs to be cycled first).  I'm not
sure if there's a nicer way.  We'll eventually want some interdependency
tracking for CTEs anyway, if we're ever to support mutual recursion, so
it'd not be completely single-purpose code.

			regards, tom lane
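The ordering problem, and why cleaning up in reverse dependency order fixes it, can be modeled with Python generators standing in for the executor nodes. This is a sketch of the idea, not the executor code:

```python
# Sketch: t2 pulls from t1, as in
#   WITH t1 AS (INSERT INTO x ... RETURNING *),
#        t2 AS (INSERT INTO y SELECT * FROM t1 RETURNING *)
#   SELECT 1;
# If cleanup drains t1 first and discards its rows, t2 never sees them.
# Draining the dependent node (t2) first pulls t1 to completion through it.

x, y = [], []

def t1():                      # ModifyTable for x
    for row in (1, 2, 3):
        x.append(row)          # the INSERT side effect
        yield row              # the RETURNING row

def t2(source):                # ModifyTable for y, fed by the CteScan on t1
    for row in source:
        y.append(row)
        yield row

nodes = [t1()]                 # nodes in dependency order: t1, then t2
nodes.append(t2(nodes[0]))

# "SELECT 1" pulls nothing from the CTEs, so cleanup must cycle them.
# Reverse dependency order: cycle t2 first, which in turn drains t1.
for node in reversed(nodes):
    for _ in node:
        pass

print(x, y)  # -> [1, 2, 3] [1, 2, 3]
```

Running the cleanup loop over `nodes` in forward order instead reproduces the bug: t1 is drained and its rows discarded, so t2 finds its source exhausted and y stays empty.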
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 10:12:02PM -0500, Tom Lane wrote:
> David Fetter writes:
> > What's the effect, if any, on CTEs that depend on each other
> > explicitly?
>
> An error.  That would require mutual recursion, which we don't support
> for the SELECT case let alone data-modifying statements.

Sorry that was unclear.  Let's imagine there's a DELETE ... RETURNING in
one WITH, and an UPDATE in another that depends on that one.  Is that
still allowed?

Cheers,
David.
--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
David Fetter writes:
> What's the effect, if any, on CTEs that depend on each other
> explicitly?

An error.  That would require mutual recursion, which we don't support
for the SELECT case let alone data-modifying statements.

			regards, tom lane
Re: [HACKERS] wCTE: about the name of the feature
On Thu, Feb 24, 2011 at 11:35:30AM -0800, David Wheeler wrote:
> On Feb 24, 2011, at 10:43 AM, Robert Haas wrote:
>
> >> The best idea I have at the moment is to spell out "data modifying
> >> command" (or "statement") rather than relying on the acronym.
> >> In the code, we could change hasDmlWith to hasModifyingWith, for
> >> example.  The error messages could read like
> >>    data-modifying statement in WITH is not allowed in a view
> >>
> >> Comments?
> >
> > Great idea.  I had the same complaint when I looked at this patch a
> > year ago, but didn't come up with nearly as good an idea as to what
> > to do about it.
>
> I like "statement" better than "command," too, but love the acronym
> DMC.  As in, "you want to Run [a] DMC."  ;-P

Hit it, Run! ;)

Cheers,
David.
--
David Fetter  http://fetter.org/
Re: [HACKERS] wCTE: about the name of the feature
On Thu, Feb 24, 2011 at 11:20:48AM -0500, Tom Lane wrote:
> The wCTE patch refers to the feature it's adding as "DML WITH".  I'm
> still pretty unhappy with that terminology.  In my view of the world,
> "DML" includes SELECT as well as INSERT/UPDATE/DELETE.  The wikipedia
> entry about the term
> http://en.wikipedia.org/wiki/Data_Manipulation_Language
> agrees that that's at least the majority usage, and even our own docs
> seem to use it to include SELECT as often as not.  Since the
> distinction is absolutely critical to talking about this feature
> sensibly, I don't think it's a good plan to use an acronym that is
> guaranteed to produce uncertainty in the reader's mind.
>
> The best idea I have at the moment is to spell out "data modifying
> command" (or "statement") rather than relying on the acronym.
> In the code, we could change hasDmlWith to hasModifyingWith, for
> example.  The error messages could read like
>     data-modifying statement in WITH is not allowed in a view
>
> Comments?

+1

If we ever decide to add in what I'd originally envisioned, namely DCL
and DDL, the name continues to describe what's going on :)

Cheers,
David.
--
David Fetter  http://fetter.org/
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 09:58:36AM -0500, Tom Lane wrote:
> I had what seems to me a remarkably good idea, though maybe someone
> else can spot a problem with it.  Given that we've decided to run the
> modifying sub-queries all with the same command counter ID, they are
> logically executing "in parallel".  The current implementation takes
> no advantage of that fact, though: it's based around the idea of
> running the updates strictly sequentially.  I think we should change
> it so that the updates happen physically, not only logically,
> concurrently.  Specifically, I'm imagining getting rid of the patch's
> additions to InitPlan and ExecutePlan that find all the modifying
> sub-queries and force them to be cycled to completion before the main
> plan runs.  Just run the main plan and let it pull tuples from the
> CTEs as needed.  Then, in ExecutorEnd, cycle any unfinished
> ModifyTable nodes to completion before shutting down the plan.  (In
> the event of an error, we'd never get to ExecutorEnd, but it doesn't
> matter since whatever updates we did apply are nullified anyhow.)

What's the effect, if any, on CTEs that depend on each other explicitly?

Cheers,
David.
--
David Fetter  http://fetter.org/
Re: [HACKERS] wCTE: about the name of the feature
Marko Tiikkaja writes:
> On 2011-02-26 2:00 AM, Tom Lane wrote:
>> I've gone ahead and applied the code portion of the patch, with modifications as per discussion, and other editorialization.
>
> Thanks a lot!
>
> One thing bothers me though: what was the reason for requiring a RETURNING clause for data-modifying statements in WITH?

That test was in your patch, no? I moved the code to another place but it's still enforcing the same thing, namely that you can't reference the output of an INSERT/UPDATE/DELETE that hasn't got RETURNING.

regards, tom lane
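[Editorial note: the rule Tom describes can be illustrated with a hypothetical table; names are assumed for illustration:]

```sql
-- Allowed: the INSERT has RETURNING, so its output can be referenced
-- by the outer query.
WITH ins AS (
    INSERT INTO t (x) VALUES (1) RETURNING x
)
SELECT * FROM ins;

-- Rejected: without RETURNING there is no output for "ins" to expose,
-- so referencing it in the outer query raises an error.
WITH ins AS (
    INSERT INTO t (x) VALUES (1)
)
SELECT * FROM ins;
```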
Re: [HACKERS] wCTE: about the name of the feature
On 2011-02-26 2:00 AM, Tom Lane wrote:
> I've gone ahead and applied the code portion of the patch, with modifications as per discussion, and other editorialization.

Thanks a lot!

One thing bothers me though: what was the reason for requiring a RETURNING clause for data-modifying statements in WITH?

> I'll wait on you to produce documentation updates before dealing with the docs, but I figured we might as well get some buildfarm cycles on it meanwhile.

Thanks, I'll send an improved version tomorrow.

Regards, Marko Tiikkaja
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 5:21 PM, Josh Berkus wrote:
> On 2/25/11 4:57 PM, Jeff Davis wrote:
>> On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
>>> Hmmm, I don't follow this. The user can only disable syncrep for their own transactions. If they don't care about the persistence of their transaction post-failover, why should the DBA care?
>>
>> I think that's the difference between failover and switchover, right? At least Slony makes such a distinction, as well.
>
> Yeah. Actually, what would be even simpler and more to the point would be a command that says "flush all transactions from Server A to Server B, then fail over".

That would be nice; I'm basically abusing syncrep to this purpose. At the same time, someone may need to be notified of such a switchover occurring, and in the event of failure, it'd be nice to bounce back to the primary. Tangentially relevant, Virtual IP is not always an option, such as on Amazon EC2. But I digress. Such a command is unlikely to make it into 9.1; maybe we can circle around on that in 9.2.

-- fdr
Re: [HACKERS] Sync Rep v17
On Fri, Feb 25, 2011 at 4:52 PM, Jeff Davis wrote:
> On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
>> Oh, yes, this reproduces past shutdowns/startups, and there's quite a few txids before I catch up. I'm also comfortable poking around with gdb (I have already recompiled with debugging symbols and optimizations off and was poking around, especially at MemoryContextStats(TopMemoryContext), but was not rewarded).
>
> Where is all of that memory going during recovery? Recovery shouldn't use much memory at all, as far as I can tell.
>
> What's even allocating memory at all?

I noticed this was RSS fooling with me. As pages get touched in shared memory, for some reason RSS was constantly getting increased, along with SHR at the same time. Still, the long recovery time was mystifying to me, considering the lack of unclean shutdowns.

-- fdr
Re: [HACKERS] disposition of remaining patches
On 2/25/11 4:57 PM, Jeff Davis wrote:
> On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
>> Hmmm, I don't follow this. The user can only disable syncrep for their own transactions. If they don't care about the persistence of their transaction post-failover, why should the DBA care?
>
> I think that's the difference between failover and switchover, right? At least Slony makes such a distinction, as well.

Yeah. Actually, what would be even simpler and more to the point would be a command that says "flush all transactions from Server A to Server B, then fail over".

--
Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Sync Rep v17
On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
> Oh, yes, this reproduces past shutdowns/startups, and there's quite a few txids before I catch up. I'm also comfortable poking around with gdb (I have already recompiled with debugging symbols and optimizations off and was poking around, especially at MemoryContextStats(TopMemoryContext), but was not rewarded).

Where is all of that memory going during recovery? Recovery shouldn't use much memory at all, as far as I can tell.

What's even allocating memory at all?

Regards, Jeff Davis
Re: [HACKERS] disposition of remaining patches
On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
> Hmmm, I don't follow this. The user can only disable syncrep for their own transactions. If they don't care about the persistence of their transaction post-failover, why should the DBA care?

I think that's the difference between failover and switchover, right? At least Slony makes such a distinction, as well.

Regards, Jeff Davis
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 4:36 PM, Josh Berkus wrote:
> Daniel,
>
>> Ah, okay, I had missed that discussion, I also did not know it got so specific as to address this case (are you sure?) rather than something more general, say quorum or N-safe durability.
>
> The way we address that case is through n-safe durability.

How is this exposed? The simple "count the number of fsyncs()" approach is not quite good enough (one has no control to make sure one or more nodes are definitely up to date) unless one wants to just make it go to *all* syncrep standbys for a while. That seems like overkill, so I imagine something else is in the thoughts. I'll search the archives...

>> The user may have their own level of durability guarantee they want to attain (that's why machine "B" is syncrepped in my example), but when doing the switchover I think an override to enable a smooth handoff (meaning: everything syncrepped) would be best. What I want to avoid is an ack from "COMMIT" from the primary (machine "A"), and then, post switchover, the data isn't there on machine A-prime (or "B", provided it was able to follow successfully at all, as in the current case it might get ahead of A-prime in the WAL).
>
> Yeah, when I think about your use case, I can understand why it's an issue. It would be nice to have a superuser setting (or similar) which could override user preferences and make all transactions syncrep temporarily. I'm not sure that's going to be reasonable to do for 9.1 though.

Agreed; I'd be happy to take any syncrep functionality, although it wouldn't compose well as-is. I wanted to raise this so that we didn't make any configuration decisions that got in the way of making composition possible later. Again, I haven't thought ahead yet, partially because I thought there may be some existing thoughts in play to consider.

With that, I will try to give syncrep a more structured review Real Soon, although the late date of this is leaving me queasy as to the odds of git-commit.

-- fdr
Re: [HACKERS] disposition of remaining patches
Daniel,

> Ah, okay, I had missed that discussion, I also did not know it got so specific as to address this case (are you sure?) rather than something more general, say quorum or N-safe durability.

The way we address that case is through n-safe durability.

> The user may have their own level of durability guarantee they want to attain (that's why machine "B" is syncrepped in my example), but when doing the switchover I think an override to enable a smooth handoff (meaning: everything syncrepped) would be best. What I want to avoid is an ack from "COMMIT" from the primary (machine "A"), and then, post switchover, the data isn't there on machine A-prime (or "B", provided it was able to follow successfully at all, as in the current case it might get ahead of A-prime in the WAL).

Yeah, when I think about your use case, I can understand why it's an issue. It would be nice to have a superuser setting (or similar) which could override user preferences and make all transactions syncrep temporarily. I'm not sure that's going to be reasonable to do for 9.1 though.

--
Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 3:44 PM, Josh Berkus wrote: > >> Right now, as it stands, the syncrep patch will be happy as soon as >> the data has been fsynced to either B or A-prime; I don't think we can >> guarantee at any point that A-prime can become the leader, and feed B. > > Yeah, I think that's something we said months ago is going to be a 9.2 > feature, no sooner. Ah, okay, I had missed that discussion, I also did not know it got so specific as to address this case (are you sure?) rather than something more general, say quorum or N-safe durability. >> 2. The unprivileged user can disable syncrep, in any situation. This >> flexibility is *great*, but you don't really want people to do it when >> one is performing the switchover. Rather, in a magical world we'd hope >> that disabling syncrep would just result in not having to >> synchronously commit to B (but, in this case, still synchronously >> commit to A-prime) >> >> In other words, to my mind, you can use syncrep as-is to provide >> 2-safe durability xor a scheduled switchover: as soon as someone wants >> both, I think they'll have some trouble. I do want both, though. > > Hmmm, I don't follow this. The user can only disable syncrep for their > own transactions. If they don't care about the persistence of their > transaction post-failover, why should the DBA care? The user may have their own level of durability guarantee they want to attain (that's why machine "B" is syncrepped in my example), but when doing the switchover I think an override to enable a smooth handoff (meaning: everything syncrepped) would be best. What I want to avoid is an ack from "COMMIT" from the primary (machine "A"), and then, post switchover, the data isn't there on machine A-Prime (or "B", provided it was able to follow successfully at all, as in the current case it might get ahead of A-prime in the WAL). 
-- fdr
Re: [HACKERS] wCTE: about the name of the feature
Marko Tiikkaja writes:
> On 2011-02-24 6:37 PM +0200, Tom Lane wrote:
>> OK, I will make those adjustments. Are you going to do more work on the documentation part of the patch? I can stick to working on the code part meanwhile, if you are.
>
> I am planning on working on the documentation this weekend.

I've gone ahead and applied the code portion of the patch, with modifications as per discussion, and other editorialization. I'll wait on you to produce documentation updates before dealing with the docs, but I figured we might as well get some buildfarm cycles on it meanwhile.

regards, tom lane
Re: [HACKERS] disposition of remaining patches
> Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B.

Yeah, I think that's something we said months ago is going to be a 9.2 feature, no sooner.

> 2. The unprivileged user can disable syncrep, in any situation. This flexibility is *great*, but you don't really want people to do it when one is performing the switchover. Rather, in a magical world we'd hope that disabling syncrep would just result in not having to synchronously commit to B (but, in this case, still synchronously commit to A-prime)
>
> In other words, to my mind, you can use syncrep as-is to provide 2-safe durability xor a scheduled switchover: as soon as someone wants both, I think they'll have some trouble. I do want both, though.

Hmmm, I don't follow this. The user can only disable syncrep for their own transactions. If they don't care about the persistence of their transaction post-failover, why should the DBA care?

--
Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] WIP: cross column correlation ...
> 4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cached?

FWIW, for a manual override setting, I was thinking that the % would convert to a probability. In that way, it wouldn't be different from the existing RPC calculation; we're just estimating how *likely* it is that the data the user wants is cached.

> One idea Tom and I kicked around previously is to set an assumed caching percentage for each table based on its size relative to effective_cache_size - in other words, assume that the smaller a table is, the more of it will be cached. Consider a system with 8GB of RAM, and a table which is 64kB. It is probably unwise to make any plan based on the assumption that that table is less than fully cached. If it isn't before the query executes, it soon will be. Going to any amount of work elsewhere in the plan to avoid the work of reading that table in from disk is probably a dumb idea. Of course, one downside of this approach is that it doesn't know which tables are hot and which tables are cold, but it would probably still be an improvement over the status quo.

Actually, we *do* have some idea which tables are hot. Or at least, we could. Currently, pg_stats for tables are "timeless"; they just accumulate from the last reset, which has always been a problem in general for monitoring. If we could make top-level table and index stats time-based, even in some crude way, we would know which tables were currently hot. That would also have the benefit of making server performance analysis and autotuning easier.

> But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more.

Agreed.
--
Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
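[Editorial note: the size-relative heuristic kicked around above reduces to a small piece of arithmetic. The sketch below illustrates the idea only; the exact taper function is an assumption, not code from any patch:]

```c
#include <assert.h>

/* Sketch of the "smaller tables are assumed better cached" idea:
 * assume the cached fraction is effective_cache_size / table_size,
 * clamped to 1.0.  A 64kB table on a machine with an 8GB
 * effective_cache_size is then treated as fully cached, while a
 * table twice the size of the cache is assumed half cached. */
static double
assumed_cached_fraction(double table_bytes, double effective_cache_size)
{
    if (table_bytes <= effective_cache_size)
        return 1.0;
    return effective_cache_size / table_bytes;
}
```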
Re: [HACKERS] WIP: cross column correlation ...
2011/2/25 Robert Haas : > 2011/2/25 Cédric Villemain : >>> All that having been said, I think that while Josh is thinking fuzzily >>> about the mathematics of his proposal, the basic idea is pretty >>> sensible. It is not easy - likely not possible - for the system to >>> have a good idea which things will be in some kind of cache at the >>> time the query is executed; it could even change mid-query. The >>> execution of one part of the query could evict from the cache data >>> which some other part of the plan assumed would be cached. But DBAs >>> frequently have a very good idea of which stuff is in cache - they can >>> make observations over a period of time and then adjust settings and >>> then observe some more and adjust some more. >> >> I believe we can maintain a small map of area of a relation which are >> in the OS buffer cache (shared buffers move more), or at least a >> percentage of the relation in OS cache. Getting autovacuum daemon >> being able to update those maps/counters might be enought and easy to >> do, it is really near what auto-analyze do. My observation is that >> the percentage in cache is stable on a production workload after some >> tens of minutes needed to warm the server. > > I don't think we can assume that will be true in all workloads. > Imagine a server doing batch processing. People submit large batches > of work that take, say, an hour to complete. Not all batches use the > same set of tables - maybe they even run in different databases. > After a big batch process finishes crunching numbers in database A, > very little of database B will be cached. But it's not necessarily > right to assume that when we start queries for a new batch in database > B, although it's more likely to be right for large tables (which will > take a long time to get cached meaningfully, if they ever do) than > small ones. 
Also, it could lead to strange issues where batches run > much faster or slower depending on which batch immediately proceeded > them. If we're going to do something a lot of times, it'd be better > to bite the bullet and read it all in rather than going to more work > elsewhere, but if we're only going to touch it once, then not so much. > > You might also have this issue on systems that run OLTP workloads all > day and then do some batch processing at night to get ready for the > next business day. Kevin Grittner wrote previously about those jobs > needing some different settings in his environment (I'm not > remembering which settings at the moment). Suppose that the batch > process is going to issue a query that can be planned in one of two > possible ways. One way involves reading 10% of a relation, and the > other way involves reading the whole thing. The first plan takes 200 > s to execute if the relation is not cached, and 180 s if the relevant > portion is cached. The second plan takes 300 s to execute if the > relation is not cached, and 100 s if it is cached. At the start of > the batch run, the relation won't be cached, because it's used *only* > by the overnight job and not by the daily OLTP traffic. Which way > should we execute the query? > > The answer is that if the batch job only needs to execute that query > *once*, we should do it the first way. But if it needs to execute it > three or more times, the second way is better, but only if we use the > second plan every time. If we start out with the first plan, we're > always better off sticking with it *unless* we know that we're going > to repeat the query at least twice more after the iteration we're > currently planning. To make the right decision, the query planner > needs a crystal ball. Or, a little help from the DBA. Yes, we are talking of improving some part of the model. Some workloads are dramatic and need special customization. This is true. 
Still there is a path of improvement, and probably it will remain a path of improvement after the current model is updated. I am not proposing something to solve all the issues, but something way more interesting IMHO than just letting the DBA say: 'this table is in cache at XX%'. Btw, pgfincore already does solve the use case you provide by helping the DBA to prepare its batch processing, so in some sense I am familiar with what you describe (take the second plan, pgfincore will preload in the background, and your query will be done in 100s from the first).

>> What should really help here is to have hooks in the cost functions to test those ideas without the need to patch postgresql-core a lot. Will it be ok to have hooks or will it add too much CPU consumption in a sensible part of the code?
>
> Depends on where you put them, I guess. Hooks are pretty cheap, but they're also pretty hard to use.

Yes, it will be easier to make an extension, have people test it, and validate (or not) the 'new' model.

--
Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
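[Editorial note: Robert's worked example above reduces to simple arithmetic. A small sketch with the numbers taken from the message; the helper is illustrative, not planner code:]

```c
#include <assert.h>

/* Total time for `runs` executions of a query: the first run pays
 * the cold (uncached) cost, and later runs find the relation cached. */
static double
total_time(double cold, double warm, int runs)
{
    return cold + warm * (runs - 1);
}
```

With the figures from the thread (plan one: 200 s cold / 180 s cached; plan two: 300 s cold / 100 s cached), plan one wins for one or two executions (200 vs 300, then 380 vs 400), while plan two wins from three executions on (560 vs 500) — matching the "three or more times" break-even in the message.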
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 2:33 PM, Daniel Farina wrote:
> I know I got hit by a backend synchronization (in the sense of locks, etc) bugs; do you think it is possible yours (sending SIGSTOP) could be the same root cause? I haven't followed all the other bugs cleared up by inspection.

I believe that the queue management logic is just totally busted and needs to be rewritten. I doubt there is much point in speculating about details until that's done.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pageinspect's infomask and infomask2 as smallint
Excerpts from Tom Lane's message of mar feb 15 12:42:00 -0300 2011:
> Heikki Linnakangas writes:
>> pageinspect is just a debugging aid, so I think we should change it from smallint to int4 in 9.1, and not bother backporting.
>
> I don't see any reason that the old version of the function couldn't be dropped in the upgrade script. It's not likely anything would be depending on it, is it?

Okay, so I changed the C code, and there's working upgrade support that takes you from the 9.0 version to the 9.1 version. I tested this by creating a 9.0 database with pageinspect loaded, then pg_upgrade'd it to 9.1, then ran

    create extension pageinspect from unpackaged;

If I run the function before running the CREATE EXTENSION command, it works but still displays the negative numbers. After that, it behaves as expected.

I was a bit surprised that I had to remove the ALTER EXTENSION/ADD FUNCTION command from the upgrade script, but in hindsight it makes perfect sense -- those commands are being run inside the "create_extension" context and so the function being created already belongs to the extension. Still, maybe we should make ALTER EXTENSION/ADD idempotent.

I considered the idea of calling this version 1.1 and shipping a new pageinspect--1.0--1.1.sql script (which works perfectly, provided you run ALTER EXTENSION/DROP FUNCTION before dropping the function, then ALTER/ADD later), but decided that this was overkill. We can still change it if people think that'd be better, of course, but ...

Nice work on extensions and their upgradability, overall.

--
Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
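[Editorial note: the negative numbers come from reinterpreting a 16-bit unsigned infomask as a signed smallint. A minimal illustration; the mask value is just an example, not a specific flag combination from the headers:]

```c
#include <assert.h>
#include <stdint.h>

/* Heap tuple infomasks are 16-bit flag words.  Any value with the
 * high bit set turns negative when reinterpreted as a signed
 * smallint -- which is what the old pageinspect signature did. */
static int16_t
infomask_as_smallint(uint16_t infomask)
{
    return (int16_t) infomask;
}

static int32_t
infomask_as_int4(uint16_t infomask)
{
    return (int32_t) infomask;  /* the 9.1 behavior: stays positive */
}
```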
Re: [HACKERS] help: collation support on Windows
On Fri, Feb 25, 2011 at 22:58, Dave Page wrote:
> On Fri, Feb 25, 2011 at 9:54 PM, Andrew Dunstan wrote:
>> On 02/25/2011 02:32 PM, Peter Eisentraut wrote:
>>> According to the online documentation, the APIs are there: http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx
>>>
>>> Now we'd need someone brave to try to make it work. The starting point would be to define HAVE_LOCALE_T and then make it build. Microsoft has all the relevant functions and types with an underscore in front (_strcoll_l, etc.), so some extra #defining will probably be necessary.
>>>
>>> Also, initdb will need to be patched to get a list of OS locales to populate the pg_collation catalog with.
>>>
>>> Finally, a regression test customized for Windows, but I can help with that later.
>>
>> What is the equivalent of "locale -a"?
>
> There isn't a command that I know of, but the API function you probably need is EnumSystemLocales.

Yeah. If you want example code, you can probably lift something from the old pginstaller project on pgfoundry - we use that code to populate the dropdown box for initdb there.

--
Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] help: collation support on Windows
On Fri, Feb 25, 2011 at 9:54 PM, Andrew Dunstan wrote:
> On 02/25/2011 02:32 PM, Peter Eisentraut wrote:
>> According to the online documentation, the APIs are there: http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx
>>
>> Now we'd need someone brave to try to make it work. The starting point would be to define HAVE_LOCALE_T and then make it build. Microsoft has all the relevant functions and types with an underscore in front (_strcoll_l, etc.), so some extra #defining will probably be necessary.
>>
>> Also, initdb will need to be patched to get a list of OS locales to populate the pg_collation catalog with.
>>
>> Finally, a regression test customized for Windows, but I can help with that later.
>
> What is the equivalent of "locale -a"?

There isn't a command that I know of, but the API function you probably need is EnumSystemLocales.

--
Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
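[Editorial note: a minimal, untested sketch of what a "locale -a" equivalent might look like on Windows, built around the API family Dave names. The newer EnumSystemLocalesEx variant reports locale names such as "en-US"; the older EnumSystemLocales callback receives LCIDs as hex strings instead, so the Ex form is assumed here:]

```c
#include <windows.h>
#include <stdio.h>

/* Callback invoked once per installed locale name. */
static BOOL CALLBACK
print_locale(LPWSTR name, DWORD flags, LPARAM param)
{
    wprintf(L"%ls\n", name);
    return TRUE;                /* keep enumerating */
}

int
main(void)
{
    /* Vista-and-later API; older systems would need EnumSystemLocalesW
     * plus LCID-to-name translation instead. */
    EnumSystemLocalesEx(print_locale, LOCALE_ALL, 0, NULL);
    return 0;
}
```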
Re: [HACKERS] help: collation support on Windows
On 02/25/2011 02:32 PM, Peter Eisentraut wrote:
> According to the online documentation, the APIs are there: http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx
>
> Now we'd need someone brave to try to make it work. The starting point would be to define HAVE_LOCALE_T and then make it build. Microsoft has all the relevant functions and types with an underscore in front (_strcoll_l, etc.), so some extra #defining will probably be necessary.
>
> Also, initdb will need to be patched to get a list of OS locales to populate the pg_collation catalog with.
>
> Finally, a regression test customized for Windows, but I can help with that later.

What is the equivalent of "locale -a"?

cheers andrew
[HACKERS] help: collation support on Windows
According to the online documentation, the APIs are there: http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx

Now we'd need someone brave to try to make it work. The starting point would be to define HAVE_LOCALE_T and then make it build. Microsoft has all the relevant functions and types with an underscore in front (_strcoll_l, etc.), so some extra #defining will probably be necessary.

Also, initdb will need to be patched to get a list of OS locales to populate the pg_collation catalog with.

Finally, a regression test customized for Windows, but I can help with that later.
Re: [HACKERS] Sync Rep v17
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga wrote:
> I also did some initial testing on this patch and got the queue related errors with more than 1 client. With the code change from Jaime above I still got a lot of 'not on queue' warnings.
>
> I tried to understand how the queue was supposed to work - resulting in the changes below that also incorporate a suggestion from Fujii upthread, to exit early when myproc was found.

Yes, looking at the code, the warning and your patch... it seems yours is the right solution. I'm compiling right now to test again and see the effects. Robert, maybe you can test your failure case again? I'm really sure it's related to this...

--
Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL
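[Editorial note: the early-exit idea under discussion — stop scanning the wait queue as soon as the current backend's entry has been unlinked — can be sketched on a toy singly linked list. This models the suggestion only; it is not the actual PostgreSQL queue code:]

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a PGPROC entry on the sync rep wait queue. */
typedef struct WaitProc
{
    int pid;
    struct WaitProc *next;
} WaitProc;

/* Unlink `me` from the queue and stop scanning immediately: a backend
 * is queued at most once, so continuing past the match is wasted work.
 * Returns the (possibly new) head of the queue. */
static WaitProc *
queue_remove(WaitProc *head, WaitProc *me)
{
    WaitProc **link = &head;

    while (*link != NULL)
    {
        if (*link == me)
        {
            *link = me->next;   /* unlink ourselves */
            me->next = NULL;
            break;              /* early exit once found */
        }
        link = &(*link)->next;
    }
    return head;
}
```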
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 4:43 AM, Robert Haas wrote: > On Fri, Feb 25, 2011 at 3:14 AM, Daniel Farina wrote: >> On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith wrote: >>> Robert Haas wrote: > > 2. Synchronous replication. Splitting up this patch has allowed some >>> On top of 4 listed reviewers I know Dan Farina is poking at the last update, >>> so we may see one more larger report on top of what's already shown up. And >>> Jaime keeps kicking the tires too. What Simon was hoping is that a week of >>> others looking at this would produce enough feedback that it might be >>> possible to sweep the remaining issues up soon after he's back. It looks to >>> me like that's about when everything else that's still open will probably >>> settle too. >> >> Besides some of the fixable issues, I am going to have to echo >> Robert's sentiments about a few kinks that go beyond mechanism in the >> syncrep patch: in particular, it will *almost* solve the use case I >> was hoping to solve: a way to cleanly perform planned switchovers >> between machines with minimal downtime and no lost data. But there are >> a couple of holes I have thought of so far: > > Well, just because the patch doesn't solve every use case isn't a > reason not to go forward with it - we can always add more options > later - but I have to admit that I'm kind of alarmed about the number > of bugs reported so far. True: the relevance of any use case to acceptance is up to some debate. I haven't thought about how to remedy this, just thinking aloud about a problem I would have as-is, and is important to me. It is true that later accretion of options can occur, but sometimes the initial choice of semantics can make growing those easier or harder. I haven't yet thought ahead as to how the current scheme would impact that. I know I got hit by a backend synchronization (in the sense of locks, etc) bugs; do you think it is possible yours (sending SIGSTOP) could be the same root cause? 
I haven't followed all the other bugs cleared up by inspection.

-- fdr
Re: [HACKERS] WIP: cross column correlation ...
Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:
> How practical would it be for analyze to keep a record of response times for given sections of a table as it randomly accesses them and generate some kind of a map for expected response times for the pieces of data it is analysing?

I think what you want is random_page_cost that can be tailored per tablespace.

--
Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
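[Editorial note: per-tablespace cost settings of the kind Álvaro suggests already exist; the tablespace name and value below are illustrative:]

```sql
-- Tell the planner that pages on an SSD-backed tablespace are cheap
-- to fetch randomly, without changing the global random_page_cost.
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);
```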
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 5:25 AM, marcin mank wrote:
> On Fri, Feb 25, 2011 at 9:14 AM, Daniel Farina wrote:
>> Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B.
>
> - start A' up, replicating from A
> - shut down B (now A and A' are synchronous)
> now real quick:
> - shut down A
> - shut down A'
> - change configuration
> - start up A'
> - start up B
>
> Doesn't this work?

This dance does work, but it would be very nice to not have to take the standby ('B' in my case) offline.

-- fdr
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
Greg Stark writes: > On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane wrote: >> However, the real reason for doing it isn't any of those, but rather >> to establish the principle that the executions of the modifying >> sub-queries are interleaved not sequential. > Does the interleaved execution have sane semantics? Depends on what you call sane. With the decision to not increment command counter, it's already the case that people shouldn't have two subqueries try to modify the same row. > With a query like: > WITH > a as update x set x.i=x.i+1 returning x.i, > b as update x set x.i=x.i+1 returning x.i > select * from a natural join b; > Is there any way to tell what it will return or what state it will > leave the table in? My reaction to that is "you shouldn't do that, and you definitely shouldn't complain if it's not predictable whether a or b will modify a given row". This is exactly the sort of assumption I don't want people building into their queries, because we will be locked into purely sequential execution if we promise that the results will be consistent. There is already precedent for that position. You can easily construct queries using UPDATE ... FROM wherein the same target row joins to more than one row in the FROM table, and then it's unpredictable which joining row will be used to update that target row. Our position has always been "don't do that", not that we'd lobotomize the planner and executor to ensure predictability. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: cross column correlation ...
2011/2/25 Cédric Villemain :
>> All that having been said, I think that while Josh is thinking fuzzily about the mathematics of his proposal, the basic idea is pretty sensible. It is not easy - likely not possible - for the system to have a good idea which things will be in some kind of cache at the time the query is executed; it could even change mid-query. The execution of one part of the query could evict from the cache data which some other part of the plan assumed would be cached. But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more.
>
> I believe we can maintain a small map of areas of a relation which are in the OS buffer cache (shared buffers move more), or at least a percentage of the relation in OS cache. Getting the autovacuum daemon to update those maps/counters might be enough and easy to do; it is really near what auto-analyze does. My observation is that the percentage in cache is stable on a production workload after some tens of minutes needed to warm the server.

I don't think we can assume that will be true in all workloads. Imagine a server doing batch processing. People submit large batches of work that take, say, an hour to complete. Not all batches use the same set of tables - maybe they even run in different databases. After a big batch process finishes crunching numbers in database A, very little of database B will be cached. But it's not necessarily right to assume that will still be the case when we start queries for a new batch in database B, although it's more likely to be right for large tables (which will take a long time to get cached meaningfully, if they ever do) than small ones. Also, it could lead to strange issues where batches run much faster or slower depending on which batch immediately preceded them.
If we're going to do something a lot of times, it'd be better to bite the bullet and read it all in rather than going to more work elsewhere, but if we're only going to touch it once, then not so much. You might also have this issue on systems that run OLTP workloads all day and then do some batch processing at night to get ready for the next business day. Kevin Grittner wrote previously about those jobs needing some different settings in his environment (I'm not remembering which settings at the moment). Suppose that the batch process is going to issue a query that can be planned in one of two possible ways. One way involves reading 10% of a relation, and the other way involves reading the whole thing. The first plan takes 200 s to execute if the relation is not cached, and 180 s if the relevant portion is cached. The second plan takes 300 s to execute if the relation is not cached, and 100 s if it is cached. At the start of the batch run, the relation won't be cached, because it's used *only* by the overnight job and not by the daily OLTP traffic. Which way should we execute the query? The answer is that if the batch job only needs to execute that query *once*, we should do it the first way. But if it needs to execute it three or more times, the second way is better, but only if we use the second plan every time. If we start out with the first plan, we're always better off sticking with it *unless* we know that we're going to repeat the query at least twice more after the iteration we're currently planning. To make the right decision, the query planner needs a crystal ball. Or, a little help from the DBA. > What should really help here is to have hooks in the cost functions to > test those ideas without the need to patch postgresql-core a lot. Will > it be ok to have hooks or will it add to much CPU consumption in a > sensible part of the code ? Depends on where you put them, I guess. Hooks are pretty cheap, but they're also pretty hard to use. 
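The break-even arithmetic in the batch-plan example above can be checked directly. A minimal sketch (plain Python, numbers taken from the example; the function name is just illustrative):

```python
# Two candidate plans for the same query, with different cold vs. warm costs
# (numbers from the example above):
#   plan 1: read 10% of the relation - 200 s cold, 180 s warm
#   plan 2: read the whole relation  - 300 s cold, 100 s warm
def total_cost(cold_s, warm_s, executions):
    """Total time if the same plan is used every time: the first run pays
    the uncached (cold) cost, subsequent runs pay the cached (warm) cost."""
    return cold_s + warm_s * (executions - 1)

for n in range(1, 6):
    p1 = total_cost(200, 180, n)
    p2 = total_cost(300, 100, n)
    print(n, p1, p2, "plan1" if p1 < p2 else "plan2")
# -> plan 1 wins for 1 or 2 executions; plan 2 wins from 3 executions on
```

This confirms the claim in the text: with one or two executions the first plan is cheaper, but at three or more executions the whole-table plan wins, provided it is chosen every time.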
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 11:31 AM, Greg Stark wrote:
> On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane wrote:
>> However, the real reason for doing it isn't any of those, but rather to establish the principle that the executions of the modifying sub-queries are interleaved not sequential. We're never going to be able to do any significant optimization of such queries if we have to preserve the behavior that the sub-queries execute sequentially. And I think it's inevitable that users will manage to build such an assumption into their queries if the first release with the feature behaves that way.
>
> Does the interleaved execution have sane semantics?
>
> With a query like:
>
> WITH
> a as update x set x.i=x.i+1 returning x.i,
> b as update x set x.i=x.i+1 returning x.i
> select * from a natural join b;
>
> Is there any way to tell what it will return or what state it will leave the table in?

WITH
a as update x set x.i=x.i+1 returning x.i,
b as update x set x.i=x.i+1 where x.i = 1 returning x.i
select * from a natural join b;

or the above if x.i is 1 for all rows of x at query start?

merlin
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane wrote: > However, the real reason for doing it isn't any of those, but rather > to establish the principle that the executions of the modifying > sub-queries are interleaved not sequential. We're never going to be > able to do any significant optimization of such queries if we have to > preserve the behavior that the sub-queries execute sequentially. > And I think it's inevitable that users will manage to build such an > assumption into their queries if the first release with the feature > behaves that way. Does the interleaved execution have sane semantics? With a query like: WITH a as update x set x.i=x.i+1 returning x.i, b as update x set x.i=x.i+1 returning x.i select * from a natural join b; Is there any way to tell what it will return or what state it will leave the table in? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
Marko Tiikkaja writes:
> On 2011-02-25 6:12 PM, Tom Lane wrote:
>> The current implementation with everything in one plantree really ought to look just like a SELECT so far as the portal code is concerned.
> The problem was that the old code was using PORTAL_MULTI_QUERY whenever a wCTE was present. Are you saying that you are using PORTAL_ONE_SELECT? Doesn't that have problems with triggers, for example?

Hmmm ... good question. I notice the lack of any regression test cases involving triggers. Will check this.

regards, tom lane
Re: [HACKERS] WIP: cross column correlation ...
> 4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cached? Hard to say, right? It could be none of it or all of it. The second scenario is easy to imagine - just suppose the query's been executed twice. The first scenario isn't hard to imagine either.

I have a set of slow disks which can impact performance nearly as much as whether data is cached in memory versus sitting on the fast disks.

How practical would it be for analyze to keep a record of response times for given sections of a table as it randomly accesses them and generate some kind of a map for expected response times for the pieces of data it is analysing? It may well discover, on its own, that recent data (1 month old or less) has a random read response time of N, older data (1 year old) in a different section of the relation tends to have a response time of 1000N, and really old data (5 years old) tends to have a response time of 3000N.
Re: [HACKERS] WIP: cross column correlation ...
2011/2/25 Robert Haas :
> On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian wrote:
>> Josh Berkus wrote:
>>> On 2/23/11 7:10 AM, Robert Haas wrote:
>>> > IME, most bad query plans are caused by either incorrect estimates of selectivity, or wrongheaded notions about what's likely to be cached. If we could find a way, automated or manual, of providing the planner some better information about the facts of life in those areas, I think we'd be way better off. I'm open to ideas about what the best way to do that is.
>>>
>>> As previously discussed, I'm fine with approaches which involve modifying database objects. These are auditable and centrally manageable and aren't devastating to upgrades.
>>>
>>> So things like the proposed "CREATE SELECTIVITY" would be OK in a way that decorating queries would not.
>>>
>>> Similarly, I would love to be able to set "cache %" on a per-relation basis, and override the whole dubious calculation involving random_page_cost for scans of that table.
>>
>> We should just find a way of checking what percentage of a table is already in the shared buffers. That doesn't help us with the kernel cache, but it would be a good start and something that doesn't require user tuning.
>
> You're reinventing a wheel that's already been discarded multiple times. There are at least four separate problems:
>
> 1. The percentage of the table which is cached in shared_buffers at plan time need not match the percentage that is cached at execution time. A delay of even a few seconds between planning and execution could make the numbers totally different, and plans can be cached for much longer than that.
>
> 2. Because shared_buffers can turn over quite quickly, planning the statement multiple times in relatively quick succession could give different results each time.
Previous discussions on this topic have > concluded that DBAs hate plan instability, and hate GEQO because it > causes plan instability, and this would inject plan instabiilty into > the main planner. > > 3. The percentage of the table which is cached in shared_buffers is > not necessarily representative of the percentage which is cached in > general. On a large machine, shared_buffers may be less than 10% of > the total cache. It would be unwise to make guesses about what is and > is not cached based on a small percentage of the cache. > > 4. Even if we could accurately estimate the percentage of the table > that is cached, what then? For example, suppose that a user issues a > query which retrieves 1% of a table, and we know that 1% of that table > is cached. How much of the data that the user asked for is cache? > Hard to say, right? It could be none of it or all of it. The second > scenario is easy to imagine - just suppose the query's been executed > twice. The first scenario isn't hard to imagine either. > > One idea Tom and I kicked around previously is to set an assumed > caching percentage for each table based on its size relative to > effective_cache_size - in other words, assume that the smaller a table > is, the more of it will be cached. Consider a system with 8GB of RAM, > and a table which is 64kB. It is probably unwise to make any plan > based on the assumption that that table is less than fully cached. If > it isn't before the query executes, it soon will be. Going to any > amount of work elsewhere in the plan to avoid the work of reading that > table in from disk is probably a dumb idea. Of course, one downside > of this approach is that it doesn't know which tables are hot and > which tables are cold, but it would probably still be an improvement > over the status quo. Yes, good idea. > > All that having been said, I think that while Josh is thinking fuzzily > about the mathematics of his proposal, the basic idea is pretty > sensible. 
> It is not easy - likely not possible - for the system to have a good idea which things will be in some kind of cache at the time the query is executed; it could even change mid-query. The execution of one part of the query could evict from the cache data which some other part of the plan assumed would be cached. But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more.

I believe we can maintain a small map of areas of a relation which are in the OS buffer cache (shared buffers move more), or at least a percentage of the relation in OS cache. Getting the autovacuum daemon to update those maps/counters might be enough and easy to do; it is really near what auto-analyze does. My observation is that the percentage in cache is stable on a production workload after some tens of minutes needed to warm the server.

What should really help here is to have hooks in the cost functions to test those ideas without the need to patch postgresql-core a lot. Will it be ok to have hooks or will it add too much CPU consumption in a sensible part of the code?
Re: [HACKERS] sync rep design architecture (was "disposition of remaining patches")
Daniel Farina wrote: Server A syncreps to Server B Now I want to provision server A-prime, which will eventually take the place of A. Server A syncreps to Server B Server A syncreps to Server A-prime Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B. One of the very fundamental breaks between how this patch implements sync rep and what some people might expect is this concern. Having such tight control over the exact order of failover isn't quite here yet, so sometimes people will need to be creative to work within the restrictions of what is available. The path for this case is probably: 1) Wait until A' is caught up 2) Switchover to B as the right choice to be the new master, with A' as its standby and A going off-line at the same time. 3) Switchover the master role from B to A'. Bring up B as its standby. There are other possible transition plans available too. I appreciate that you would like to do this as an atomic operation, rather than handling it as two steps--one of which puts you in a middle point where B, a possibly inferior standby, is operating at the master. There are a dozen other complicated "my use case says I want and it must be done as " requests for Sync Rep floating around here, too. They're all getting ignored in favor of something smaller that can get built today. The first question I'd ask is whether you could you settle for this more cumbersome than you'd prefer switchover plan for now. The second is whether implementing what this feature currently does would get in the way of coding of what you really want eventually. I didn't get the Streaming Rep + Hot Standby features I wanted in 9.0 either. 
But committing what was reasonable to include in that version let me march forward with very useful new code, doing another year of development on my own projects and getting some new things get fixed in core. And so far it looks like 9.1 will sort out all of the kinks I was unhappy about. The same sort of thing will need to happen to get Sync Rep committed and then appropriate for more use cases. There isn't any margin left for discussions of scope creep left here; really it's "is this subset useful for some situations and stable enough to commit" now. 2. The unprivileged user can disable syncrep, in any situation. This flexibility is *great*, but you don't really want people to do it when one is performing the switchover. For the moment you may have to live with a situation where user connections must be blocked during the brief moment of switchover to eliminate this issue. That's what I end up doing with 9.0 production systems to get a really clean switchover, there's a second of hiccup even in the best case. I'm not sure yet of the best way yet to build a UI to make that more transparent in the sync rep case. It's sure not a problem that's going to get solved in this release though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On 2011-02-25 6:12 PM, Tom Lane wrote: Marko Tiikkaja writes: Without hacking it broke when PQdescribePrepared was called on a prepared query like: WITH t AS (DELETE FROM foo) SELECT 1; Not sure if that's an actual problem, but it seemed like something worht fixing. I can't replicate such a problem here --- do you have a concrete test case? ISTM the issue would only have been a problem back when you were trying to generate multiple PlannedStmts from a query like the above. I don't have one right now (I lost the one I had because of a hardware failure in a virtual machine), but I can write you one if you want to. But see below. The current implementation with everything in one plantree really ought to look just like a SELECT so far as the portal code is concerned. The problem was that the old code was using PORTAL_MULTI_QUERY whenever a wCTE was present. Are you saying that you are using PORTAL_ONE_SELECT? Doesn't that have problems with triggers, for example? Also, why are we forbidding wCTEs in cursors? Given the current definitions, that case seems to work fine too: the wCTEs will be executed as soon as you fetch something from the cursor. Are you just worried about not allowing a case that might be hard to support later? Honestly, I have no idea. It might be a leftover from the previous design. If it looks like it's easy to support, then go for it. Right now I'm thinking that it is best to continue to forbid it. If we go over to the less-sequential implementation that I'm advocating in another thread, the timing of the updates would become a lot less predictable than I say above. If we refuse it for now, we can always remove the restriction later, but the other way is more painful. Fair enough. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
Marko Tiikkaja writes:
> On 2011-02-25 1:36 AM, Tom Lane wrote:
>> Why is it necessary to hack the portal logic at all? The patch seems to work for me without that. (I've fixed quite a few bugs though, so maybe what this is really doing is masking a problem elsewhere.)
> Without hacking it broke when PQdescribePrepared was called on a prepared query like:
> WITH t AS (DELETE FROM foo)
> SELECT 1;
> Not sure if that's an actual problem, but it seemed like something worth fixing.

I can't replicate such a problem here --- do you have a concrete test case? ISTM the issue would only have been a problem back when you were trying to generate multiple PlannedStmts from a query like the above. The current implementation with everything in one plantree really ought to look just like a SELECT so far as the portal code is concerned.

>> Also, why are we forbidding wCTEs in cursors? Given the current definitions, that case seems to work fine too: the wCTEs will be executed as soon as you fetch something from the cursor. Are you just worried about not allowing a case that might be hard to support later?
> Honestly, I have no idea. It might be a leftover from the previous design. If it looks like it's easy to support, then go for it.

Right now I'm thinking that it is best to continue to forbid it. If we go over to the less-sequential implementation that I'm advocating in another thread, the timing of the updates would become a lot less predictable than I say above. If we refuse it for now, we can always remove the restriction later, but the other way is more painful.

regards, tom lane
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On 2011-02-25 4:58 PM, Tom Lane wrote: Specifically, I'm imagining getting rid of the patch's additions to InitPlan and ExecutePlan that find all the modifying sub-queries and force them to be cycled to completion before the main plan runs. Just run the main plan and let it pull tuples from the CTEs as needed. Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to completion before shutting down the plan. (In the event of an error, we'd never get to ExecutorEnd, but it doesn't matter since whatever updates we did apply are nullified anyhow.) This idea has actually been discussed before when we talked about optimizing wCTEs, but IIRC you said that doing this in ExecutorEnd is a bit ugly. But if you can write this idea down in a way that makes you happy with the implementation, I think it's a huge benefit and we should definitely do it. This has a number of immediate and future implementation benefits: 3. The code could be significantly simpler. Instead of that rather complex and fragile logic in InitPlan to try to locate all the ModifyTable nodes and their CTEScan parents, we could just have ModifyTable nodes add themselves to a list in the EState during ExecInitNode. Then ExecutorEnd just traverses that list. Sounds good to me. However, the real reason for doing it isn't any of those, but rather to establish the principle that the executions of the modifying sub-queries are interleaved not sequential. We're never going to be able to do any significant optimization of such queries if we have to preserve the behavior that the sub-queries execute sequentially. And I think it's inevitable that users will manage to build such an assumption into their queries if the first release with the feature behaves that way. Yeah, you might be right. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
"Kevin Grittner" writes:
> Tom Lane wrote:
>> Given that we've decided to run the modifying sub-queries all with the same command counter ID, they are logically executing "in parallel".
>> Just run the main plan and let it pull tuples from the CTEs as needed.
> On the face of it, that sounds like it has another benefit you didn't mention -- it sounds like it's much more conducive to allowing parallel processing, if (when?) we eventually move in that direction. It might even be a good case for an initial, limited implementation.

Yeah. Most of the executor is in principle parallelizable at the plan-node level (ignoring the obvious and severe implementation problems with parallelizing *anything* in the backend). It's not good for wCTE to be creating a user-visible assumption that certain things will happen in a predefined order.

regards, tom lane
Re: [HACKERS] Sync Rep v17
On 2011-02-22 20:43, Jaime Casanova wrote:
> you can make this happen more easily, i just run "pgbench -n -c10 -j10 test" and got that warning and sometimes a segmentation fault and sometimes a failed assertion
>
> and the problematic code starts at src/backend/replication/syncrep.c:277, here my suggestions on that code. still i get a failed assertion because of the second Assert (i think we should just remove that one)
>
> *** SyncRepRemoveFromQueue(void)
> *** 288,299 ****
>           if (proc->lwWaitLink == NULL)
>               elog(WARNING, "could not locate ourselves on wait queue");
> !         proc = proc->lwWaitLink;
>       }
>
>       if (proc->lwWaitLink == NULL)       /* At tail */
>       {
> !         Assert(proc == MyProc);
>           /* Remove ourselves from tail of queue */
>           Assert(queue->tail == MyProc);
>           queue->tail = proc;
> --- 288,300 ----
>           if (proc->lwWaitLink == NULL)
>               elog(WARNING, "could not locate ourselves on wait queue");
> !         else
> !             proc = proc->lwWaitLink;
>       }
>
>       if (proc->lwWaitLink == NULL)       /* At tail */
>       {
> !         Assert(proc != MyProc);
>           /* Remove ourselves from tail of queue */
>           Assert(queue->tail == MyProc);
>           queue->tail = proc;

I also did some initial testing on this patch and got the queue related errors with > 1 clients. With the code change from Jaime above I still got a lot of 'not on queue' warnings. I tried to understand how the queue was supposed to work - resulting in the changes below that also incorporate a suggestion from Fujii upthread, to exit early when MyProc was found. With the changes below all seems to work without warnings.

I now see that the note about the list invariant is too short; better would be: "if queue length = 1 then head = tail".

--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
 	}
 	else
 	{
+		bool	found = false;
+
 		while (proc->lwWaitLink != NULL)
 		{
 			/* Are we the next proc in our traversal of the queue? */
@@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
 				 * No need to touch head or tail.
 				 */
 				proc->lwWaitLink = MyProc->lwWaitLink;
+				found = true;
+				break;
 			}
-			if (proc->lwWaitLink == NULL)
-				elog(WARNING, "could not locate ourselves on wait queue");
 			proc = proc->lwWaitLink;
 		}
+		if (!found)
+			elog(WARNING, "could not locate ourselves on wait queue");

-		if (proc->lwWaitLink == NULL)		/* At tail */
+		/* If MyProc was removed from the tail, maintain list invariant head==tail */
+		if (proc->lwWaitLink == NULL)
 		{
-			Assert(proc == MyProc);
-			/* Remove ourselves from tail of queue */
+			Assert(proc != MyProc);	/* impossible since that is the head=MyProc branch above */
 			Assert(queue->tail == MyProc);
 			queue->tail = proc;
 			proc->lwWaitLink = NULL;

I needed to add this to make the documentation compile:

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
         You should also consider setting hot_standby_feedback as an alternative to using this parameter.
+
+
+

regards, Yeb Havinga
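To see why the found-flag change matters, here is a toy model in plain Python of removing one's own entry from a singly linked wait queue (an analogy only, not the backend code; `Proc`/`remove_self` are made-up names). The original loop issued the "could not locate ourselves" warning from inside the traversal, so it could fire spuriously; tracking a `found` flag and breaking out on success avoids that:

```python
# Toy model of SyncRepRemoveFromQueue-style self-removal from a singly
# linked queue.  `next` plays the role of proc->lwWaitLink.
class Proc:
    def __init__(self, name):
        self.name = name
        self.next = None

def remove_self(queue, me):
    """queue is a (head, tail) pair; returns the new (head, tail)."""
    head, tail = queue
    if head is me:                   # at head: just advance the head pointer
        head = me.next
        if head is None:             # queue is now empty
            tail = None
    else:
        proc, found = head, False
        while proc.next is not None:
            if proc.next is me:      # unlink; no need to touch head
                proc.next = me.next
                found = True
                break                # early exit once we've found ourselves
            proc = proc.next
        if not found:                # warn only after the full traversal
            raise LookupError("could not locate ourselves on wait queue")
        if proc.next is None:        # we removed the old tail entry
            tail = proc
    me.next = None
    return head, tail

a, b, c = Proc("a"), Proc("b"), Proc("c")
a.next, b.next = b, c
head, tail = remove_self((a, c), b)  # remove the middle entry
print(head.name, tail.name)          # -> a c
```

The key point mirrored from the diff above: success is recorded explicitly, the warning moves outside the loop, and the tail pointer is only adjusted when the removed entry was actually the tail.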
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
Tom Lane wrote:
> Given that we've decided to run the modifying sub-queries all with the same command counter ID, they are logically executing "in parallel".
> Just run the main plan and let it pull tuples from the CTEs as needed.

On the face of it, that sounds like it has another benefit you didn't mention -- it sounds like it's much more conducive to allowing parallel processing, if (when?) we eventually move in that direction. It might even be a good case for an initial, limited implementation.

-Kevin
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 9:58 AM, Tom Lane wrote:
> I had what seems to me a remarkably good idea, though maybe someone else
> can spot a problem with it. Given that we've decided to run the
> modifying sub-queries all with the same command counter ID, they are
> logically executing "in parallel". The current implementation takes no
> advantage of that fact, though: it's based around the idea of running
> the updates strictly sequentially. I think we should change it so that
> the updates happen physically, not only logically, concurrently.
> Specifically, I'm imagining getting rid of the patch's additions to
> InitPlan and ExecutePlan that find all the modifying sub-queries and
> force them to be cycled to completion before the main plan runs.
> Just run the main plan and let it pull tuples from the CTEs as needed.
> Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
> completion before shutting down the plan. (In the event of an error,
> we'd never get to ExecutorEnd, but it doesn't matter since whatever
> updates we did apply are nullified anyhow.)
>
> This has a number of immediate and future implementation benefits:
>
> 1. RETURNING tuples that aren't actually needed by the main plan
> don't need to be buffered anywhere. (ExecutorEnd would just pull
> directly from the ModifyTable nodes, ignoring their parent CTE
> nodes, in all cases.)
>
> 2. In principle, in many common cases the RETURNING tuples wouldn't have
> to be buffered at all, but could be consumed on-the-fly. I think that
> right now the CTEScan nodes might still buffer the tuples so they can
> regurgitate them in case of being rescanned, but it's not hard to see
> how that could be improved later if it doesn't work immediately.
>
> 3. The code could be significantly simpler. Instead of that rather
> complex and fragile logic in InitPlan to try to locate all the
> ModifyTable nodes and their CTEScan parents, we could just have
> ModifyTable nodes add themselves to a list in the EState during
> ExecInitNode. Then ExecutorEnd just traverses that list.
>
> However, the real reason for doing it isn't any of those, but rather
> to establish the principle that the executions of the modifying
> sub-queries are interleaved not sequential. We're never going to be
> able to do any significant optimization of such queries if we have to
> preserve the behavior that the sub-queries execute sequentially.
> And I think it's inevitable that users will manage to build such an
> assumption into their queries if the first release with the feature
> behaves that way.
>
> Comments?

I completely agree. Actually, I thought we had already agreed on the
design you just proposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
I had what seems to me a remarkably good idea, though maybe someone else
can spot a problem with it. Given that we've decided to run the
modifying sub-queries all with the same command counter ID, they are
logically executing "in parallel". The current implementation takes no
advantage of that fact, though: it's based around the idea of running
the updates strictly sequentially. I think we should change it so that
the updates happen physically, not only logically, concurrently.

Specifically, I'm imagining getting rid of the patch's additions to
InitPlan and ExecutePlan that find all the modifying sub-queries and
force them to be cycled to completion before the main plan runs.
Just run the main plan and let it pull tuples from the CTEs as needed.
Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
completion before shutting down the plan. (In the event of an error,
we'd never get to ExecutorEnd, but it doesn't matter since whatever
updates we did apply are nullified anyhow.)

This has a number of immediate and future implementation benefits:

1. RETURNING tuples that aren't actually needed by the main plan
don't need to be buffered anywhere. (ExecutorEnd would just pull
directly from the ModifyTable nodes, ignoring their parent CTE
nodes, in all cases.)

2. In principle, in many common cases the RETURNING tuples wouldn't have
to be buffered at all, but could be consumed on-the-fly. I think that
right now the CTEScan nodes might still buffer the tuples so they can
regurgitate them in case of being rescanned, but it's not hard to see
how that could be improved later if it doesn't work immediately.

3. The code could be significantly simpler. Instead of that rather
complex and fragile logic in InitPlan to try to locate all the
ModifyTable nodes and their CTEScan parents, we could just have
ModifyTable nodes add themselves to a list in the EState during
ExecInitNode. Then ExecutorEnd just traverses that list.

However, the real reason for doing it isn't any of those, but rather
to establish the principle that the executions of the modifying
sub-queries are interleaved not sequential. We're never going to be
able to do any significant optimization of such queries if we have to
preserve the behavior that the sub-queries execute sequentially.
And I think it's inevitable that users will manage to build such an
assumption into their queries if the first release with the feature
behaves that way.

Comments?

regards, tom lane
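[Editor's note: the execution flow proposed above — let the main plan pull tuples from the modifying CTEs on demand, then drive any unfinished ModifyTable nodes to completion in ExecutorEnd — can be sketched as a toy Python model. The class and function names below only mirror the executor concepts discussed in the thread; they are not the patch's actual code.]

```python
# Toy model of the proposed wCTE execution flow: the main plan pulls
# tuples from a modifying sub-query on demand; at ExecutorEnd, any
# sub-query that was not fully consumed is cycled to completion so
# its side effects (the updates) still all happen.

class ModifyTable:
    """Stand-in for a data-modifying sub-plan (e.g. DELETE ... RETURNING)."""
    def __init__(self, rows):
        self.rows = iter(rows)
        self.applied = []  # modifications performed so far

    def next_tuple(self):
        row = next(self.rows, None)
        if row is not None:
            self.applied.append(row)  # the modification happens as we pull
        return row

    def run_to_completion(self):
        while self.next_tuple() is not None:
            pass

class EState:
    def __init__(self):
        # Nodes register themselves here during ExecInitNode in the
        # simplified scheme Tom describes.
        self.modify_nodes = []

def executor_end(estate):
    # Cycle any unfinished ModifyTable nodes before shutting down.
    for node in estate.modify_nodes:
        node.run_to_completion()

# The main plan only consumes the first two RETURNING tuples ...
estate = EState()
mt = ModifyTable([1, 2, 3, 4])
estate.modify_nodes.append(mt)
first_two = [mt.next_tuple() for _ in range(2)]

# ... but ExecutorEnd still finishes the update.
executor_end(estate)
assert first_two == [1, 2]
assert mt.applied == [1, 2, 3, 4]
```

The point of the sketch is the ordering guarantee: no modification is forced to run before the main plan, yet every modification is complete by the time the executor shuts down.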
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 9:14 AM, Daniel Farina wrote:
>
> Right now, as it stands, the syncrep patch will be happy as soon as
> the data has been fsynced to either B or A-prime; I don't think we can
> guarantee at any point that A-prime can become the leader, and feed B.

- start A` up, replicating from A
- shut down B (now A and A` are synchronous)

Now real quick:

- shut down A
- shut down A`
- change configuration
- start up A`
- start up B

Doesn't this work?

Greetings
Marcin
Re: [HACKERS] WIP: cross column correlation ...
On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian wrote:
> Josh Berkus wrote:
>> On 2/23/11 7:10 AM, Robert Haas wrote:
>> > IME, most bad query plans are caused by either incorrect
>> > estimates of selectivity, or wrongheaded notions about what's likely
>> > to be cached. If we could find a way, automated or manual, of
>> > providing the planner some better information about the facts of life
>> > in those areas, I think we'd be way better off. I'm open to ideas
>> > about what the best way to do that is.
>>
>> As previously discussed, I'm fine with approaches which involve
>> modifying database objects. These are auditable and centrally
>> manageable and aren't devastating to upgrades.
>>
>> So things like the proposed "CREATE SELECTIVITY" would be OK in a way
>> that decorating queries would not.
>>
>> Similarly, I would love to be able to set "cache %" on a per-relation
>> basis, and override the whole dubious calculation involving
>> random_page_cost for scans of that table.
>
> We should just find a way of checking what percentage of a table is
> already in the shared buffers. That doesn't help us with the kernel
> cache, but it would be a good start and something that doesn't require
> user tuning.

You're reinventing a wheel that's already been discarded multiple times. There are at least four separate problems:

1. The percentage of the table which is cached in shared_buffers at plan time need not match the percentage that is cached at execution time. A delay of even a few seconds between planning and execution could make the numbers totally different, and plans can be cached for much longer than that.

2. Because shared_buffers can turn over quite quickly, planning the statement multiple times in relatively quick succession could give different results each time. Previous discussions on this topic have concluded that DBAs hate plan instability, and hate GEQO because it causes plan instability, and this would inject plan instability into the main planner.

3. The percentage of the table which is cached in shared_buffers is not necessarily representative of the percentage which is cached in general. On a large machine, shared_buffers may be less than 10% of the total cache. It would be unwise to make guesses about what is and is not cached based on a small percentage of the cache.

4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cached? Hard to say, right? It could be none of it or all of it. The second scenario is easy to imagine - just suppose the query's been executed twice. The first scenario isn't hard to imagine either.

One idea Tom and I kicked around previously is to set an assumed caching percentage for each table based on its size relative to effective_cache_size - in other words, assume that the smaller a table is, the more of it will be cached. Consider a system with 8GB of RAM, and a table which is 64kB. It is probably unwise to make any plan based on the assumption that that table is less than fully cached. If it isn't before the query executes, it soon will be. Going to any amount of work elsewhere in the plan to avoid the work of reading that table in from disk is probably a dumb idea. Of course, one downside of this approach is that it doesn't know which tables are hot and which tables are cold, but it would probably still be an improvement over the status quo.

All that having been said, I think that while Josh is thinking fuzzily about the mathematics of his proposal, the basic idea is pretty sensible. It is not easy - likely not possible - for the system to have a good idea which things will be in some kind of cache at the time the query is executed; it could even change mid-query. The execution of one part of the query could evict from the cache data which some other part of the plan assumed would be cached. But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more.

PostgreSQL is extremely easy to administer compared with some of its competitors, and it's frequently necessary to change very little. But there's a difference between what you absolutely have to change to make it work and what you have the option to change when necessary. We need to decrease the amount of stuff in the first category (as we recently did with wal_buffers) and increase the amount of stuff in the second category. People coming from Oracle are not favorably impressed either by the amount of monitoring data PostgreSQL can gather or by the number of knobs that are available to fix problems when they occur. We don't need to have as many knobs as Oracle and we probably don't want to, and for that matter we probably couldn't if we did w
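[Editor's note: the heuristic Robert attributes to himself and Tom — assume a table's cached fraction shrinks as its size grows relative to effective_cache_size — can be sketched as follows. The linear falloff below is purely an illustrative guess; the thread does not specify a formula, and this is not anything PostgreSQL actually does.]

```python
# Sketch of the discussed heuristic: assume the cached fraction of a
# table shrinks as the table grows relative to effective_cache_size.
# The linear curve is an assumption for illustration only.

def assumed_cache_fraction(table_bytes, effective_cache_bytes):
    if table_bytes <= 0:
        return 1.0
    ratio = table_bytes / effective_cache_bytes
    # A table much smaller than the cache is assumed fully cached;
    # a table as large as (or larger than) the cache mostly is not.
    return max(0.0, min(1.0, 1.0 - ratio))

GB = 1024 ** 3

# The 64kB table on a machine with 8GB of cache, from the example
# above: effectively fully cached.
assert assumed_cache_fraction(64 * 1024, 8 * GB) > 0.99

# A table as large as the whole cache: assume almost none is cached.
assert assumed_cache_fraction(8 * GB, 8 * GB) == 0.0
```

As Robert notes, a size-based rule like this cannot distinguish hot tables from cold ones of the same size, but it is stable across plannings, which sidesteps the plan-instability objection raised in point 2.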
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 3:14 AM, Daniel Farina wrote:
> On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith wrote:
>> Robert Haas wrote:
>>> 2. Synchronous replication. Splitting up this patch has allowed some
>> On top of 4 listed reviewers I know Dan Farina is poking at the last update,
>> so we may see one more larger report on top of what's already shown up. And
>> Jaime keeps kicking the tires too. What Simon was hoping is that a week of
>> others looking at this would produce enough feedback that it might be
>> possible to sweep the remaining issues up soon after he's back. It looks to
>> me like that's about when everything else that's still open will probably
>> settle too.
>
> Besides some of the fixable issues, I am going to have to echo
> Robert's sentiments about a few kinks that go beyond mechanism in the
> syncrep patch: in particular, it will *almost* solve the use case I
> was hoping to solve: a way to cleanly perform planned switchovers
> between machines with minimal downtime and no lost data. But there are
> a couple of holes I have thought of so far:

Well, just because the patch doesn't solve every use case isn't a reason not to go forward with it - we can always add more options later - but I have to admit that I'm kind of alarmed about the number of bugs reported so far.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] disposition of remaining patches
On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith wrote:
> Robert Haas wrote:
>>> 2. Synchronous replication. Splitting up this patch has allowed some
> On top of 4 listed reviewers I know Dan Farina is poking at the last update,
> so we may see one more larger report on top of what's already shown up. And
> Jaime keeps kicking the tires too. What Simon was hoping is that a week of
> others looking at this would produce enough feedback that it might be
> possible to sweep the remaining issues up soon after he's back. It looks to
> me like that's about when everything else that's still open will probably
> settle too.

Besides some of the fixable issues, I am going to have to echo Robert's sentiments about a few kinks that go beyond mechanism in the syncrep patch: in particular, it will *almost* solve the use case I was hoping to solve: a way to cleanly perform planned switchovers between machines with minimal downtime and no lost data. But there are a couple of holes I have thought of so far:

1. The 2-safe methodology supported is not really compatible with performing planned-HA-switchover of a cluster with its own syncrep guarantees on top of that. For example:

Server A syncreps to Server B

Now I want to provision server A-prime, which will eventually take the place of A.

Server A syncreps to Server B
Server A syncreps to Server A-prime

Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B.

2. The unprivileged user can disable syncrep, in any situation. This flexibility is *great*, but you don't really want people to do it when one is performing the switchover. Rather, in a magical world we'd hope that disabling syncrep would just result in not having to synchronously commit to B (but, in this case, still synchronously commit to A-prime).

In other words, to my mind, you can use syncrep as-is to provide 2-safe durability xor a scheduled switchover: as soon as someone wants both, I think they'll have some trouble. I do want both, though.

-- 
fdr
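[Editor's note: the hole Daniel describes in point 1 comes down to two different acknowledgment rules. A toy sketch of the difference, with hypothetical function names that are not from the patch:]

```python
# Toy model of the 2-safe acknowledgment rule under discussion.
# The patch releases a commit once ANY synchronous standby has
# fsynced the data; the planned-switchover use case instead needs
# an ack from one SPECIFIC standby (A-prime) before commit.

def commit_released_any(acked_standbys):
    """Rule in the patch: any one synchronous standby suffices."""
    return len(acked_standbys) > 0

def commit_released_required(acked_standbys, required):
    """Rule the switchover use case needs: a named standby must ack."""
    return required in acked_standbys

# Only B has acknowledged so far.
acked = {"B"}
assert commit_released_any(acked)                      # patch: commit now
assert not commit_released_required(acked, "A-prime")  # switchover: wait
```

Under the first rule there is no point at which A-prime is guaranteed to hold every committed transaction, which is exactly why it cannot safely become the leader and feed B.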