Re: [HACKERS] Keywords in pg_hba.conf should be field-specific
Any progress on this?

--- Brendan Jurd wrote:
> On 18 October 2010 01:19, Tom Lane wrote:
> > Brendan Jurd writes:
> >> On 17 October 2010 09:59, Tom Lane wrote:
> >>> Good point.  Maybe the correct fix is to remember whether each token
> >>> was quoted or not, so that keyword detection can be done safely after
> >>> the initial lexing.  I still think that the current method is
> >>> impossibly ugly ...
>
> >> I'm happy to revise the patch on that basis.  Any suggestions about
> >> how to communicate the 'quotedness' of each token?  We could make each
> >> token a struct consisting of the token itself, plus a boolean flag to
> >> indicate whether it had been quoted.  Does that work for you?
>
> > Seems reasonable.  I had the idea of a parallel list of booleans in the
> > back of my mind, but a list of structs is probably easier to
> > understand, and to extend further if necessary.
>
> Okay, I've taken the red pill and I'm finding out how deep the rabbit
> hole goes ...
>
> The logical structure of pg_hba.conf is a set of lines, each line
> containing a set of fields, each field containing a set of tokens.
> The way the existing implementation handles this is to create a list of
> lines containing sublists of fields, containing comma-separated strings
> for the set of tokens, with newlines embedded next to tokens which
> might be keywords.
>
> The tokeniser breaks apart the comma-separated tokens ... and then
> reassembles them into a comma-separated string, which the db/role
> matching functions then have to break apart *again*.
>
> In order to keep track of whether each individual token was quoted, I
> first need to impose some sanity here.  Rather than using a magical
> string for each field, I intend to use a List of HbaToken structs which
> explicitly note whether quoting was used.
>
> Introducing an extra List level does mean a bit more work copying and
> freeing, and it makes the patch really quite intrusive.  I have to
> touch a lot of lines in hba.c, but I think the additional clarity is
> worth it.  If nobody dissuades me from this approach I hope to post a
> patch in a couple of days.
>
> Cheers,
> BJ

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
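Brendan's List-of-structs design can be sketched in miniature. Below is a toy Python model of quoted-token lexing with field-specific keyword detection; the function names and the keyword set are illustrative only, not the actual hba.c implementation:

```python
# Each token keeps a "quoted" flag, so a quoted "all" is treated as a
# literal role/database name while an unquoted all remains a keyword.
# Hypothetical names and a partial keyword set, for illustration only.

KEYWORDS = {"all", "replication", "sameuser", "samerole"}

def lex_field(field):
    """Split a comma-separated pg_hba field into (text, quoted) tokens."""
    tokens = []
    for raw in field.split(","):
        raw = raw.strip()
        if raw.startswith('"') and raw.endswith('"') and len(raw) >= 2:
            tokens.append((raw[1:-1], True))   # quoting stripped, remembered
        else:
            tokens.append((raw, False))
    return tokens

def is_keyword(token):
    """Keyword detection done safely after lexing, as Tom suggested."""
    text, quoted = token
    return not quoted and text in KEYWORDS

tokens = lex_field('alice,"all",all')
# alice: plain name; "all": quoted, so a role literally named all;
# all: unquoted, so the keyword matching every role.
```

The point of carrying the flag in the token itself, rather than a parallel list of booleans, is that the pair travels together through every later matching function.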
Re: [HACKERS] WIP: cross column correlation ...
Robert Haas wrote:
> > Actually, we *do* have some idea which tables are hot.  Or at least,
> > we could.  Currently, pg_stats for tables are "timeless"; they just
> > accumulate from the last reset, which has always been a problem in
> > general for monitoring.  If we could make top-level table and index
> > stats time-based, even in some crude way, we would know which tables
> > were currently hot.  That would also have the benefit of making server
> > performance analysis and autotuning easier.
>
> I think there would be value in giving the DBA an easier way to see
> which tables are hot, but I am really leery about the idea of trying to
> feed that directly into the query planner.  I think this is one of
> those cases where we let people tune it manually for starters, and then
> wait for feedback.  Eventually someone will say "oh, I never tune that
> by hand any more, ever since I wrote this script which does the
> following computation... and I just run it out of cron".  And then we
> will get out the party hats.  But we will never get the experience we
> need to say what that auto-tuning algorithm will be unless we first
> provide the knob for someone to fiddle with manually.

It is also possible we will implement a manual way and never get around
to automating it.  :-(

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] TODO: You can alter it, but you can't view it
Josh Berkus wrote:
> > Right now pg_options_to_table() is not documented.  Should it be?
>
> Yes, I think so.

Done, with the attached, applied patch.

--
Bruce Momjian  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 736eb67..c620142 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT pg_type_is_visible('myschema.widg
*** 13244,13249 ****
--- 13244,13253 ----
+     pg_options_to_table
+
      pg_tablespace_databases
*** SELECT pg_type_is_visible('myschema.widg
*** 13380,13385 ****
--- 13384,13394 ----
      get underlying SELECT command for view
+     pg_options_to_table(reloptions)
+     name, option
+     get the set of option name/value pairs from pg_class.reloptions
+
      pg_tablespace_databases(tablespace_oid)
      setof oid
      get the set of database OIDs that have objects in the tablespace
*** SELECT pg_type_is_visible('myschema.widg
*** 13475,13480 ****
--- 13484,13495 ----
+     pg_options_to_table returns the set of option
+     name/value pairs when passed
+     pg_class.reloptions.
+
      pg_tablespace_databases allows a tablespace to be examined.  It
      returns the set of OIDs of databases that have objects stored in the
      tablespace.  If this function returns any rows, the tablespace is not
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
Marko Tiikkaja writes:
> On 2011-02-25 4:58 PM, Tom Lane wrote:
>> Specifically, I'm imagining getting rid of the patch's additions to
>> InitPlan and ExecutePlan that find all the modifying sub-queries and
>> force them to be cycled to completion before the main plan runs.
>> Just run the main plan and let it pull tuples from the CTEs as needed.
>> Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
>> completion before shutting down the plan.

> This idea has actually been discussed before when we talked about
> optimizing wCTEs, but IIRC you said that doing this in ExecutorEnd is a
> bit ugly.

Further experimentation has reminded me of why I didn't want to put such
processing in ExecutorEnd :-(.  There are some nasty interactions with
EXPLAIN:

1. EXPLAIN ANALYZE fails to include the execution cycles associated with
running the ModifyTable nodes to completion.  In the worst case, such as
"WITH t AS (INSERT ...) SELECT 1", it will claim the INSERT subplan is
never executed, even though rows certainly got inserted.  This is
because EXPLAIN extracts all the counts from the execution state tree
before shutting it down with ExecutorEnd.

2. But it gets worse.  Try the same query *without* ANALYZE.  You'll
find the INSERT executes anyway!  That's because EXPLAIN still calls
ExecutorEnd to clean up the execution state tree, and ExecutorEnd
doesn't realize it's not supposed to run any of the plan.

So we really need some refactoring here.  I dislike adding another
fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
but there may not be a better way.  The only way I see to fix this
without changing that API is to have ExecutorRun do the cleanup
processing just after the top plan node returns a null tuple, and that
seems a bit ugly as well.

Thoughts?

			regards, tom lane
Re: [HACKERS] wCTE: about the name of the feature
On Sat, Feb 26, 2011 at 12:30 AM, Tom Lane wrote:
> Robert Haas writes:
>> Yay!  I'm excited about this, particularly the possible "pipelining"
>> stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ... and
>> have it be like cool and fast and stuff.
>
>> Or at least I hope you can do that.
>
> It's gonna need some work yet.  As things stand, the tuples are indeed
> pipelined through, but the CteScan nodes *also* stash them aside into
> tuplestores, just in case somebody demands a rescan.  Fixing that will
> require revisiting the exec flags (EXEC_FLAG_REWIND etc).  We don't
> currently distinguish "it's unlikely you'll have to rescan" from
> "you're guaranteed not to have to rescan", but a CteScan that's
> covering a ModifyTable has to know the latter to not have to keep hold
> of copies of the RETURNING tuples.
>
> It might be a small enough change to do after alpha starts, but I don't
> have time for it right now.

Well, if nothing else, the potential is there for a future release.  I'm
probably not quite as excited about this feature as David Fetter (and my
100-Watt lightbulb is not quite as bright as the sun at high noon in
midsummer) but I do think it's pretty cool, and I appreciate you getting
it in, even in a somewhat basic form.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] wCTE: about the name of the feature
Robert Haas writes:
> Yay!  I'm excited about this, particularly the possible "pipelining"
> stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ... and
> have it be like cool and fast and stuff.

> Or at least I hope you can do that.

It's gonna need some work yet.  As things stand, the tuples are indeed
pipelined through, but the CteScan nodes *also* stash them aside into
tuplestores, just in case somebody demands a rescan.  Fixing that will
require revisiting the exec flags (EXEC_FLAG_REWIND etc).  We don't
currently distinguish "it's unlikely you'll have to rescan" from
"you're guaranteed not to have to rescan", but a CteScan that's covering
a ModifyTable has to know the latter to not have to keep hold of copies
of the RETURNING tuples.

It might be a small enough change to do after alpha starts, but I don't
have time for it right now.

			regards, tom lane
Re: [HACKERS] WIP: cross column correlation ...
On Fri, Feb 25, 2011 at 6:41 PM, Josh Berkus wrote:
>> One idea Tom and I kicked around previously is to set an assumed
>> caching percentage for each table based on its size relative to
>> effective_cache_size - in other words, assume that the smaller a table
>> is, the more of it will be cached.  Consider a system with 8GB of RAM,
>> and a table which is 64kB.  It is probably unwise to make any plan
>> based on the assumption that that table is less than fully cached.  If
>> it isn't before the query executes, it soon will be.  Going to any
>> amount of work elsewhere in the plan to avoid the work of reading that
>> table in from disk is probably a dumb idea.  Of course, one downside
>> of this approach is that it doesn't know which tables are hot and
>> which tables are cold, but it would probably still be an improvement
>> over the status quo.
>
> Actually, we *do* have some idea which tables are hot.  Or at least, we
> could.  Currently, pg_stats for tables are "timeless"; they just
> accumulate from the last reset, which has always been a problem in
> general for monitoring.  If we could make top-level table and index
> stats time-based, even in some crude way, we would know which tables
> were currently hot.  That would also have the benefit of making server
> performance analysis and autotuning easier.

I think there would be value in giving the DBA an easier way to see
which tables are hot, but I am really leery about the idea of trying to
feed that directly into the query planner.  I think this is one of those
cases where we let people tune it manually for starters, and then wait
for feedback.  Eventually someone will say "oh, I never tune that by
hand any more, ever since I wrote this script which does the following
computation... and I just run it out of cron".  And then we will get out
the party hats.  But we will never get the experience we need to say
what that auto-tuning algorithm will be unless we first provide the knob
for someone to fiddle with manually.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
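The size-relative heuristic Robert sketches can be written down directly. A toy version follows; the linear falloff and the clamping are illustrative guesses, not a proposed planner formula:

```python
# Toy version of "assume smaller tables are better cached": estimate
# the cached fraction of a table from its size relative to
# effective_cache_size.  The shape of the falloff is an illustrative
# guess, not PostgreSQL's actual costing.

def assumed_cached_fraction(table_bytes, effective_cache_size_bytes):
    if table_bytes <= 0:
        return 1.0
    # A table much smaller than the cache is assumed fully resident;
    # the fraction falls off as the table approaches the cache size.
    return max(0.0, min(1.0, 1.0 - table_bytes / effective_cache_size_bytes))

GB = 1024 ** 3
cache = 8 * GB
print(assumed_cached_fraction(64 * 1024, cache))  # 64kB table: ~1.0
print(assumed_cached_fraction(16 * GB, cache))    # table twice the cache: 0.0
```

This captures Robert's 8GB/64kB example: the tiny table is treated as effectively fully cached, and it also shows the stated downside, since two tables of equal size get the same estimate regardless of which one is actually hot.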
Re: [HACKERS] WIP: cross column correlation ...
On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera wrote:
> Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:
>
>> How practical would it be for analyze to keep a record of response
>> times for given sections of a table as it randomly accesses them and
>> generate some kind of a map for expected response times for the
>> pieces of data it is analysing?
>
> I think what you want is random_page_cost that can be tailored per
> tablespace.

We have that.  But it's not the same as tracking *sections of a table*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] wCTE: about the name of the feature
On Fri, Feb 25, 2011 at 7:00 PM, Tom Lane wrote:
> Marko Tiikkaja writes:
>> On 2011-02-24 6:37 PM +0200, Tom Lane wrote:
>>> OK, I will make those adjustments.  Are you going to do more work on
>>> the documentation part of the patch?  I can stick to working on the
>>> code part meanwhile, if you are.
>
>> I am planning on working on the documentation this weekend.
>
> I've gone ahead and applied the code portion of the patch,

Yay!  I'm excited about this, particularly the possible "pipelining"
stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ... and
have it be like cool and fast and stuff.

Or at least I hope you can do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pl/python do not delete function arguments
On Tue, Feb 15, 2011 at 6:04 PM, Jan Urbański wrote:
> On 15/02/11 20:39, Peter Eisentraut wrote:
>> On tis, 2011-02-15 at 09:58 +0100, Jan Urbański wrote:
>>> Because the invocation that actually recurses sets up the scene for
>>> failure.
>>
>> That's what we're observing, but I can't figure out why it is.  If you
>> can, could you explain it?
>>
>> It actually makes sense to me that the arguments should be deleted at
>> the end of the call.  The data belongs to that call only, and
>> PLy_procedure_delete() that would otherwise clean it up is only called
>> rarely.
>>
>> Apparently, the recursive call ends up deleting the wrong arguments,
>> but it's not clear to me why that would affect the next top-level
>> call, because that would set up its own arguments again anyway.  In
>> any case, perhaps the right fix is to fix PLy_function_delete_args()
>> to delete the args correctly.
>
> Aaah, ok, I got it (again).  Let me write this in full before I forget
> and spend another hour chasing that bug (and boy, bugs that disappear
> because you're doing things in the debugger are so annoying).  And
> actually, my patch doesn't fix it fully :|  Let me demonstrate:
>
> CREATE FUNCTION rec(n integer) RETURNS integer AS $$
> if n == 0:
>     return
> plpy.notice("before n is %d" % n)
> plpy.execute("select rec(0)")
> plpy.notice("after n is %d" % n)
> $$ LANGUAGE plpythonu;
>
> Without the patch the second plpy.notice raises a NameError.  With the
> patch the output is:
>
> NOTICE:  before n is 4
> CONTEXT:  PL/Python function "rec"
> NOTICE:  after n is 0
> CONTEXT:  PL/Python function "rec"
>
> What happens?  In PLy_function_handler, PLy_function_build_args is
> called, and proc->globals is set.  After that PLy_procedure_call is
> called, which starts executing Python code.  The Python code does a
> call into C with plpy.execute, and PLy_function_handler gets called (a
> reentrant call).
>
> Then PLy_function_build_args is called again.  It overwrites the "n"
> entry in proc->globals and then PLy_procedure_call gets called, which
> drops us back into Python (on the stack there's now C, Python, C,
> Python).  This second invocation exits quickly because n == 0, and
> we're back in C.
>
> Now without my patch, the next thing to happen was deleting the
> arguments, which removed "n" from the proc->globals dict.  The rest of
> the C code runs and finally plpy.execute returns and we're back in
> Python (the stack is C, Python).
>
> The second plpy.notice is run, which fetches "n" from the globals, and
> not finding it, raises a NameError.  With the patch it simply fetches
> the overwritten value, namely 0.
>
> The KeyError was a red herring - that's how Python reacted when
> evaluating "n in (0, 1)", and if you look in the server log you'll see
> a RuntimeWarning complaining about something internal, that doesn't
> matter.  The bottom line is that PLy_procedure_call is not reentrant
> because of proc->globals, and it has to be.
>
> Now when fixing this bug I tried copying the globals dict and
> restoring it, but ran into issues (I think the problem was that the
> function didn't like running with different globals than the ones it
> had been compiled with).  Not sure what to do with this :(  Document
> it as a caveat (with or without my patch) and carry on?  That sucks
> quite badly...

From this discussion I gather that we have a problem here that we don't
exactly know how to fix, so I'm inclined to suggest that we mark this
Returned with Feedback in the CommitFest and instead add it to the TODO.
Since this is a pre-existing bug and not a new regression, it should not
be something we hold up beta for.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
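The failure mode Jan describes can be reproduced in plain Python. The toy model below uses a single shared dict standing in for proc->globals; it illustrates the reentrancy hazard, not PL/Python's actual C structures:

```python
# Toy model of the PL/Python bug: one globals dict is shared by every
# invocation of a function, so a reentrant call that rebuilds the
# arguments clobbers the outer call's "n".  Illustrative only.

proc_globals = {}  # stands in for proc->globals

def build_args(n):
    proc_globals["n"] = n  # PLy_function_build_args overwrites the entry

def rec(n):
    build_args(n)
    if proc_globals["n"] == 0:
        return None
    before = proc_globals["n"]   # 4 on the outer call
    rec(0)                       # reentrant call rebuilds args with n=0
    after = proc_globals["n"]    # now 0: the outer value was clobbered
    return before, after

print(rec(4))  # -> (4, 0), matching the NOTICE output in the report
```

With per-call argument deletion (the unpatched behavior) the inner call would instead remove "n" entirely, which is the NameError case; either way, the shared dict is the root cause.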
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
Robert Haas writes:
> Tom/Alvaro, have the two of you hammered out who is going to finish
> this one off?  I *believe* Alvaro told me on IM that he was leaving
> this one for Tom.

Last I heard, the ball was in my court.  I'll try to get it done over
the weekend.

			regards, tom lane
Re: [HACKERS] pl/python explicit subtransactions
On Wed, Feb 9, 2011 at 5:22 PM, Peter Eisentraut wrote:
> On tis, 2011-02-08 at 00:32 -0500, Steve Singer wrote:
>> On 11-02-06 11:40 AM, Jan Urbański wrote:
>>
>> > PFA an updated patch with documentation.
>>
>> > Yeah, changed them.
>>
>> Those changes look fine.  The tests now pass.
>>
>> I've attached a new version of the patch that fixes a few
>> typos/wording issues I saw in the documentation.  I also changed the
>> link to the python reference manual section on context managers.  I
>> think it is better to link to that versus the original PEP.
>>
>> The documentation could probably still use more word-smithing but
>> that can happen later.  I'm marking this as ready for a committer.
>
> Is it necessarily a good idea that an explicit subtransaction disables
> the implicit sub-subtransactions?  It might be conceivable that you'd
> still want to do some try/catch within explicit subtransactions.

Is this still an open question, or what is the remaining issue that
needs to be addressed with regards to this patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pl/python tracebacks
On Thu, Feb 24, 2011 at 9:03 AM, Jan Urbański wrote:
> On 24/02/11 14:10, Peter Eisentraut wrote:
>> On tor, 2010-12-23 at 14:56 +0100, Jan Urbański wrote:
>>> For errors originating from Python exceptions add the traceback as
>>> the message detail.  The patch tries to mimic Python's traceback.py
>>> module behaviour as closely as possible, including interleaving
>>> stack frames with source code lines in the detail message.  Any
>>> Python developer should instantly recognize this kind of error
>>> reporting; it looks almost the same as an error in the interactive
>>> Python shell.
>>
>> I think the traceback should go into the CONTEXT part of the error.
>> The context message that's already there is now redundant with the
>> traceback.
>>
>> You could even call errcontext() multiple times to build up the
>> traceback, but maybe that's not necessary.
>
> Hm, perhaps.  I put it in the details because it sounded like the
> place to put information that is not that important, but still
> helpful.  It's kind of natural to think of the traceback as the detail
> of the error message.  But if you prefer context, I'm fine with that.
> You want me to update the patch to put the traceback in the context?

I don't see a response to this question from Peter, but I read his email
to indicate that he was hoping you'd rework along these lines.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
On Thu, Feb 24, 2011 at 11:02 AM, Tom Lane wrote:
> Marko Tiikkaja writes:
>> On 2011-02-24 5:21 PM, Tom Lane wrote:
>>> Oh, did we decide to do it that way?  OK with me, but the submitted
>>> docs are woefully inadequate on the point.  This behavior is going
>>> to have to be explained extremely clearly (and even so, I bet we'll
>>> get bug reports about it :-().
>
>> I'm ready to put more effort into the documentation if the patch is
>> going in, but I really don't want to waste my time just to hear that
>> the patch is not going to be in 9.1.  Does this sound acceptable?
>
> I've found some things I don't like about it, but the only part that
> seems far short of being committable is the documentation.

Tom/Alvaro, have the two of you hammered out who is going to finish this
one off?  I *believe* Alvaro told me on IM that he was leaving this one
for Tom.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] PostgreSQL FDW update
On Thu, Feb 24, 2011 at 9:06 AM, Robert Haas wrote:
> On Thu, Feb 24, 2011 at 8:13 AM, Shigeru HANADA wrote:
>>
>> On Wed, 23 Feb 2011 20:30:05 +0900 Shigeru HANADA wrote:
>>
>>> On Tue, 22 Feb 2011 11:33:25 -0500 Robert Haas wrote:
>>> > Is anyone actually working on a new version of this patch
>>> > sufficiently rapidly that we can expect a new version in the next
>>> > day or two?
>>> >
>>> > If not, I think we mark this one Returned with Feedback and
>>> > revisit it for 9.2.
>>>
>>> I'm working on it.
>>>
>>> Fixes for new FDW API have been done, but there are some problems in
>>> SQL generation codes, such as SELECT clause optimization (omitting
>>> unused column from SELECT clause).  It would take a while, but I'll
>>> post revised version of the patch tomorrow.
>>
>> Attached is a revised version of postgresql_fdw patch.  I started
>> from Heikki's latest patch, and modified some points:
>>
>> 1) use new FDW API
>> 2) use EXTENSION framework
>> 3) SELECT clause optimization (use NULL for unused columns)
>> 4) show remote query in EXPLAIN output
>>
>> WHERE clause pushdown was implemented in Heikki's version, so I
>> didn't touch around it.  Now I'm working on cost estimation and
>> connection management, but they would need some more work.
>
> So this is still work-in-progress?  When do you expect a final version?

Since it sounds like this still needs more work, I'm going to mark it
Returned with Feedback.  I hope we integrate this during the 9.2 cycle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
I wrote:
> The first solution that comes to mind is to pay attention to the
> interdependencies of the CTEs, and perform the cleanup in an
> appropriate order (here, the ModifyTable for y needs to be cycled
> first).

Doh ... actually, we already *are* ordering the CTEs in dependency
order, so it's a one-liner fix to do the shutdowns in reverse order.

			regards, tom lane
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
David Fetter writes:
> Sorry that was unclear.  Let's imagine there's a DELETE ... RETURNING
> in one WITH, and an UPDATE in another that depends on that one.  Is
> that still allowed?

Yeah it is, although I just noticed that there's a bug in the new
implementation:

with t1 as (insert into x select ... returning *),
     t2 as (insert into y select * from t1 returning *)
select 1;

This should result in the same rows inserted into both x and y, but in
git HEAD it fails to insert anything into y.  The reason is that the
ExecutorEnd scan first processes the ModifyTable node for x, and cycles
it to completion, discarding the results --- but we needed the CteScan
in t2 to see those rows.

There's a related case in the regression tests, but it works because the
outer query does fetch from both WITH clauses, so there's no need to do
anything at ExecutorEnd time.

The first solution that comes to mind is to pay attention to the
interdependencies of the CTEs, and perform the cleanup in an appropriate
order (here, the ModifyTable for y needs to be cycled first).  I'm not
sure if there's a nicer way.  We'll eventually want some interdependency
tracking for CTEs anyway, if we're ever to support mutual recursion, so
it'd not be completely single-purpose code.

			regards, tom lane
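The ordering problem, and why cleaning up in reverse dependency order fixes it, can be modeled with Python generators standing in for the executor nodes. This is a sketch of the idea, not the executor code:

```python
# Sketch: t2 pulls from t1, as in
#   WITH t1 AS (INSERT INTO x ... RETURNING *),
#        t2 AS (INSERT INTO y SELECT * FROM t1 RETURNING *)
#   SELECT 1;
# If cleanup drains t1 first and discards its rows, t2 never sees them.
# Draining the dependent node (t2) first pulls t1 to completion through it.

x, y = [], []

def t1():                      # ModifyTable for x
    for row in (1, 2, 3):
        x.append(row)          # the INSERT side effect
        yield row              # the RETURNING row

def t2(source):                # ModifyTable for y, fed by the CteScan on t1
    for row in source:
        y.append(row)
        yield row

nodes = [t1()]                 # nodes in dependency order: t1, then t2
nodes.append(t2(nodes[0]))

# "SELECT 1" pulls nothing from the CTEs, so cleanup must cycle them.
# Reverse dependency order: cycle t2 first, which in turn drains t1.
for node in reversed(nodes):
    for _ in node:
        pass

print(x, y)  # -> [1, 2, 3] [1, 2, 3]
```

Running the cleanup loop over `nodes` in forward order instead reproduces the bug: t1 is drained and its rows discarded, so t2 finds its source exhausted and y stays empty.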
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 10:12:02PM -0500, Tom Lane wrote:
> David Fetter writes:
> > What's the effect, if any, on CTEs that depend on each other
> > explicitly?
>
> An error.  That would require mutual recursion, which we don't support
> for the SELECT case let alone data-modifying statements.

Sorry that was unclear.  Let's imagine there's a DELETE ... RETURNING in
one WITH, and an UPDATE in another that depends on that one.  Is that
still allowed?

Cheers,
David.
--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
David Fetter writes:
> What's the effect, if any, on CTEs that depend on each other
> explicitly?

An error.  That would require mutual recursion, which we don't support
for the SELECT case let alone data-modifying statements.

			regards, tom lane
Re: [HACKERS] wCTE: about the name of the feature
On Thu, Feb 24, 2011 at 11:35:30AM -0800, David Wheeler wrote:
> On Feb 24, 2011, at 10:43 AM, Robert Haas wrote:
>
> >> The best idea I have at the moment is to spell out "data modifying
> >> command" (or "statement") rather than relying on the acronym.
> >> In the code, we could change hasDmlWith to hasModifyingWith, for
> >> example.  The error messages could read like
> >>    data-modifying statement in WITH is not allowed in a view
> >>
> >> Comments?
> >
> > Great idea.  I had the same complaint when I looked at this patch a
> > year ago, but didn't come up with nearly as good an idea as to what
> > to do about it.
>
> I like "statement" better than "command," too, but love the acronym
> DMC.  As in, "you want to Run [a] DMC."  ;-P

Hit it, Run! ;)

Cheers,
David.
--
David Fetter  http://fetter.org/
Re: [HACKERS] wCTE: about the name of the feature
On Thu, Feb 24, 2011 at 11:20:48AM -0500, Tom Lane wrote:
> The wCTE patch refers to the feature it's adding as "DML WITH".  I'm
> still pretty unhappy with that terminology.  In my view of the world,
> "DML" includes SELECT as well as INSERT/UPDATE/DELETE.  The wikipedia
> entry about the term
> http://en.wikipedia.org/wiki/Data_Manipulation_Language
> agrees that that's at least the majority usage, and even our own docs
> seem to use it to include SELECT as often as not.  Since the
> distinction is absolutely critical to talking about this feature
> sensibly, I don't think it's a good plan to use an acronym that is
> guaranteed to produce uncertainty in the reader's mind.
>
> The best idea I have at the moment is to spell out "data modifying
> command" (or "statement") rather than relying on the acronym.
> In the code, we could change hasDmlWith to hasModifyingWith, for
> example.  The error messages could read like
>     data-modifying statement in WITH is not allowed in a view
>
> Comments?

+1

If we ever decide to add in what I'd originally envisioned, namely DCL
and DDL, the name continues to describe what's going on :)

Cheers,
David.
--
David Fetter  http://fetter.org/
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 09:58:36AM -0500, Tom Lane wrote:
> I had what seems to me a remarkably good idea, though maybe someone
> else can spot a problem with it.  Given that we've decided to run the
> modifying sub-queries all with the same command counter ID, they are
> logically executing "in parallel".  The current implementation takes
> no advantage of that fact, though: it's based around the idea of
> running the updates strictly sequentially.  I think we should change
> it so that the updates happen physically, not only logically,
> concurrently.  Specifically, I'm imagining getting rid of the patch's
> additions to InitPlan and ExecutePlan that find all the modifying
> sub-queries and force them to be cycled to completion before the main
> plan runs.  Just run the main plan and let it pull tuples from the
> CTEs as needed.  Then, in ExecutorEnd, cycle any unfinished
> ModifyTable nodes to completion before shutting down the plan.  (In
> the event of an error, we'd never get to ExecutorEnd, but it doesn't
> matter since whatever updates we did apply are nullified anyhow.)

What's the effect, if any, on CTEs that depend on each other explicitly?

Cheers,
David.
--
David Fetter  http://fetter.org/
Re: [HACKERS] wCTE: about the name of the feature
Marko Tiikkaja writes:
> On 2011-02-26 2:00 AM, Tom Lane wrote:
>> I've gone ahead and applied the code portion of the patch, with modifications as per discussion, and other editorialization.
>
> Thanks a lot!
>
> One thing bothers me though: what was the reason for requiring a RETURNING clause for data-modifying statements in WITH?

That test was in your patch, no? I moved the code to another place but it's still enforcing the same thing, namely that you can't reference the output of an INSERT/UPDATE/DELETE that hasn't got RETURNING.

regards, tom lane
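[Editorial note: the rule Tom describes can be illustrated with a hypothetical table; names are assumed for illustration:]

```sql
-- Allowed: the INSERT has RETURNING, so its output can be referenced
-- by the outer query.
WITH ins AS (
    INSERT INTO t (x) VALUES (1) RETURNING x
)
SELECT * FROM ins;

-- Rejected: without RETURNING there is no output for "ins" to expose,
-- so referencing it in the outer query raises an error.
WITH ins AS (
    INSERT INTO t (x) VALUES (1)
)
SELECT * FROM ins;
```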
Re: [HACKERS] wCTE: about the name of the feature
On 2011-02-26 2:00 AM, Tom Lane wrote:
> I've gone ahead and applied the code portion of the patch, with modifications as per discussion, and other editorialization.

Thanks a lot!

One thing bothers me though: what was the reason for requiring a RETURNING clause for data-modifying statements in WITH?

> I'll wait on you to produce documentation updates before dealing with the docs, but I figured we might as well get some buildfarm cycles on it meanwhile.

Thanks, I'll send an improved version tomorrow.

Regards, Marko Tiikkaja
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 5:21 PM, Josh Berkus wrote:
> On 2/25/11 4:57 PM, Jeff Davis wrote:
>> On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
>>> Hmmm, I don't follow this. The user can only disable syncrep for their own transactions. If they don't care about the persistence of their transaction post-failover, why should the DBA care?
>>
>> I think that's the difference between failover and switchover, right? At least Slony makes such a distinction, as well.
>
> Yeah. Actually, what would be even simpler and more to the point would be a command that says "flush all transactions from Server A to Server B, then fail over".

That would be nice; I'm basically abusing syncrep to this purpose. At the same time, someone may need to be notified of such a switchover occurring, and in the event of failure, it'd be nice to bounce back to the primary. Tangentially relevant, Virtual IP is not always an option, such as on Amazon EC2. But I digress. Such a command is unlikely to make it into 9.1; maybe we can circle around on that in 9.2.

-- fdr
Re: [HACKERS] Sync Rep v17
On Fri, Feb 25, 2011 at 4:52 PM, Jeff Davis wrote:
> On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
>> Oh, yes, this reproduces past shutdowns/startups, and there's quite a few txids before I catch up. I'm also comfortable poking around with gdb (I have already recompiled with debugging symbols and optimizations off and was poking around, especially at MemoryContextStats(TopMemoryContext), but was not rewarded).
>
> Where is all of that memory going during recovery? Recovery shouldn't use much memory at all, as far as I can tell.
>
> What's even allocating memory at all?

I noticed this was RSS fooling with me. As pages get touched in shared memory, for some reason RSS was constantly getting increased, along with SHR at the same time. Still, the long recovery time was mystifying to me, considering the lack of unclean shutdowns.

-- fdr
Re: [HACKERS] disposition of remaining patches
On 2/25/11 4:57 PM, Jeff Davis wrote:
> On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
>> Hmmm, I don't follow this. The user can only disable syncrep for their own transactions. If they don't care about the persistence of their transaction post-failover, why should the DBA care?
>
> I think that's the difference between failover and switchover, right? At least Slony makes such a distinction, as well.

Yeah. Actually, what would be even simpler and more to the point would be a command that says "flush all transactions from Server A to Server B, then fail over".

--
Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Sync Rep v17
On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
> Oh, yes, this reproduces past shutdowns/startups, and there's quite a few txids before I catch up. I'm also comfortable poking around with gdb (I have already recompiled with debugging symbols and optimizations off and was poking around, especially at MemoryContextStats(TopMemoryContext), but was not rewarded).

Where is all of that memory going during recovery? Recovery shouldn't use much memory at all, as far as I can tell.

What's even allocating memory at all?

Regards, Jeff Davis
Re: [HACKERS] disposition of remaining patches
On Fri, 2011-02-25 at 15:44 -0800, Josh Berkus wrote:
> Hmmm, I don't follow this. The user can only disable syncrep for their own transactions. If they don't care about the persistence of their transaction post-failover, why should the DBA care?

I think that's the difference between failover and switchover, right? At least Slony makes such a distinction, as well.

Regards, Jeff Davis
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 4:36 PM, Josh Berkus wrote:
> Daniel,
>
>> Ah, okay, I had missed that discussion, I also did not know it got so specific as to address this case (are you sure?) rather than something more general, say quorum or N-safe durability.
>
> The way we address that case is through n-safe durability.

How is this exposed? The simple "count the number of fsyncs()" approach is not quite good enough (one has no control to make sure one or more nodes are definitely up to date) unless one wants to just make it go to *all* syncrep standbys for a while. That seems like overkill, so I imagine something else is in the thoughts. I'll search the archives...

>> The user may have their own level of durability guarantee they want to attain (that's why machine "B" is syncrepped in my example), but when doing the switchover I think an override to enable a smooth handoff (meaning: everything syncrepped) would be best. What I want to avoid is an ack from "COMMIT" from the primary (machine "A"), and then, post switchover, the data isn't there on machine A-prime (or "B", provided it was able to follow successfully at all, as in the current case it might get ahead of A-prime in the WAL).
>
> Yeah, when I think about your use case, I can understand why it's an issue. It would be nice to have a superuser setting (or similar) which could override user preferences and make all transactions syncrep temporarily. I'm not sure that's going to be reasonable to do for 9.1 though.

Agreed; I'd be happy to take any syncrep functionality, although it wouldn't compose well as-is. I wanted to raise this so that we didn't make any configuration decisions that got in the way of making composition possible later. Again, I haven't thought ahead yet, partially because I thought there may be some existing thoughts in play to consider.

With that, I will try to give syncrep a more structured review Real Soon, although the late date of this is leaving me queasy as to the odds of git-commit.

-- fdr
Re: [HACKERS] disposition of remaining patches
Daniel,

> Ah, okay, I had missed that discussion, I also did not know it got so specific as to address this case (are you sure?) rather than something more general, say quorum or N-safe durability.

The way we address that case is through n-safe durability.

> The user may have their own level of durability guarantee they want to attain (that's why machine "B" is syncrepped in my example), but when doing the switchover I think an override to enable a smooth handoff (meaning: everything syncrepped) would be best. What I want to avoid is an ack from "COMMIT" from the primary (machine "A"), and then, post switchover, the data isn't there on machine A-prime (or "B", provided it was able to follow successfully at all, as in the current case it might get ahead of A-prime in the WAL).

Yeah, when I think about your use case, I can understand why it's an issue. It would be nice to have a superuser setting (or similar) which could override user preferences and make all transactions syncrep temporarily. I'm not sure that's going to be reasonable to do for 9.1 though.

--
Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 3:44 PM, Josh Berkus wrote: > >> Right now, as it stands, the syncrep patch will be happy as soon as >> the data has been fsynced to either B or A-prime; I don't think we can >> guarantee at any point that A-prime can become the leader, and feed B. > > Yeah, I think that's something we said months ago is going to be a 9.2 > feature, no sooner. Ah, okay, I had missed that discussion, I also did not know it got so specific as to address this case (are you sure?) rather than something more general, say quorum or N-safe durability. >> 2. The unprivileged user can disable syncrep, in any situation. This >> flexibility is *great*, but you don't really want people to do it when >> one is performing the switchover. Rather, in a magical world we'd hope >> that disabling syncrep would just result in not having to >> synchronously commit to B (but, in this case, still synchronously >> commit to A-prime) >> >> In other words, to my mind, you can use syncrep as-is to provide >> 2-safe durability xor a scheduled switchover: as soon as someone wants >> both, I think they'll have some trouble. I do want both, though. > > Hmmm, I don't follow this. The user can only disable syncrep for their > own transactions. If they don't care about the persistence of their > transaction post-failover, why should the DBA care? The user may have their own level of durability guarantee they want to attain (that's why machine "B" is syncrepped in my example), but when doing the switchover I think an override to enable a smooth handoff (meaning: everything syncrepped) would be best. What I want to avoid is an ack from "COMMIT" from the primary (machine "A"), and then, post switchover, the data isn't there on machine A-Prime (or "B", provided it was able to follow successfully at all, as in the current case it might get ahead of A-prime in the WAL). 
-- fdr
Re: [HACKERS] wCTE: about the name of the feature
Marko Tiikkaja writes:
> On 2011-02-24 6:37 PM +0200, Tom Lane wrote:
>> OK, I will make those adjustments. Are you going to do more work on the documentation part of the patch? I can stick to working on the code part meanwhile, if you are.
>
> I am planning on working on the documentation this weekend.

I've gone ahead and applied the code portion of the patch, with modifications as per discussion, and other editorialization. I'll wait on you to produce documentation updates before dealing with the docs, but I figured we might as well get some buildfarm cycles on it meanwhile.

regards, tom lane
Re: [HACKERS] disposition of remaining patches
> Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B.

Yeah, I think that's something we said months ago is going to be a 9.2 feature, no sooner.

> 2. The unprivileged user can disable syncrep, in any situation. This flexibility is *great*, but you don't really want people to do it when one is performing the switchover. Rather, in a magical world we'd hope that disabling syncrep would just result in not having to synchronously commit to B (but, in this case, still synchronously commit to A-prime)
>
> In other words, to my mind, you can use syncrep as-is to provide 2-safe durability xor a scheduled switchover: as soon as someone wants both, I think they'll have some trouble. I do want both, though.

Hmmm, I don't follow this. The user can only disable syncrep for their own transactions. If they don't care about the persistence of their transaction post-failover, why should the DBA care?

--
Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] WIP: cross column correlation ...
> 4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cached?

FWIW, for a manual override setting, I was thinking that the % would convert to a probability. In that way, it wouldn't be different from the existing RPC calculation; we're just estimating how *likely* it is that the data the user wants is cached.

> One idea Tom and I kicked around previously is to set an assumed caching percentage for each table based on its size relative to effective_cache_size - in other words, assume that the smaller a table is, the more of it will be cached. Consider a system with 8GB of RAM, and a table which is 64kB. It is probably unwise to make any plan based on the assumption that that table is less than fully cached. If it isn't before the query executes, it soon will be. Going to any amount of work elsewhere in the plan to avoid the work of reading that table in from disk is probably a dumb idea. Of course, one downside of this approach is that it doesn't know which tables are hot and which tables are cold, but it would probably still be an improvement over the status quo.

Actually, we *do* have some idea which tables are hot. Or at least, we could. Currently, pg_stats for tables are "timeless"; they just accumulate from the last reset, which has always been a problem in general for monitoring. If we could make top-level table and index stats time-based, even in some crude way, we would know which tables were currently hot. That would also have the benefit of making server performance analysis and autotuning easier.

> But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more.

Agreed.
--
Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
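[Editorial note: the size-relative heuristic kicked around above reduces to a small piece of arithmetic. The sketch below illustrates the idea only; the exact taper function is an assumption, not code from any patch:]

```c
#include <assert.h>

/* Sketch of the "smaller tables are assumed better cached" idea:
 * assume the cached fraction is effective_cache_size / table_size,
 * clamped to 1.0.  A 64kB table on a machine with an 8GB
 * effective_cache_size is then treated as fully cached, while a
 * table twice the size of the cache is assumed half cached. */
static double
assumed_cached_fraction(double table_bytes, double effective_cache_size)
{
    if (table_bytes <= effective_cache_size)
        return 1.0;
    return effective_cache_size / table_bytes;
}
```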
Re: [HACKERS] WIP: cross column correlation ...
2011/2/25 Robert Haas : > 2011/2/25 Cédric Villemain : >>> All that having been said, I think that while Josh is thinking fuzzily >>> about the mathematics of his proposal, the basic idea is pretty >>> sensible. It is not easy - likely not possible - for the system to >>> have a good idea which things will be in some kind of cache at the >>> time the query is executed; it could even change mid-query. The >>> execution of one part of the query could evict from the cache data >>> which some other part of the plan assumed would be cached. But DBAs >>> frequently have a very good idea of which stuff is in cache - they can >>> make observations over a period of time and then adjust settings and >>> then observe some more and adjust some more. >> >> I believe we can maintain a small map of area of a relation which are >> in the OS buffer cache (shared buffers move more), or at least a >> percentage of the relation in OS cache. Getting autovacuum daemon >> being able to update those maps/counters might be enought and easy to >> do, it is really near what auto-analyze do. My observation is that >> the percentage in cache is stable on a production workload after some >> tens of minutes needed to warm the server. > > I don't think we can assume that will be true in all workloads. > Imagine a server doing batch processing. People submit large batches > of work that take, say, an hour to complete. Not all batches use the > same set of tables - maybe they even run in different databases. > After a big batch process finishes crunching numbers in database A, > very little of database B will be cached. But it's not necessarily > right to assume that when we start queries for a new batch in database > B, although it's more likely to be right for large tables (which will > take a long time to get cached meaningfully, if they ever do) than > small ones. 
Also, it could lead to strange issues where batches run > much faster or slower depending on which batch immediately proceeded > them. If we're going to do something a lot of times, it'd be better > to bite the bullet and read it all in rather than going to more work > elsewhere, but if we're only going to touch it once, then not so much. > > You might also have this issue on systems that run OLTP workloads all > day and then do some batch processing at night to get ready for the > next business day. Kevin Grittner wrote previously about those jobs > needing some different settings in his environment (I'm not > remembering which settings at the moment). Suppose that the batch > process is going to issue a query that can be planned in one of two > possible ways. One way involves reading 10% of a relation, and the > other way involves reading the whole thing. The first plan takes 200 > s to execute if the relation is not cached, and 180 s if the relevant > portion is cached. The second plan takes 300 s to execute if the > relation is not cached, and 100 s if it is cached. At the start of > the batch run, the relation won't be cached, because it's used *only* > by the overnight job and not by the daily OLTP traffic. Which way > should we execute the query? > > The answer is that if the batch job only needs to execute that query > *once*, we should do it the first way. But if it needs to execute it > three or more times, the second way is better, but only if we use the > second plan every time. If we start out with the first plan, we're > always better off sticking with it *unless* we know that we're going > to repeat the query at least twice more after the iteration we're > currently planning. To make the right decision, the query planner > needs a crystal ball. Or, a little help from the DBA. Yes, we are talking of improving some part of the model. Some workloads are dramatic and need special customization. This is true. 
Still there is a path of improvement, and probably it will remain a path of improvement after the current model is updated. I am not proposing something to solve all the issues, but something way more interesting IMHO than just letting the DBA say: 'this table is in cache at XX%'. Btw, pgfincore already does solve the use case you provide by helping the DBA to prepare its batch processing, so in some sense I am familiar with what you describe (take the second plan, pgfincore will preload in the background, and your query will be done in 100s from the first).

>> What should really help here is to have hooks in the cost functions to test those ideas without the need to patch postgresql-core a lot. Will it be ok to have hooks or will it add too much CPU consumption in a sensible part of the code?
>
> Depends on where you put them, I guess. Hooks are pretty cheap, but they're also pretty hard to use.

Yes, it will be easier to make an extension, have people test it, and validate (or not) the 'new' model.

--
Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
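[Editorial note: Robert's worked example above reduces to simple arithmetic. A small sketch with the numbers taken from the message; the helper is illustrative, not planner code:]

```c
#include <assert.h>

/* Total time for `runs` executions of a query: the first run pays
 * the cold (uncached) cost, and later runs find the relation cached. */
static double
total_time(double cold, double warm, int runs)
{
    return cold + warm * (runs - 1);
}
```

With the figures from the thread (plan one: 200 s cold / 180 s cached; plan two: 300 s cold / 100 s cached), plan one wins for one or two executions (200 vs 300, then 380 vs 400), while plan two wins from three executions on (560 vs 500) — matching the "three or more times" break-even in the message.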
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 2:33 PM, Daniel Farina wrote:
> I know I got hit by a backend synchronization (in the sense of locks, etc) bugs; do you think it is possible yours (sending SIGSTOP) could be the same root cause? I haven't followed all the other bugs cleared up by inspection.

I believe that the queue management logic is just totally busted and needs to be rewritten. I doubt there is much point in speculating about details until that's done.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pageinspect's infomask and infomask2 as smallint
Excerpts from Tom Lane's message of mar feb 15 12:42:00 -0300 2011:
> Heikki Linnakangas writes:
>> pageinspect is just a debugging aid, so I think we should change it from smallint to int4 in 9.1, and not bother backporting.
>
> I don't see any reason that the old version of the function couldn't be dropped in the upgrade script. It's not likely anything would be depending on it, is it?

Okay, so I changed the C code, and there's working upgrade support that takes you from the 9.0 version to the 9.1 version. I tested this by creating a 9.0 database with pageinspect loaded, then pg_upgrade'd it to 9.1, then ran

    create extension pageinspect from unpackaged;

If I run the function before running the CREATE EXTENSION command, it works but still displays the negative numbers. After that, it behaves as expected.

I was a bit surprised that I had to remove the ALTER EXTENSION/ADD FUNCTION command from the upgrade script, but in hindsight it makes perfect sense -- those commands are being run inside the "create_extension" context and so the function being created already belongs to the extension. Still, maybe we should make ALTER EXTENSION/ADD idempotent.

I considered the idea of calling this version 1.1 and shipping a new pageinspect--1.0--1.1.sql script (which works perfectly, provided you run ALTER EXTENSION/DROP FUNCTION before dropping the function, then ALTER/ADD later), but decided that this was overkill. We can still change it if people think that'd be better, of course, but ...

Nice work on extensions and their upgradability, overall.

--
Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
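[Editorial note: the negative numbers come from reinterpreting a 16-bit unsigned infomask as a signed smallint. A minimal illustration; the mask value is just an example, not a specific flag combination from the headers:]

```c
#include <assert.h>
#include <stdint.h>

/* Heap tuple infomasks are 16-bit flag words.  Any value with the
 * high bit set turns negative when reinterpreted as a signed
 * smallint -- which is what the old pageinspect signature did. */
static int16_t
infomask_as_smallint(uint16_t infomask)
{
    return (int16_t) infomask;
}

static int32_t
infomask_as_int4(uint16_t infomask)
{
    return (int32_t) infomask;  /* the 9.1 behavior: stays positive */
}
```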
Re: [HACKERS] help: collation support on Windows
On Fri, Feb 25, 2011 at 22:58, Dave Page wrote:
> On Fri, Feb 25, 2011 at 9:54 PM, Andrew Dunstan wrote:
>> On 02/25/2011 02:32 PM, Peter Eisentraut wrote:
>>> According to the online documentation, the APIs are there: http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx
>>>
>>> Now we'd need someone brave to try to make it work. The starting point would be to define HAVE_LOCALE_T and then make it build. Microsoft has all the relevant functions and types with an underscore in front (_strcoll_l, etc.), so some extra #defining will probably be necessary.
>>>
>>> Also, initdb will need to be patched to get a list of OS locales to populate the pg_collation catalog with.
>>>
>>> Finally, a regression test customized for Windows, but I can help with that later.
>>
>> What is the equivalent of "locale -a"?
>
> There isn't a command that I know of, but the API function you probably need is EnumSystemLocales.

Yeah. If you want example code, you can probably lift something from the old pginstaller project on pgfoundry - we use that code to populate the dropdown box for initdb there.

--
Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] help: collation support on Windows
On Fri, Feb 25, 2011 at 9:54 PM, Andrew Dunstan wrote:
> On 02/25/2011 02:32 PM, Peter Eisentraut wrote:
>> According to the online documentation, the APIs are there: http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx
>>
>> Now we'd need someone brave to try to make it work. The starting point would be to define HAVE_LOCALE_T and then make it build. Microsoft has all the relevant functions and types with an underscore in front (_strcoll_l, etc.), so some extra #defining will probably be necessary.
>>
>> Also, initdb will need to be patched to get a list of OS locales to populate the pg_collation catalog with.
>>
>> Finally, a regression test customized for Windows, but I can help with that later.
>
> What is the equivalent of "locale -a"?

There isn't a command that I know of, but the API function you probably need is EnumSystemLocales.

--
Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
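[Editorial note: a minimal, untested sketch of what a "locale -a" equivalent might look like on Windows, built around the API family Dave names. The newer EnumSystemLocalesEx variant reports locale names such as "en-US"; the older EnumSystemLocales callback receives LCIDs as hex strings instead, so the Ex form is assumed here:]

```c
#include <windows.h>
#include <stdio.h>

/* Callback invoked once per installed locale name. */
static BOOL CALLBACK
print_locale(LPWSTR name, DWORD flags, LPARAM param)
{
    wprintf(L"%ls\n", name);
    return TRUE;                /* keep enumerating */
}

int
main(void)
{
    /* Vista-and-later API; older systems would need EnumSystemLocalesW
     * plus LCID-to-name translation instead. */
    EnumSystemLocalesEx(print_locale, LOCALE_ALL, 0, NULL);
    return 0;
}
```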
Re: [HACKERS] help: collation support on Windows
On 02/25/2011 02:32 PM, Peter Eisentraut wrote:
> According to the online documentation, the APIs are there: http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx
>
> Now we'd need someone brave to try to make it work. The starting point would be to define HAVE_LOCALE_T and then make it build. Microsoft has all the relevant functions and types with an underscore in front (_strcoll_l, etc.), so some extra #defining will probably be necessary.
>
> Also, initdb will need to be patched to get a list of OS locales to populate the pg_collation catalog with.
>
> Finally, a regression test customized for Windows, but I can help with that later.

What is the equivalent of "locale -a"?

cheers andrew
[HACKERS] help: collation support on Windows
According to the online documentation, the APIs are there: http://msdn.microsoft.com/en-ca/library/a7cwbx4t.aspx

Now we'd need someone brave to try to make it work. The starting point would be to define HAVE_LOCALE_T and then make it build. Microsoft has all the relevant functions and types with an underscore in front (_strcoll_l, etc.), so some extra #defining will probably be necessary.

Also, initdb will need to be patched to get a list of OS locales to populate the pg_collation catalog with.

Finally, a regression test customized for Windows, but I can help with that later.
Re: [HACKERS] Sync Rep v17
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga wrote:
> I also did some initial testing on this patch and got the queue related errors with more than 1 client. With the code change from Jaime above I still got a lot of 'not on queue' warnings.
>
> I tried to understand how the queue was supposed to work - resulting in the changes below that also incorporate a suggestion from Fujii upthread, to exit early when myproc was found.

Yes, looking at the code, the warning and your patch... it seems yours is the right solution. I'm compiling right now to test again and see the effects. Robert, maybe you can test your failure case again? I'm really sure it's related to this...

--
Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL
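[Editorial note: the early-exit idea under discussion — stop scanning the wait queue as soon as the current backend's entry has been unlinked — can be sketched on a toy singly linked list. This models the suggestion only; it is not the actual PostgreSQL queue code:]

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a PGPROC entry on the sync rep wait queue. */
typedef struct WaitProc
{
    int pid;
    struct WaitProc *next;
} WaitProc;

/* Unlink `me` from the queue and stop scanning immediately: a backend
 * is queued at most once, so continuing past the match is wasted work.
 * Returns the (possibly new) head of the queue. */
static WaitProc *
queue_remove(WaitProc *head, WaitProc *me)
{
    WaitProc **link = &head;

    while (*link != NULL)
    {
        if (*link == me)
        {
            *link = me->next;   /* unlink ourselves */
            me->next = NULL;
            break;              /* early exit once found */
        }
        link = &(*link)->next;
    }
    return head;
}
```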
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 4:43 AM, Robert Haas wrote: > On Fri, Feb 25, 2011 at 3:14 AM, Daniel Farina wrote: >> On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith wrote: >>> Robert Haas wrote: > > 2. Synchronous replication. Splitting up this patch has allowed some >>> On top of 4 listed reviewers I know Dan Farina is poking at the last update, >>> so we may see one more larger report on top of what's already shown up. And >>> Jaime keeps kicking the tires too. What Simon was hoping is that a week of >>> others looking at this would produce enough feedback that it might be >>> possible to sweep the remaining issues up soon after he's back. It looks to >>> me like that's about when everything else that's still open will probably >>> settle too. >> >> Besides some of the fixable issues, I am going to have to echo >> Robert's sentiments about a few kinks that go beyond mechanism in the >> syncrep patch: in particular, it will *almost* solve the use case I >> was hoping to solve: a way to cleanly perform planned switchovers >> between machines with minimal downtime and no lost data. But there are >> a couple of holes I have thought of so far: > > Well, just because the patch doesn't solve every use case isn't a > reason not to go forward with it - we can always add more options > later - but I have to admit that I'm kind of alarmed about the number > of bugs reported so far. True: the relevance of any use case to acceptance is up to some debate. I haven't thought about how to remedy this, just thinking aloud about a problem I would have as-is, and is important to me. It is true that later accretion of options can occur, but sometimes the initial choice of semantics can make growing those easier or harder. I haven't yet thought ahead as to how the current scheme would impact that. I know I got hit by a backend synchronization (in the sense of locks, etc) bugs; do you think it is possible yours (sending SIGSTOP) could be the same root cause? 
I haven't followed all the other bugs cleared up by inspection.

-- fdr
Re: [HACKERS] WIP: cross column correlation ...
Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011:
> How practical would it be for analyze to keep a record of response times for given sections of a table as it randomly accesses them and generate some kind of a map for expected response times for the pieces of data it is analysing?

I think what you want is random_page_cost that can be tailored per tablespace.

--
Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
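[Editorial note: per-tablespace cost settings of the kind Álvaro suggests already exist; the tablespace name and value below are illustrative:]

```sql
-- Tell the planner that pages on an SSD-backed tablespace are cheap
-- to fetch randomly, without changing the global random_page_cost.
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);
```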
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 5:25 AM, marcin mank wrote:
> On Fri, Feb 25, 2011 at 9:14 AM, Daniel Farina wrote:
>> Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B.
>
> - start A' up, replicating from A
> - shut down B (now A and A' are synchronous)
> now real quick:
> - shut down A
> - shut down A'
> - change configuration
> - start up A'
> - start up B
>
> Doesn't this work?

This dance does work, but it would be very nice to not have to take the standby ('B' in my case) offline.

-- fdr
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
Greg Stark writes: > On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane wrote: >> However, the real reason for doing it isn't any of those, but rather >> to establish the principle that the executions of the modifying >> sub-queries are interleaved not sequential. > Does the interleaved execution have sane semantics? Depends on what you call sane. With the decision to not increment command counter, it's already the case that people shouldn't have two subqueries try to modify the same row. > With a query like: > WITH > a as update x set x.i=x.i+1 returning x.i, > b as update x set x.i=x.i+1 returning x.i > select * from a natural join b; > Is there any way to tell what it will return or what state it will > leave the table in? My reaction to that is "you shouldn't do that, and you definitely shouldn't complain if it's not predictable whether a or b will modify a given row". This is exactly the sort of assumption I don't want people building into their queries, because we will be locked into purely sequential execution if we promise that the results will be consistent. There is already precedent for that position. You can easily construct queries using UPDATE ... FROM wherein the same target row joins to more than one row in the FROM table, and then it's unpredictable which joining row will be used to update that target row. Our position has always been "don't do that", not that we'd lobotomize the planner and executor to ensure predictability. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: cross column correlation ...
2011/2/25 Cédric Villemain :
>> All that having been said, I think that while Josh is thinking fuzzily about the mathematics of his proposal, the basic idea is pretty sensible. It is not easy - likely not possible - for the system to have a good idea which things will be in some kind of cache at the time the query is executed; it could even change mid-query. The execution of one part of the query could evict from the cache data which some other part of the plan assumed would be cached. But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more.
>
> I believe we can maintain a small map of areas of a relation which are in the OS buffer cache (shared buffers move more), or at least a percentage of the relation in OS cache. Getting the autovacuum daemon to update those maps/counters might be enough and easy to do; it is really near what auto-analyze does. My observation is that the percentage in cache is stable on a production workload after some tens of minutes needed to warm the server.

I don't think we can assume that will be true in all workloads. Imagine a server doing batch processing. People submit large batches of work that take, say, an hour to complete. Not all batches use the same set of tables - maybe they even run in different databases. After a big batch process finishes crunching numbers in database A, very little of database B will be cached. But it's not necessarily right to assume that will still be the case when we start queries for a new batch in database B, although it's more likely to be right for large tables (which will take a long time to get cached meaningfully, if they ever do) than small ones. Also, it could lead to strange issues where batches run much faster or slower depending on which batch immediately preceded them.
If we're going to do something a lot of times, it'd be better to bite the bullet and read it all in rather than going to more work elsewhere, but if we're only going to touch it once, then not so much. You might also have this issue on systems that run OLTP workloads all day and then do some batch processing at night to get ready for the next business day. Kevin Grittner wrote previously about those jobs needing some different settings in his environment (I'm not remembering which settings at the moment). Suppose that the batch process is going to issue a query that can be planned in one of two possible ways. One way involves reading 10% of a relation, and the other way involves reading the whole thing. The first plan takes 200 s to execute if the relation is not cached, and 180 s if the relevant portion is cached. The second plan takes 300 s to execute if the relation is not cached, and 100 s if it is cached. At the start of the batch run, the relation won't be cached, because it's used *only* by the overnight job and not by the daily OLTP traffic. Which way should we execute the query? The answer is that if the batch job only needs to execute that query *once*, we should do it the first way. But if it needs to execute it three or more times, the second way is better, but only if we use the second plan every time. If we start out with the first plan, we're always better off sticking with it *unless* we know that we're going to repeat the query at least twice more after the iteration we're currently planning. To make the right decision, the query planner needs a crystal ball. Or, a little help from the DBA. > What should really help here is to have hooks in the cost functions to > test those ideas without the need to patch postgresql-core a lot. Will > it be ok to have hooks or will it add to much CPU consumption in a > sensible part of the code ? Depends on where you put them, I guess. Hooks are pretty cheap, but they're also pretty hard to use. 
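The break-even arithmetic in the batch-plan example above can be checked directly. A minimal sketch (plain Python, numbers taken from the example; the function name is just illustrative):

```python
# Two candidate plans for the same query, with different cold vs. warm costs
# (numbers from the example above):
#   plan 1: read 10% of the relation - 200 s cold, 180 s warm
#   plan 2: read the whole relation  - 300 s cold, 100 s warm
def total_cost(cold_s, warm_s, executions):
    """Total time if the same plan is used every time: the first run pays
    the uncached (cold) cost, subsequent runs pay the cached (warm) cost."""
    return cold_s + warm_s * (executions - 1)

for n in range(1, 6):
    p1 = total_cost(200, 180, n)
    p2 = total_cost(300, 100, n)
    print(n, p1, p2, "plan1" if p1 < p2 else "plan2")
# -> plan 1 wins for 1 or 2 executions; plan 2 wins from 3 executions on
```

This confirms the claim in the text: with one or two executions the first plan is cheaper, but at three or more executions the whole-table plan wins, provided it is chosen every time.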
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 11:31 AM, Greg Stark wrote:
> On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane wrote:
>> However, the real reason for doing it isn't any of those, but rather to establish the principle that the executions of the modifying sub-queries are interleaved not sequential. We're never going to be able to do any significant optimization of such queries if we have to preserve the behavior that the sub-queries execute sequentially. And I think it's inevitable that users will manage to build such an assumption into their queries if the first release with the feature behaves that way.
>
> Does the interleaved execution have sane semantics?
>
> With a query like:
>
> WITH
> a as update x set x.i=x.i+1 returning x.i,
> b as update x set x.i=x.i+1 returning x.i
> select * from a natural join b;
>
> Is there any way to tell what it will return or what state it will leave the table in?

WITH
a as update x set x.i=x.i+1 returning x.i,
b as update x set x.i=x.i+1 where x.i = 1 returning x.i
select * from a natural join b;

or the above if x.i is 1 for all rows of x at query start?

merlin
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 2:58 PM, Tom Lane wrote: > However, the real reason for doing it isn't any of those, but rather > to establish the principle that the executions of the modifying > sub-queries are interleaved not sequential. We're never going to be > able to do any significant optimization of such queries if we have to > preserve the behavior that the sub-queries execute sequentially. > And I think it's inevitable that users will manage to build such an > assumption into their queries if the first release with the feature > behaves that way. Does the interleaved execution have sane semantics? With a query like: WITH a as update x set x.i=x.i+1 returning x.i, b as update x set x.i=x.i+1 returning x.i select * from a natural join b; Is there any way to tell what it will return or what state it will leave the table in? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
Marko Tiikkaja writes:
> On 2011-02-25 6:12 PM, Tom Lane wrote:
>> The current implementation with everything in one plantree really ought to look just like a SELECT so far as the portal code is concerned.
> The problem was that the old code was using PORTAL_MULTI_QUERY whenever a wCTE was present. Are you saying that you are using PORTAL_ONE_SELECT? Doesn't that have problems with triggers, for example?

Hmmm ... good question. I notice the lack of any regression test cases involving triggers. Will check this.

regards, tom lane
Re: [HACKERS] WIP: cross column correlation ...
> 4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cached? Hard to say, right? It could be none of it or all of it. The second scenario is easy to imagine - just suppose the query's been executed twice. The first scenario isn't hard to imagine either.

I have a set of slow disks which can impact performance nearly as much as whether data is cached in memory versus sitting on the fast disks.

How practical would it be for analyze to keep a record of response times for given sections of a table as it randomly accesses them and generate some kind of a map for expected response times for the pieces of data it is analysing? It may well discover, on its own, that recent data (1 month old or less) has a random read response time of N, older data (1 year old) in a different section of the relation tends to have a response time of 1000N, and really old data (5 years old) tends to have a response time of 3000N.
Re: [HACKERS] WIP: cross column correlation ...
2011/2/25 Robert Haas :
> On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian wrote:
>> Josh Berkus wrote:
>>> On 2/23/11 7:10 AM, Robert Haas wrote:
>>> > IME, most bad query plans are caused by either incorrect estimates of selectivity, or wrongheaded notions about what's likely to be cached. If we could find a way, automated or manual, of providing the planner some better information about the facts of life in those areas, I think we'd be way better off. I'm open to ideas about what the best way to do that is.
>>>
>>> As previously discussed, I'm fine with approaches which involve modifying database objects. These are auditable and centrally manageable and aren't devastating to upgrades.
>>>
>>> So things like the proposed "CREATE SELECTIVITY" would be OK in a way that decorating queries would not.
>>>
>>> Similarly, I would love to be able to set "cache %" on a per-relation basis, and override the whole dubious calculation involving random_page_cost for scans of that table.
>>
>> We should just find a way of checking what percentage of a table is already in the shared buffers. That doesn't help us with the kernel cache, but it would be a good start and something that doesn't require user tuning.
>
> You're reinventing a wheel that's already been discarded multiple times. There are at least four separate problems:
>
> 1. The percentage of the table which is cached in shared_buffers at plan time need not match the percentage that is cached at execution time. A delay of even a few seconds between planning and execution could make the numbers totally different, and plans can be cached for much longer than that.
>
> 2. Because shared_buffers can turn over quite quickly, planning the statement multiple times in relatively quick succession could give different results each time.
Previous discussions on this topic have > concluded that DBAs hate plan instability, and hate GEQO because it > causes plan instability, and this would inject plan instabiilty into > the main planner. > > 3. The percentage of the table which is cached in shared_buffers is > not necessarily representative of the percentage which is cached in > general. On a large machine, shared_buffers may be less than 10% of > the total cache. It would be unwise to make guesses about what is and > is not cached based on a small percentage of the cache. > > 4. Even if we could accurately estimate the percentage of the table > that is cached, what then? For example, suppose that a user issues a > query which retrieves 1% of a table, and we know that 1% of that table > is cached. How much of the data that the user asked for is cache? > Hard to say, right? It could be none of it or all of it. The second > scenario is easy to imagine - just suppose the query's been executed > twice. The first scenario isn't hard to imagine either. > > One idea Tom and I kicked around previously is to set an assumed > caching percentage for each table based on its size relative to > effective_cache_size - in other words, assume that the smaller a table > is, the more of it will be cached. Consider a system with 8GB of RAM, > and a table which is 64kB. It is probably unwise to make any plan > based on the assumption that that table is less than fully cached. If > it isn't before the query executes, it soon will be. Going to any > amount of work elsewhere in the plan to avoid the work of reading that > table in from disk is probably a dumb idea. Of course, one downside > of this approach is that it doesn't know which tables are hot and > which tables are cold, but it would probably still be an improvement > over the status quo. Yes, good idea. > > All that having been said, I think that while Josh is thinking fuzzily > about the mathematics of his proposal, the basic idea is pretty > sensible. 
> It is not easy - likely not possible - for the system to have a good idea which things will be in some kind of cache at the time the query is executed; it could even change mid-query. The execution of one part of the query could evict from the cache data which some other part of the plan assumed would be cached. But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more.

I believe we can maintain a small map of areas of a relation which are in the OS buffer cache (shared buffers move more), or at least a percentage of the relation in OS cache. Getting the autovacuum daemon to update those maps/counters might be enough and easy to do; it is really near what auto-analyze does. My observation is that the percentage in cache is stable on a production workload after some tens of minutes needed to warm the server.

What should really help here is to have hooks in the cost functions to test those ideas without the need to patch postgresql-core a lot. Will it be ok to have hooks or will it add too much CPU consumption in a sensible part of the code?
Re: [HACKERS] sync rep design architecture (was "disposition of remaining patches")
Daniel Farina wrote: Server A syncreps to Server B Now I want to provision server A-prime, which will eventually take the place of A. Server A syncreps to Server B Server A syncreps to Server A-prime Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B. One of the very fundamental breaks between how this patch implements sync rep and what some people might expect is this concern. Having such tight control over the exact order of failover isn't quite here yet, so sometimes people will need to be creative to work within the restrictions of what is available. The path for this case is probably: 1) Wait until A' is caught up 2) Switchover to B as the right choice to be the new master, with A' as its standby and A going off-line at the same time. 3) Switchover the master role from B to A'. Bring up B as its standby. There are other possible transition plans available too. I appreciate that you would like to do this as an atomic operation, rather than handling it as two steps--one of which puts you in a middle point where B, a possibly inferior standby, is operating at the master. There are a dozen other complicated "my use case says I want and it must be done as " requests for Sync Rep floating around here, too. They're all getting ignored in favor of something smaller that can get built today. The first question I'd ask is whether you could you settle for this more cumbersome than you'd prefer switchover plan for now. The second is whether implementing what this feature currently does would get in the way of coding of what you really want eventually. I didn't get the Streaming Rep + Hot Standby features I wanted in 9.0 either. 
But committing what was reasonable to include in that version let me march forward with very useful new code, doing another year of development on my own projects and getting some new things get fixed in core. And so far it looks like 9.1 will sort out all of the kinks I was unhappy about. The same sort of thing will need to happen to get Sync Rep committed and then appropriate for more use cases. There isn't any margin left for discussions of scope creep left here; really it's "is this subset useful for some situations and stable enough to commit" now. 2. The unprivileged user can disable syncrep, in any situation. This flexibility is *great*, but you don't really want people to do it when one is performing the switchover. For the moment you may have to live with a situation where user connections must be blocked during the brief moment of switchover to eliminate this issue. That's what I end up doing with 9.0 production systems to get a really clean switchover, there's a second of hiccup even in the best case. I'm not sure yet of the best way yet to build a UI to make that more transparent in the sync rep case. It's sure not a problem that's going to get solved in this release though. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
On 2011-02-25 6:12 PM, Tom Lane wrote: Marko Tiikkaja writes: Without hacking it broke when PQdescribePrepared was called on a prepared query like: WITH t AS (DELETE FROM foo) SELECT 1; Not sure if that's an actual problem, but it seemed like something worht fixing. I can't replicate such a problem here --- do you have a concrete test case? ISTM the issue would only have been a problem back when you were trying to generate multiple PlannedStmts from a query like the above. I don't have one right now (I lost the one I had because of a hardware failure in a virtual machine), but I can write you one if you want to. But see below. The current implementation with everything in one plantree really ought to look just like a SELECT so far as the portal code is concerned. The problem was that the old code was using PORTAL_MULTI_QUERY whenever a wCTE was present. Are you saying that you are using PORTAL_ONE_SELECT? Doesn't that have problems with triggers, for example? Also, why are we forbidding wCTEs in cursors? Given the current definitions, that case seems to work fine too: the wCTEs will be executed as soon as you fetch something from the cursor. Are you just worried about not allowing a case that might be hard to support later? Honestly, I have no idea. It might be a leftover from the previous design. If it looks like it's easy to support, then go for it. Right now I'm thinking that it is best to continue to forbid it. If we go over to the less-sequential implementation that I'm advocating in another thread, the timing of the updates would become a lot less predictable than I say above. If we refuse it for now, we can always remove the restriction later, but the other way is more painful. Fair enough. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE behaviour
Marko Tiikkaja writes:
> On 2011-02-25 1:36 AM, Tom Lane wrote:
>> Why is it necessary to hack the portal logic at all? The patch seems to work for me without that. (I've fixed quite a few bugs though, so maybe what this is really doing is masking a problem elsewhere.)
> Without hacking it broke when PQdescribePrepared was called on a prepared query like:
> WITH t AS (DELETE FROM foo)
> SELECT 1;
> Not sure if that's an actual problem, but it seemed like something worth fixing.

I can't replicate such a problem here --- do you have a concrete test case? ISTM the issue would only have been a problem back when you were trying to generate multiple PlannedStmts from a query like the above. The current implementation with everything in one plantree really ought to look just like a SELECT so far as the portal code is concerned.

>> Also, why are we forbidding wCTEs in cursors? Given the current definitions, that case seems to work fine too: the wCTEs will be executed as soon as you fetch something from the cursor. Are you just worried about not allowing a case that might be hard to support later?
> Honestly, I have no idea. It might be a leftover from the previous design. If it looks like it's easy to support, then go for it.

Right now I'm thinking that it is best to continue to forbid it. If we go over to the less-sequential implementation that I'm advocating in another thread, the timing of the updates would become a lot less predictable than I say above. If we refuse it for now, we can always remove the restriction later, but the other way is more painful.

regards, tom lane
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On 2011-02-25 4:58 PM, Tom Lane wrote: Specifically, I'm imagining getting rid of the patch's additions to InitPlan and ExecutePlan that find all the modifying sub-queries and force them to be cycled to completion before the main plan runs. Just run the main plan and let it pull tuples from the CTEs as needed. Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to completion before shutting down the plan. (In the event of an error, we'd never get to ExecutorEnd, but it doesn't matter since whatever updates we did apply are nullified anyhow.) This idea has actually been discussed before when we talked about optimizing wCTEs, but IIRC you said that doing this in ExecutorEnd is a bit ugly. But if you can write this idea down in a way that makes you happy with the implementation, I think it's a huge benefit and we should definitely do it. This has a number of immediate and future implementation benefits: 3. The code could be significantly simpler. Instead of that rather complex and fragile logic in InitPlan to try to locate all the ModifyTable nodes and their CTEScan parents, we could just have ModifyTable nodes add themselves to a list in the EState during ExecInitNode. Then ExecutorEnd just traverses that list. Sounds good to me. However, the real reason for doing it isn't any of those, but rather to establish the principle that the executions of the modifying sub-queries are interleaved not sequential. We're never going to be able to do any significant optimization of such queries if we have to preserve the behavior that the sub-queries execute sequentially. And I think it's inevitable that users will manage to build such an assumption into their queries if the first release with the feature behaves that way. Yeah, you might be right. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
"Kevin Grittner" writes:
> Tom Lane wrote:
>> Given that we've decided to run the modifying sub-queries all with the same command counter ID, they are logically executing "in parallel".
>> Just run the main plan and let it pull tuples from the CTEs as needed.
> On the face of it, that sounds like it has another benefit you didn't mention -- it sounds like it's much more conducive to allowing parallel processing, if (when?) we eventually move in that direction. It might even be a good case for an initial, limited implementation.

Yeah. Most of the executor is in principle parallelizable at the plan-node level (ignoring the obvious and severe implementation problems with parallelizing *anything* in the backend). It's not good for wCTE to be creating a user-visible assumption that certain things will happen in a predefined order.

regards, tom lane
Re: [HACKERS] Sync Rep v17
On 2011-02-22 20:43, Jaime Casanova wrote:
> you can make this happen more easily, i just run "pgbench -n -c10 -j10 test" and got that warning and sometimes a segmentation fault and sometimes a failed assertion
>
> and the problematic code starts at src/backend/replication/syncrep.c:277, here my suggestions on that code. still i get a failed assertion because of the second Assert (i think we should just remove that one)
>
> *** SyncRepRemoveFromQueue(void)
> *** 288,299 ****
>           if (proc->lwWaitLink == NULL)
>               elog(WARNING, "could not locate ourselves on wait queue");
> !         proc = proc->lwWaitLink;
>       }
>
>       if (proc->lwWaitLink == NULL)       /* At tail */
>       {
> !         Assert(proc == MyProc);
>           /* Remove ourselves from tail of queue */
>           Assert(queue->tail == MyProc);
>           queue->tail = proc;
> --- 288,300 ----
>           if (proc->lwWaitLink == NULL)
>               elog(WARNING, "could not locate ourselves on wait queue");
> !         else
> !             proc = proc->lwWaitLink;
>       }
>
>       if (proc->lwWaitLink == NULL)       /* At tail */
>       {
> !         Assert(proc != MyProc);
>           /* Remove ourselves from tail of queue */
>           Assert(queue->tail == MyProc);
>           queue->tail = proc;

I also did some initial testing on this patch and got the queue related errors with > 1 clients. With the code change from Jaime above I still got a lot of 'not on queue' warnings. I tried to understand how the queue was supposed to work - resulting in the changes below that also incorporate a suggestion from Fujii upthread, to exit early when MyProc was found. With the changes below all seems to work without warnings.

I now see that the note about the list invariant is too short; better would be: "if queue length = 1 then head = tail".

--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
 	}
 	else
 	{
+		bool	found = false;
+
 		while (proc->lwWaitLink != NULL)
 		{
 			/* Are we the next proc in our traversal of the queue? */
@@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
 				 * No need to touch head or tail.
 				 */
 				proc->lwWaitLink = MyProc->lwWaitLink;
+				found = true;
+				break;
 			}
-			if (proc->lwWaitLink == NULL)
-				elog(WARNING, "could not locate ourselves on wait queue");
 			proc = proc->lwWaitLink;
 		}
+		if (!found)
+			elog(WARNING, "could not locate ourselves on wait queue");

-		if (proc->lwWaitLink == NULL)		/* At tail */
+		/* If MyProc was removed from the tail, maintain list invariant head==tail */
+		if (proc->lwWaitLink == NULL)
 		{
-			Assert(proc == MyProc);
-			/* Remove ourselves from tail of queue */
+			Assert(proc != MyProc);	/* impossible since that is the head=MyProc branch above */
 			Assert(queue->tail == MyProc);
 			queue->tail = proc;
 			proc->lwWaitLink = NULL;

I needed to add this to make the documentation compile:

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
         You should also consider setting hot_standby_feedback as an alternative to using this parameter.
+
+
+

regards, Yeb Havinga
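To see why the found-flag change matters, here is a toy model in plain Python of removing one's own entry from a singly linked wait queue (an analogy only, not the backend code; `Proc`/`remove_self` are made-up names). The original loop issued the "could not locate ourselves" warning from inside the traversal, so it could fire spuriously; tracking a `found` flag and breaking out on success avoids that:

```python
# Toy model of SyncRepRemoveFromQueue-style self-removal from a singly
# linked queue.  `next` plays the role of proc->lwWaitLink.
class Proc:
    def __init__(self, name):
        self.name = name
        self.next = None

def remove_self(queue, me):
    """queue is a (head, tail) pair; returns the new (head, tail)."""
    head, tail = queue
    if head is me:                   # at head: just advance the head pointer
        head = me.next
        if head is None:             # queue is now empty
            tail = None
    else:
        proc, found = head, False
        while proc.next is not None:
            if proc.next is me:      # unlink; no need to touch head
                proc.next = me.next
                found = True
                break                # early exit once we've found ourselves
            proc = proc.next
        if not found:                # warn only after the full traversal
            raise LookupError("could not locate ourselves on wait queue")
        if proc.next is None:        # we removed the old tail entry
            tail = proc
    me.next = None
    return head, tail

a, b, c = Proc("a"), Proc("b"), Proc("c")
a.next, b.next = b, c
head, tail = remove_self((a, c), b)  # remove the middle entry
print(head.name, tail.name)          # -> a c
```

The key point mirrored from the diff above: success is recorded explicitly, the warning moves outside the loop, and the tail pointer is only adjusted when the removed entry was actually the tail.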
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
Tom Lane wrote:
> Given that we've decided to run the modifying sub-queries all with the same command counter ID, they are logically executing "in parallel".
> Just run the main plan and let it pull tuples from the CTEs as needed.

On the face of it, that sounds like it has another benefit you didn't mention -- it sounds like it's much more conducive to allowing parallel processing, if (when?) we eventually move in that direction. It might even be a good case for an initial, limited implementation.

-Kevin
Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
On Fri, Feb 25, 2011 at 9:58 AM, Tom Lane wrote:
> I had what seems to me a remarkably good idea, though maybe someone else
> can spot a problem with it. Given that we've decided to run the
> modifying sub-queries all with the same command counter ID, they are
> logically executing "in parallel". The current implementation takes no
> advantage of that fact, though: it's based around the idea of running
> the updates strictly sequentially. I think we should change it so that
> the updates happen physically, not only logically, concurrently.
> Specifically, I'm imagining getting rid of the patch's additions to
> InitPlan and ExecutePlan that find all the modifying sub-queries and
> force them to be cycled to completion before the main plan runs.
> Just run the main plan and let it pull tuples from the CTEs as needed.
> Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
> completion before shutting down the plan. (In the event of an error,
> we'd never get to ExecutorEnd, but it doesn't matter since whatever
> updates we did apply are nullified anyhow.)
>
> This has a number of immediate and future implementation benefits:
>
> 1. RETURNING tuples that aren't actually needed by the main plan
> don't need to be buffered anywhere. (ExecutorEnd would just pull
> directly from the ModifyTable nodes, ignoring their parent CTE
> nodes, in all cases.)
>
> 2. In principle, in many common cases the RETURNING tuples wouldn't have
> to be buffered at all, but could be consumed on-the-fly. I think that
> right now the CTEScan nodes might still buffer the tuples so they can
> regurgitate them in case of being rescanned, but it's not hard to see
> how that could be improved later if it doesn't work immediately.
>
> 3. The code could be significantly simpler. Instead of that rather
> complex and fragile logic in InitPlan to try to locate all the
> ModifyTable nodes and their CTEScan parents, we could just have
> ModifyTable nodes add themselves to a list in the EState during
> ExecInitNode. Then ExecutorEnd just traverses that list.
>
> However, the real reason for doing it isn't any of those, but rather
> to establish the principle that the executions of the modifying
> sub-queries are interleaved not sequential. We're never going to be
> able to do any significant optimization of such queries if we have to
> preserve the behavior that the sub-queries execute sequentially.
> And I think it's inevitable that users will manage to build such an
> assumption into their queries if the first release with the feature
> behaves that way.
>
> Comments?

I completely agree. Actually, I thought we had already agreed on the
design you just proposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?
I had what seems to me a remarkably good idea, though maybe someone else
can spot a problem with it. Given that we've decided to run the
modifying sub-queries all with the same command counter ID, they are
logically executing "in parallel". The current implementation takes no
advantage of that fact, though: it's based around the idea of running
the updates strictly sequentially. I think we should change it so that
the updates happen physically, not only logically, concurrently.

Specifically, I'm imagining getting rid of the patch's additions to
InitPlan and ExecutePlan that find all the modifying sub-queries and
force them to be cycled to completion before the main plan runs.
Just run the main plan and let it pull tuples from the CTEs as needed.
Then, in ExecutorEnd, cycle any unfinished ModifyTable nodes to
completion before shutting down the plan. (In the event of an error,
we'd never get to ExecutorEnd, but it doesn't matter since whatever
updates we did apply are nullified anyhow.)

This has a number of immediate and future implementation benefits:

1. RETURNING tuples that aren't actually needed by the main plan
don't need to be buffered anywhere. (ExecutorEnd would just pull
directly from the ModifyTable nodes, ignoring their parent CTE
nodes, in all cases.)

2. In principle, in many common cases the RETURNING tuples wouldn't have
to be buffered at all, but could be consumed on-the-fly. I think that
right now the CTEScan nodes might still buffer the tuples so they can
regurgitate them in case of being rescanned, but it's not hard to see
how that could be improved later if it doesn't work immediately.

3. The code could be significantly simpler. Instead of that rather
complex and fragile logic in InitPlan to try to locate all the
ModifyTable nodes and their CTEScan parents, we could just have
ModifyTable nodes add themselves to a list in the EState during
ExecInitNode. Then ExecutorEnd just traverses that list.

However, the real reason for doing it isn't any of those, but rather
to establish the principle that the executions of the modifying
sub-queries are interleaved not sequential. We're never going to be
able to do any significant optimization of such queries if we have to
preserve the behavior that the sub-queries execute sequentially.
And I think it's inevitable that users will manage to build such an
assumption into their queries if the first release with the feature
behaves that way.

Comments?

regards, tom lane
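[Editor's note: the execution flow proposed above — let the main plan pull tuples from the modifying CTEs on demand, then drive any unfinished ModifyTable nodes to completion in ExecutorEnd — can be sketched as a toy Python model. The class and function names below only mirror the executor concepts discussed in the thread; they are not the patch's actual code.]

```python
# Toy model of the proposed wCTE execution flow: the main plan pulls
# tuples from a modifying sub-query on demand; at ExecutorEnd, any
# sub-query that was not fully consumed is cycled to completion so
# its side effects (the updates) still all happen.

class ModifyTable:
    """Stand-in for a data-modifying sub-plan (e.g. DELETE ... RETURNING)."""
    def __init__(self, rows):
        self.rows = iter(rows)
        self.applied = []  # modifications performed so far

    def next_tuple(self):
        row = next(self.rows, None)
        if row is not None:
            self.applied.append(row)  # the modification happens as we pull
        return row

    def run_to_completion(self):
        while self.next_tuple() is not None:
            pass

class EState:
    def __init__(self):
        # Nodes register themselves here during ExecInitNode in the
        # simplified scheme Tom describes.
        self.modify_nodes = []

def executor_end(estate):
    # Cycle any unfinished ModifyTable nodes before shutting down.
    for node in estate.modify_nodes:
        node.run_to_completion()

# The main plan only consumes the first two RETURNING tuples ...
estate = EState()
mt = ModifyTable([1, 2, 3, 4])
estate.modify_nodes.append(mt)
first_two = [mt.next_tuple() for _ in range(2)]

# ... but ExecutorEnd still finishes the update.
executor_end(estate)
assert first_two == [1, 2]
assert mt.applied == [1, 2, 3, 4]
```

The point of the sketch is the ordering guarantee: no modification is forced to run before the main plan, yet every modification is complete by the time the executor shuts down.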
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 9:14 AM, Daniel Farina wrote:
>
> Right now, as it stands, the syncrep patch will be happy as soon as
> the data has been fsynced to either B or A-prime; I don't think we can
> guarantee at any point that A-prime can become the leader, and feed B.

- start A` up, replicating from A
- shut down B (now A and A` are synchronous)

Now real quick:

- shut down A
- shut down A`
- change configuration
- start up A`
- start up B

Doesn't this work?

Greetings
Marcin
Re: [HACKERS] WIP: cross column correlation ...
On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian wrote:
> Josh Berkus wrote:
>> On 2/23/11 7:10 AM, Robert Haas wrote:
>> > IME, most bad query plans are caused by either incorrect
>> > estimates of selectivity, or wrongheaded notions about what's likely
>> > to be cached. If we could find a way, automated or manual, of
>> > providing the planner some better information about the facts of life
>> > in those areas, I think we'd be way better off. I'm open to ideas
>> > about what the best way to do that is.
>>
>> As previously discussed, I'm fine with approaches which involve
>> modifying database objects. These are auditable and centrally
>> manageable and aren't devastating to upgrades.
>>
>> So things like the proposed "CREATE SELECTIVITY" would be OK in a way
>> that decorating queries would not.
>>
>> Similarly, I would love to be able to set "cache %" on a per-relation
>> basis, and override the whole dubious calculation involving
>> random_page_cost for scans of that table.
>
> We should just find a way of checking what percentage of a table is
> already in the shared buffers. That doesn't help us with the kernel
> cache, but it would be a good start and something that doesn't require
> user tuning.

You're reinventing a wheel that's already been discarded multiple times. There are at least four separate problems:

1. The percentage of the table which is cached in shared_buffers at plan time need not match the percentage that is cached at execution time. A delay of even a few seconds between planning and execution could make the numbers totally different, and plans can be cached for much longer than that.

2. Because shared_buffers can turn over quite quickly, planning the statement multiple times in relatively quick succession could give different results each time. Previous discussions on this topic have concluded that DBAs hate plan instability, and hate GEQO because it causes plan instability, and this would inject plan instability into the main planner.

3. The percentage of the table which is cached in shared_buffers is not necessarily representative of the percentage which is cached in general. On a large machine, shared_buffers may be less than 10% of the total cache. It would be unwise to make guesses about what is and is not cached based on a small percentage of the cache.

4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cached? Hard to say, right? It could be none of it or all of it. The second scenario is easy to imagine - just suppose the query's been executed twice. The first scenario isn't hard to imagine either.

One idea Tom and I kicked around previously is to set an assumed caching percentage for each table based on its size relative to effective_cache_size - in other words, assume that the smaller a table is, the more of it will be cached. Consider a system with 8GB of RAM, and a table which is 64kB. It is probably unwise to make any plan based on the assumption that that table is less than fully cached. If it isn't before the query executes, it soon will be. Going to any amount of work elsewhere in the plan to avoid the work of reading that table in from disk is probably a dumb idea. Of course, one downside of this approach is that it doesn't know which tables are hot and which tables are cold, but it would probably still be an improvement over the status quo.

All that having been said, I think that while Josh is thinking fuzzily about the mathematics of his proposal, the basic idea is pretty sensible. It is not easy - likely not possible - for the system to have a good idea which things will be in some kind of cache at the time the query is executed; it could even change mid-query. The execution of one part of the query could evict from the cache data which some other part of the plan assumed would be cached. But DBAs frequently have a very good idea of which stuff is in cache - they can make observations over a period of time and then adjust settings and then observe some more and adjust some more.

PostgreSQL is extremely easy to administer compared with some of its competitors, and it's frequently necessary to change very little. But there's a difference between what you absolutely have to change to make it work and what you have the option to change when necessary. We need to decrease the amount of stuff in the first category (as we recently did with wal_buffers) and increase the amount of stuff in the second category. People coming from Oracle are not favorably impressed either by the amount of monitoring data PostgreSQL can gather or by the number of knobs that are available to fix problems when they occur. We don't need to have as many knobs as Oracle and we probably don't want to, and for that matter we probably couldn't if we did w
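[Editor's note: the heuristic Robert attributes to himself and Tom — assume a table's cached fraction shrinks as its size grows relative to effective_cache_size — can be sketched as follows. The linear falloff below is purely an illustrative guess; the thread does not specify a formula, and this is not anything PostgreSQL actually does.]

```python
# Sketch of the discussed heuristic: assume the cached fraction of a
# table shrinks as the table grows relative to effective_cache_size.
# The linear curve is an assumption for illustration only.

def assumed_cache_fraction(table_bytes, effective_cache_bytes):
    if table_bytes <= 0:
        return 1.0
    ratio = table_bytes / effective_cache_bytes
    # A table much smaller than the cache is assumed fully cached;
    # a table as large as (or larger than) the cache mostly is not.
    return max(0.0, min(1.0, 1.0 - ratio))

GB = 1024 ** 3

# The 64kB table on a machine with 8GB of cache, from the example
# above: effectively fully cached.
assert assumed_cache_fraction(64 * 1024, 8 * GB) > 0.99

# A table as large as the whole cache: assume almost none is cached.
assert assumed_cache_fraction(8 * GB, 8 * GB) == 0.0
```

As Robert notes, a size-based rule like this cannot distinguish hot tables from cold ones of the same size, but it is stable across plannings, which sidesteps the plan-instability objection raised in point 2.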
Re: [HACKERS] disposition of remaining patches
On Fri, Feb 25, 2011 at 3:14 AM, Daniel Farina wrote:
> On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith wrote:
>> Robert Haas wrote:
>>> 2. Synchronous replication. Splitting up this patch has allowed some
>> On top of 4 listed reviewers I know Dan Farina is poking at the last update,
>> so we may see one more larger report on top of what's already shown up. And
>> Jaime keeps kicking the tires too. What Simon was hoping is that a week of
>> others looking at this would produce enough feedback that it might be
>> possible to sweep the remaining issues up soon after he's back. It looks to
>> me like that's about when everything else that's still open will probably
>> settle too.
>
> Besides some of the fixable issues, I am going to have to echo
> Robert's sentiments about a few kinks that go beyond mechanism in the
> syncrep patch: in particular, it will *almost* solve the use case I
> was hoping to solve: a way to cleanly perform planned switchovers
> between machines with minimal downtime and no lost data. But there are
> a couple of holes I have thought of so far:

Well, just because the patch doesn't solve every use case isn't a reason not to go forward with it - we can always add more options later - but I have to admit that I'm kind of alarmed about the number of bugs reported so far.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] disposition of remaining patches
On Wed, Feb 23, 2011 at 11:49 AM, Greg Smith wrote:
> Robert Haas wrote:
>>> 2. Synchronous replication. Splitting up this patch has allowed some
> On top of 4 listed reviewers I know Dan Farina is poking at the last update,
> so we may see one more larger report on top of what's already shown up. And
> Jaime keeps kicking the tires too. What Simon was hoping is that a week of
> others looking at this would produce enough feedback that it might be
> possible to sweep the remaining issues up soon after he's back. It looks to
> me like that's about when everything else that's still open will probably
> settle too.

Besides some of the fixable issues, I am going to have to echo Robert's sentiments about a few kinks that go beyond mechanism in the syncrep patch: in particular, it will *almost* solve the use case I was hoping to solve: a way to cleanly perform planned switchovers between machines with minimal downtime and no lost data. But there are a couple of holes I have thought of so far:

1. The 2-safe methodology supported is not really compatible with performing planned-HA-switchover of a cluster with its own syncrep guarantees on top of that. For example:

Server A syncreps to Server B

Now I want to provision server A-prime, which will eventually take the place of A.

Server A syncreps to Server B
Server A syncreps to Server A-prime

Right now, as it stands, the syncrep patch will be happy as soon as the data has been fsynced to either B or A-prime; I don't think we can guarantee at any point that A-prime can become the leader, and feed B.

2. The unprivileged user can disable syncrep, in any situation. This flexibility is *great*, but you don't really want people to do it when one is performing the switchover. Rather, in a magical world we'd hope that disabling syncrep would just result in not having to synchronously commit to B (but, in this case, still synchronously commit to A-prime).

In other words, to my mind, you can use syncrep as-is to provide 2-safe durability xor a scheduled switchover: as soon as someone wants both, I think they'll have some trouble. I do want both, though.

-- 
fdr
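[Editor's note: the hole Daniel describes in point 1 comes down to two different acknowledgment rules. A toy sketch of the difference, with hypothetical function names that are not from the patch:]

```python
# Toy model of the 2-safe acknowledgment rule under discussion.
# The patch releases a commit once ANY synchronous standby has
# fsynced the data; the planned-switchover use case instead needs
# an ack from one SPECIFIC standby (A-prime) before commit.

def commit_released_any(acked_standbys):
    """Rule in the patch: any one synchronous standby suffices."""
    return len(acked_standbys) > 0

def commit_released_required(acked_standbys, required):
    """Rule the switchover use case needs: a named standby must ack."""
    return required in acked_standbys

# Only B has acknowledged so far.
acked = {"B"}
assert commit_released_any(acked)                      # patch: commit now
assert not commit_released_required(acked, "A-prime")  # switchover: wait
```

Under the first rule there is no point at which A-prime is guaranteed to hold every committed transaction, which is exactly why it cannot safely become the leader and feed B.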