Re: [PERFORM] Bulk persistence strategy

2017-05-22 Thread Simon Riggs
On 22 May 2017 at 09:06, Riaan Stander  wrote:

>> It sounds like you don't know about anonymous code blocks with DO
>> https://www.postgresql.org/docs/devel/static/sql-do.html
>>
>
> Yes I do know about that feature. My first implementation generated an
> anonymous code block, but to my utter dismay, once I tried actually doing
> parameter binding from the application it did not work. This seems to be a
> documented Postgres limitation: the anonymous code block is treated as a
> function body with no parameters.
>
> Thanks for the suggestion though.

Perhaps we should look into parameterisable DO statements.
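
For the archives, a minimal sketch of the limitation and the usual
workaround (table and function names are illustrative):

-- A DO block takes no parameters, so drivers cannot bind into it:
-- DO $$ BEGIN UPDATE accounts SET balance = balance + $1; END $$;  -- fails

-- Workaround: a temporary function, which can be parameterised and
-- vanishes with the session:
CREATE FUNCTION pg_temp.apply_change(p_delta numeric) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
  UPDATE accounts SET balance = balance + p_delta;
END $$;

SELECT pg_temp.apply_change(42.00);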

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bulk persistence strategy

2017-05-21 Thread Simon Riggs
On 22 May 2017 at 03:14, Riaan Stander  wrote:
>
>> Riaan Stander  writes:
>>>
>>> The intended use is use-once. The reason is that the statements might
>>> differ per call, especially when we start doing updates. The ideal would
>>> be to just issue the sql statements, but I was trying to cut down on
>>> network calls. To batch them together and get output from one query as
>>> input for the others (declare variables), I have to wrap them in a
>>> function in Postgres. Or am I missing something? In SQL Server TSQL I
>>> could declare variables in any statement as required.
>>
>> Hm, well, feeding data forward to the next query without a network
>> round trip is a valid concern.
>>
>> How stylized are these commands?  Have you considered pushing the
>> generation logic into the function, so that you just have one (or
>> a few) persistent functions, and the variability slack is taken
>> up through EXECUTE'd strings?  That'd likely be significantly
>> more efficient than one-use functions.  Even disregarding the
>> pg_proc update traffic, plpgsql isn't going to shine in that usage
>> because it's optimized for repeated execution of functions.
>>
>> regards, tom lane
>
> The commands are generated from a complex object/type in the application.
> Some of them can be quite large. With modifications they do state tracking
> too, so that we only update fields that actually changed and can do
> optimistic concurrency checking.
>
> It'll probably make more sense to try to create a function per type of object
> that deals with the query generation. That way I can create a Postgres type
> that maps from the application object.
>
> Thanks for the advice. I'll give that a shot.

It sounds like you don't know about anonymous code blocks with DO
https://www.postgresql.org/docs/devel/static/sql-do.html
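
A minimal sketch of the idea (table names are illustrative): several
statements batched into one network round trip, with a declared
variable carrying one statement's output into the next:

DO $$
DECLARE
  new_id bigint;
BEGIN
  INSERT INTO orders (status) VALUES ('new') RETURNING id INTO new_id;
  INSERT INTO order_lines (order_id, qty) VALUES (new_id, 1);
END $$;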

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Backup taking long time !!!

2017-01-23 Thread Simon Riggs
On 23 January 2017 at 17:12, Jeff Janes  wrote:

>> Just to make sure anyone reading the mailing list archives isn't
>> confused, running pg_start_backup does *not* make PG stop writing to
>> BASEDIR (or DATADIR, or anything, really).  PG *will* continue to write
>> data into BASEDIR after pg_start_backup has been called.
>
>
>
> Correct.  Unfortunately it is a very common myth that it does cause
> PostgreSQL to stop writing to the base dir.

Never heard that one before. Wow. Who's been saying that?

It's taken me years to hunt down all invalid backup memes and terminate them.

Never fails to surprise me how many people don't read the docs.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Understanding BRIN index performance

2016-10-03 Thread Simon Riggs
On 3 October 2016 at 10:58, Ivan Voras  wrote:

> I get that, my question was more about why the index scan returned 25 mil
> rows, when the pages are sequentially filled by timestamps? In my
> understanding of BRIN, it should have returned a small number of pages which
> would have been filtered (and sorted) for the exact data, right?

That could be most simply explained if the distribution of your data
is not what you think it is.
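
One quick way to check that from the catalogs (column name as used in
the thread):

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'expl_transactions' AND attname = 'dateadded';

A correlation close to 1.0 means the column really is laid out
sequentially; anything much lower would explain BRIN ranges wide enough
to match 25 million rows.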

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Understanding BRIN index performance

2016-10-03 Thread Simon Riggs
On 3 October 2016 at 10:00, Ivan Voras  wrote:
> Hi,
>
> I have a table of around 20 G, more than 220 million records, and I'm
> running this query on it:
>
> explain analyze SELECT MAX(id) - (SELECT id FROM expl_transactions WHERE
> dateAdded < (now() - INTERVAL '10 MINUTES') ORDER BY dateAdded DESC LIMIT 1)
> FROM expl_transactions;
>
> "id" is SERIAL, "dateAdded" is timestamp without timezone
>
> The "dateAdded" field also has a "default now()" applied to it some time
> after its creation, and a fair number of null values in the records (which I
> don't think matters for this query, but maybe I'm wrong).
>
> My first idea is to create a default BRIN index on dateAdded since the above
> query is not run frequently. To my surprise, the planner refused to use the
> index and used sequential scan instead. When I forced sequential scanning
> off, I got this:
>
> https://explain.depesz.com/s/W8oo
>
> The query was executing for 40+ seconds. It seems like the "index scan" on
> it returns nearly 9% of the table, 25 mil rows. Since the data in dateAdded
> actually is sequential and fairly selective (having now() as the default
> over a long period of time), this surprises me.
>
> With a normal btree index, of course, it runs fine:
>
> https://explain.depesz.com/s/TB5

Btree retains ordering, BRIN does not.

We've discussed optimizing the sort based upon BRIN metadata, but
that's not implemented yet.
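
For reference, a sketch of the difference using the table from the post:

CREATE INDEX ON expl_transactions USING brin (dateadded);  -- min/max per block range
CREATE INDEX ON expl_transactions USING btree (dateadded); -- full ordering

The btree can hand back rows already in dateAdded order, so ORDER BY
dateadded DESC LIMIT 1 stops after one tuple; BRIN only knows each block
range's min/max, so every overlapping range has to be read, rechecked
and sorted.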

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Millions of tables

2016-09-29 Thread Simon Riggs
On 26 September 2016 at 05:19, Greg Spiegelberg  wrote:
> I did look at PostgresXL and CitusDB.  Both are admirable however neither
> could support the need to read a random record consistently under 30ms.
> It's a similar problem Cassandra and others have: network latency.  At this
> scale, to provide the ability to access any given record amongst trillions
> it is imperative to know precisely where it is stored (system & database)
> and read a relatively small index.  I have other requirements that prohibit
> use of any technology that is eventually consistent.

Then XL is exactly what you need, since it does allow you to calculate
exactly where the record is via hash and then access it, which makes
the request just a single datanode task.

XL is not the same as CitusDB.
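
For illustration, the XL-style table declaration that enables this (a
sketch):

CREATE TABLE records (
    id    bigint PRIMARY KEY,
    attrs jsonb
) DISTRIBUTE BY HASH (id);

A SELECT ... WHERE id = $1 is then routed by the coordinator straight to
the one datanode that owns the hash bucket.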

> I liken the problem to fishing.  To find a particular fish of length, size,
> color &c in a data lake you must accept the possibility of scanning the
> entire lake.  However, if all fish were in barrels where each barrel had a
> particular kind of fish of specific length, size, color &c then the problem
> is far simpler.

The task of putting the fish in the appropriate barrel is quite hard.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-16 Thread Simon Riggs
On 12 December 2014 at 03:31, Simon Riggs  wrote:

> Also attached is a new parameter called enable_sortedpath which can be
> used to turn on/off the sorted path generated by the planner.

Now with attachment. (Thanks Jeff!)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


enable_sorted_path.v1a.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-11 Thread Simon Riggs
On 12 December 2014 at 03:22, Simon Riggs  wrote:

> It's a simple patch, but it solves the test cases I know about and
> does almost nothing to planning time.

Test cases attached. The files marked "pettus_*" are written up from
Christophe Pettus' blog.
The other test case is one of my own devising, based upon recent
customer problems.

The "10 most recent calls" is a restatement of actual problems seen in the past.


Also attached is a new parameter called enable_sortedpath which can be
used to turn on/off the sorted path generated by the planner.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


sortedpath.sql
Description: Binary data


pettus_limit.sql
Description: Binary data


pettus_sel.sql
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-11 Thread Simon Riggs
On 30 September 2014 at 10:25, Simon Riggs  wrote:
> On 30 September 2014 00:00, Tom Lane  wrote:

>> The existing cost estimation
>> code effectively assumes that they're perfectly uniformly distributed;
>> which is a good average-case assumption but can be horribly wrong in
>> the worst case.
>
> Agreed. This is the main observation from which we can work.
>
>> If we could settle on some other model for the probable distribution
>> of the matching tuples, we could adjust the cost estimates for LIMIT
>> accordingly.  I have not enough statistics background to know what a
>> realistic alternative would be.
>
> I'm not sure that the correlation alone is sufficient to be able to do
> that. We'd need to estimate where the values looked for are likely to
> be wrt other values, then increase the estimate accordingly. That sounds
> like a lot of pushups grovelling through quals and comparing against
> stats. So my thinking is actually to rule that out, unless you've some
> ideas for how to do that?
>
>> Another possibility is to still assume a uniform distribution but estimate
>> for, say, a 90% probability instead of 50% probability that we'll find
>> enough tuples after scanning X amount of the table.  Again, I'm not too
>> sure what that translates to in terms of the actual math, but it sounds
>> like something a statistics person could do in their sleep.


The problem is one of risk. Whatever distribution we use, it will be
wrong in some cases and good in others.

For example, if we look at "10 Most Recent Calls" for a user, then
frequent users would have one distribution, infrequent users another.
So we have multiple distributions in the same data. We just can't hold
enough information to make sense of this.

Think about how much data needs to be scanned if the user has only done 9 calls.

What I've done in the past is to rewrite the query in different ways
to force different plans, then call each plan depending upon the user
characteristics. This can also be done with hints, in a more
ignorant way.


>> I do not think we should estimate for the worst case though.  If we do,
>> we'll hear cries of anguish from a lot of people, including many of the
>> same ones complaining now, because the planner stopped picking fast-start
>> plans even for cases where they are orders of magnitude faster than the
>> alternatives.
>
> Fast start plans still make sense when performing an IndexScan with no
> filter conditions. Those types of plan should not be changed from
> current costing - they are accurate, good and very important because
> of their frequency in real workloads.
>
> What I think we are seeing is Ordered plans being selected too often
> in preference to Sorted plans when we make selectivity or stats
> errors. As well as data distributions that aren't correctly described
> by the statistics causing much longer execution times.
>
> Here are some plan selection strategies
>
> * Cost based - attempt to exactly calculate the cost based upon
> existing stats - increase the complexity of cost calc to cover other
> aspects. Even if we do that, these may not be that helpful in covering
> the cases where the stats turn out to be wrong.
>
> * Risk based - A risk adjusted viewpoint would be that we should treat
> the cost as mid-way between the best and the worst. The worst is
> clearly scanning (100% - N) of the tuples, the best is just N tuples.
> So we should be costing scans with excess filter conditions as a (100%
> Scan)/2, no matter the conditions, based purely upon risk.
>
> * Simplified heuristic - deselect ordered plans when they are driven
> from scans without quals or indexscans with filters, since the risk
> adjusted cost is likely to be higher than the sorted cost. Inspecting
> the plan tree for this could be quite costly, so would only be done
> when the total cost is $high, prior to it being adjusted by LIMIT.
>
>
> In terms of practical steps... I suggest the following:
>
> * Implement enable_orderedscan = on (default) | off. A switch to allow
> plans to de-select ordered plans, so we can more easily see the
> effects of such plans in the wild.
>
> * Code heuristic approach - I can see where to add my heuristic in the
> grouping planner. So we just need to do a left? deep search of the
> plan tree looking for scans of the appropriate type and bail out if we
> find one.

After looking at this for some time I now have a patch that solves this.

It relies on the observation that index scans with no bounded quals
don't play nicely with LIMIT. The solution relies upon the point that
LIMIT does not reduce the startup cost of plans, only the total cost.
So we can solve the problem by keeping the total cost estimate, just
move some of that into startup cost so LIMIT does not reduce costs as
much as before.

Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-09 Thread Simon Riggs
On 10 December 2014 at 10:46, Josh Berkus  wrote:
> On 12/05/2014 08:04 AM, Simon Riggs wrote:
>> On 6 December 2014 at 00:45, Merlin Moncure  wrote:
>>
>>> Neat -- got any test cases (would this have prevented OP's problem)?
>>
>> No test case was posted, so I am unable to confirm.
>>
>> A test case I produced that appears to be the same issue is fixed.
>>
>> I await confirmation from the OP.
>>
>
> So that's proprietary/confidential data.  However, the company involved
> has a large testbed and I could test their data using a patched version
> of Postgres.   In 3 months their data distribution has drifted, so I'll
> need to do some work to recreate the original bad plan circumstances.
> I'll keep you posted on how the patch works for that setup.
>
> It would be great to come up with a generic/public test for a bad
> abort-early situation.  Ideas?

If you could contribute that, it would be welcome.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-05 Thread Simon Riggs
On 6 December 2014 at 00:45, Merlin Moncure  wrote:

> Neat -- got any test cases (would this have prevented OP's problem)?

No test case was posted, so I am unable to confirm.

A test case I produced that appears to be the same issue is fixed.

I await confirmation from the OP.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-04 Thread Simon Riggs
On 30 September 2014 at 05:53, Simon Riggs  wrote:
> On 29 September 2014 16:00, Merlin Moncure  wrote:
>> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs  wrote:
>>> The problem, as I see it, is different. We assume that if there are
>>> 100 distinct values and you use LIMIT 1 that you would only need to
>>> scan 1% of rows. We assume that the data is arranged in the table in a
>>> very homogenous layout. When data is not, and it seldom is, we get
>>> problems.
>>
>> Hm, good point -- 'data proximity'.  At least in theory, can't this be
>> measured and quantified?  For example, given a number of distinct
>> values, you could estimate the % of pages read (or maybe non
>> sequential seeks relative to the number of pages) you'd need to read
>> all instances of a particular value in the average (or perhaps the
>> worst) case.   One way of trying to calculate that would be to look at
>> proximity of values in sampled pages (and maybe a penalty assigned for
>> high update activity relative to table size).  Data proximity would
>> then become a cost coefficient to the benefits of LIMIT.
>
> The necessary first step to this is to realise that we can't simply
> apply the LIMIT as a reduction in query cost, in all cases.
>
> The way I'm seeing it, you can't assume the LIMIT will apply to any
> IndexScan that doesn't have an index condition. If it has just a
> filter, or nothing at all, just an ordering then it could easily scan
> the whole index if the stats are wrong.
>
> So plans like this could be wrong, by assuming the scan will end
> earlier because of the LIMIT than it actually will.
>
> Limit
>   IndexScan (no index cond)
>
> Limit
>   NestJoin
>     IndexScan (no index cond)
>     SomeScan
>
> Limit
>   NestJoin
>     NestJoin
>       IndexScan (no index cond)
>       SomeScan
>     SomeScan
>
> and deeper...
>
> I'm looking for a way to identify and exclude such plans, assuming
> that this captures at least some of the problem plans.

After looking at this for some time I now have a patch that solves this.

It relies on the observation that index scans with no bounded quals
don't play nicely with LIMIT. The solution relies upon the point that
LIMIT does not reduce the startup cost of plans, only the total cost.
So we can solve the problem by keeping the total cost estimate, just
move some of that into startup cost so LIMIT does not reduce costs as
much as before.
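
For reference, the costing identity involved (a sketch of the standard
model, with N the plan's estimated row count and k the limit):

    cost(LIMIT k) = startup_cost + (total_cost - startup_cost) * k / N

Moving run cost into startup_cost therefore shrinks the discount the
LIMIT can claim, while leaving the total unchanged.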

It's a simple patch, but it solves the test cases I know about and
does almost nothing to planning time.

I tried much less subtle approaches involving direct prevention of
LIMIT pushdown but the code was much too complex for my liking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


avoid_limit_pushdown.v3.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unnecessary sort in the execution plan when doing group by

2014-11-04 Thread Simon Riggs
On 28 October 2014 06:26, Huang, Suya  wrote:

> Memory wanted: 3565580K bytes

This means "increase work_mem to this value".
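
In practice that is a session-local setting, sized at or above the
reported figure:

SET work_mem = '3600MB';  -- covers the 3565580K wanted above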

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-10-02 Thread Simon Riggs
On 1 October 2014 19:56, Josh Berkus  wrote:
> On 09/30/2014 04:01 PM, Simon Riggs wrote:
>> On 30 September 2014 18:28, Jeff Janes  wrote:
>>
>>>> Anyway, in the particular case I posted fixing n_distinct to realistic
>>>> numbers (%) fixed the query plan.
>>>
>>>
>>> But wouldn't fixing the absolute number also have fixed the plan?
>>
>> There are two causes of this issue.
>>
>> 1. Poor estimates of n_distinct. Fixable by user.
>>
>> 2. Poor assumption of homogeneous distribution. No way for user to
>> fix. Insufficient stats detail to be able to solve in current planner.
>>
>> I see (2) as the main source of issues, since as we observe, (1) is fixable.
>
> I disagree that (1) is not worth fixing just because we've provided
> users with an API to override the stats.  It would unquestionably be
> better for us to have a better n_distinct estimate in the first place.
> Further, this is an easier problem to solve, and fixing n_distinct
> estimates would fix a large minority of currently pathological queries.
>  It's like saying "hey, we don't need to fix the leak in your radiator,
> we've given you a funnel in the dashboard you can pour water into."

Having read papers on it, I believe the problem is intractable. Coding
is not the issue. To anyone: please prove me wrong, in detail, with
references so it can be coded.

> I do agree that (2) is worth fixing *as well*.  In a first
> approximation, one possibility (as Tom suggests) would be to come up
> with a mathematical model for a selectivity estimate which was somewhere
> *between* homogenous distribution and the worst case.  While that
> wouldn't solve a lot of cases, it would be a start towards having a
> better model.

This may have a reasonable solution, but I don't know it. A more
accurate mathematical model still runs into the main problem: it is a
guess, not certain knowledge, and the risk will still remain.

>>> I don't think correlation is up to the task as a complete solution, although
>>> it might help a little.  There is no way a simple correlation can encode
>>> that John retired 15 years ago and hasn't logged on since, while Johannes
>>> was hired yesterday and never logged on before then.
>>
>> Ah, OK, essentially the same example.
>>
>> Which is why I ruled out correlation stats based approaches and
>> suggested a risk-weighted cost approach.
>
> By "risk-weighted" you mean just adjusting cost estimates based on what
> the worst case cost looks like, correct?  That seemed to be your
> proposal from an earlier post.  If so, we're in violent agreement here.

I proposed a clear path for this earlier in the thread and received no
comments as yet. Please look at that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-30 Thread Simon Riggs
On 30 September 2014 18:28, Jeff Janes  wrote:

>> Anyway, in the particular case I posted fixing n_distinct to realistic
>> numbers (%) fixed the query plan.
>
>
> But wouldn't fixing the absolute number also have fixed the plan?

There are two causes of this issue.

1. Poor estimates of n_distinct. Fixable by user.

2. Poor assumption of homogeneous distribution. No way for user to
fix. Insufficient stats detail to be able to solve in current planner.

I see (2) as the main source of issues, since as we observe, (1) is fixable.

An example is a social media application where the business query is
"Display the last 10 posts". If the user is a frequent, recent user
then the query could come back very quickly, so a reverse scan on
post_id would work great. If the user hasn't logged on for ages, then
that plan needs to scan lots and lots of data to find 10 posts.
So only certain users experience poor performance, and the data isn't
consistent in its distribution, so stats wouldn't help much even if we
could capture the profile of the "typical user".

>> > The problem, as I see it, is different. We assume that if there are
>> > 100 distinct values and you use LIMIT 1 that you would only need to
>> > scan 1% of rows. We assume that the data is arranged in the table in a
>> > very homogenous layout. When data is not, and it seldom is, we get
>> > problems.
>> >
>> > Simply put, assuming that LIMIT will reduce the size of all scans is
>> > just way wrong. I've seen many plans where increasing the LIMIT
>> > dramatically improves the plan.
>> >
>> > If we can at least agree it is a problem, we can try to move forwards.
>
>
> I don't think anyone doubts there is a problem (many more than one of them),
> there is just disagreement about the priority and what can be done about it.


>> That is certainly another problem.  Does correlation stat figure in the
>> LIMIT calculation at all, currently?  That's what correlation stat is
>> for, no?
>
>
> I don't think correlation is up to the task as a complete solution, although
> it might help a little.  There is no way a simple correlation can encode
> that John retired 15 years ago and hasn't logged on since, while Johannes
> was hired yesterday and never logged on before then.

Ah, OK, essentially the same example.

Which is why I ruled out correlation stats based approaches and
suggested a risk-weighted cost approach.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-30 Thread Simon Riggs
On 30 September 2014 00:00, Tom Lane  wrote:
> Simon Riggs  writes:
>> The way I'm seeing it, you can't assume the LIMIT will apply to any
>> IndexScan that doesn't have an index condition. If it has just a
>> filter, or nothing at all, just an ordering then it could easily scan
>> the whole index if the stats are wrong.
>
> That statement applies with equal force to *any* plan with a LIMIT;
> it's not just index scans.

Agreed

> The real question is to what extent are the tuples satisfying the extra
> filter condition randomly distributed with respect to the index order
> (or physical order, if it's a seqscan).

Agreed

> The existing cost estimation
> code effectively assumes that they're perfectly uniformly distributed;
> which is a good average-case assumption but can be horribly wrong in
> the worst case.

Agreed. This is the main observation from which we can work.

> If we could settle on some other model for the probable distribution
> of the matching tuples, we could adjust the cost estimates for LIMIT
> accordingly.  I have not enough statistics background to know what a
> realistic alternative would be.

I'm not sure that the correlation alone is sufficient to be able to do
that. We'd need to estimate where the values looked for are likely to
be wrt other values, then increase the estimate accordingly. That sounds
like a lot of pushups grovelling through quals and comparing against
stats. So my thinking is actually to rule that out, unless you've some
ideas for how to do that?

> Another possibility is to still assume a uniform distribution but estimate
> for, say, a 90% probability instead of 50% probability that we'll find
> enough tuples after scanning X amount of the table.  Again, I'm not too
> sure what that translates to in terms of the actual math, but it sounds
> like something a statistics person could do in their sleep.
>
> I do not think we should estimate for the worst case though.  If we do,
> we'll hear cries of anguish from a lot of people, including many of the
> same ones complaining now, because the planner stopped picking fast-start
> plans even for cases where they are orders of magnitude faster than the
> alternatives.

Fast start plans still make sense when performing an IndexScan with no
filter conditions. Those types of plan should not be changed from
current costing - they are accurate, good and very important because
of their frequency in real workloads.

What I think we are seeing is Ordered plans being selected too often
in preference to Sorted plans when we make selectivity or stats
errors. As well as data distributions that aren't correctly described
by the statistics causing much longer execution times.

Here are some plan selection strategies

* Cost based - attempt to exactly calculate the cost based upon
existing stats - increase the complexity of cost calc to cover other
aspects. Even if we do that, these may not be that helpful in covering
the cases where the stats turn out to be wrong.

* Risk based - A risk adjusted viewpoint would be that we should treat
the cost as mid-way between the best and the worst. The worst is
clearly scanning (100% - N) of the tuples, the best is just N tuples.
So we should be costing scans with excess filter conditions as a (100%
Scan)/2, no matter the conditions, based purely upon risk.

* Simplified heuristic - deselect ordered plans when they are driven
from scans without quals or indexscans with filters, since the risk
adjusted cost is likely to be higher than the sorted cost. Inspecting
the plan tree for this could be quite costly, so would only be done
when the total cost is $high, prior to it being adjusted by LIMIT.


In terms of practical steps... I suggest the following:

* Implement enable_orderedscan = on (default) | off. A switch to allow
plans to de-select ordered plans, so we can more easily see the
effects of such plans in the wild.

* Code heuristic approach - I can see where to add my heuristic in the
grouping planner. So we just need to do a left? deep search of the
plan tree looking for scans of the appropriate type and bail out if we
find one.

Thoughts?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-30 Thread Simon Riggs
On 29 September 2014 22:54, Josh Berkus  wrote:
> On 09/26/2014 01:06 AM, Simon Riggs wrote:
>> On 23 September 2014 00:56, Josh Berkus  wrote:
>>
>>> We've hashed that out a bit, but frankly I think it's much more
>>> profitable to pursue fixing the actual problem than providing a
>>> workaround like "risk", such as:
>>>
>>> a) fixing n_distinct estimation
>>> b) estimating stacked quals using better math (i.e. not assuming total
>>> randomness)
>>> c) developing some kind of correlation stats
>>>
>>> Otherwise we would be just providing users with another knob there's no
>>> rational way to set.
>>
>> I believe this is a serious issue for PostgreSQL users and one that
>> needs to be addressed.
>>
>> n_distinct can be fixed manually, so that is less of an issue.
>
> It's an issue for the 99.8% of our users who don't know what n_distinct
> is, let alone how to calculate it.  Also, changing it requires an
> exclusive lock on the table. Of course, you and I have been over this
> issue before.

In 9.4 you'll be able to set n_distinct using only a Share Update
Exclusive lock.

So that's no longer a problem.

The quality of the n_distinct estimate itself is an issue with no
current solution, which is why you can set it manually.
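
For reference, the manual override looks like this (table and column
names illustrative; a negative value means a fraction of the row count,
and it takes effect at the next ANALYZE):

ALTER TABLE posts ALTER COLUMN user_id SET (n_distinct = -0.05);
ANALYZE posts;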

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-29 Thread Simon Riggs
On 29 September 2014 16:00, Merlin Moncure  wrote:
> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs  wrote:
>> The problem, as I see it, is different. We assume that if there are
>> 100 distinct values and you use LIMIT 1 that you would only need to
>> scan 1% of rows. We assume that the data is arranged in the table in a
>> very homogenous layout. When data is not, and it seldom is, we get
>> problems.
>
> Hm, good point -- 'data proximity'.  At least in theory, can't this be
> measured and quantified?  For example, given a number of distinct
> values, you could estimate the % of pages read (or maybe non
> sequential seeks relative to the number of pages) you'd need to read
> all instances of a particular value in the average (or perhaps the
> worst) case.   One way of trying to calculate that would be to look at
> proximity of values in sampled pages (and maybe a penalty assigned for
> high update activity relative to table size).  Data proximity would
> then become a cost coefficient to the benefits of LIMIT.

The necessary first step to this is to realise that we can't simply
apply the LIMIT as a reduction in query cost, in all cases.

The way I'm seeing it, you can't assume the LIMIT will apply to any
IndexScan that doesn't have an index condition. If it has just a
filter, or nothing at all, just an ordering then it could easily scan
the whole index if the stats are wrong.

So plans like this could be wrong, by assuming the scan will end
earlier because of the LIMIT than it actually will.

Limit
  IndexScan (no index cond)

Limit
  NestJoin
    IndexScan (no index cond)
    SomeScan

Limit
  NestJoin
    NestJoin
      IndexScan (no index cond)
      SomeScan
    SomeScan

and deeper...

I'm looking for a way to identify and exclude such plans, assuming
that this captures at least some of the problem plans.

Comments? Test Cases?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-26 Thread Simon Riggs
On 23 September 2014 00:56, Josh Berkus  wrote:

> We've hashed that out a bit, but frankly I think it's much more
> profitable to pursue fixing the actual problem than providing a
> workaround like "risk", such as:
>
> a) fixing n_distinct estimation
> b) estimating stacked quals using better math (i.e. not assuming total
> randomness)
> c) developing some kind of correlation stats
>
> Otherwise we would be just providing users with another knob there's no
> rational way to set.

I believe this is a serious issue for PostgreSQL users and one that
needs to be addressed.

n_distinct can be fixed manually, so that is less of an issue.

The problem, as I see it, is different. We assume that if there are
100 distinct values and you use LIMIT 1 that you would only need to
scan 1% of rows. We assume that the data is arranged in the table in a
very homogenous layout. When data is not, and it seldom is, we get
problems.

Simply put, assuming that LIMIT will reduce the size of all scans is
just way wrong. I've seen many plans where increasing the LIMIT
dramatically improves the plan.

If we can at least agree it is a problem, we can try to move forwards.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres Replaying WAL slowly

2014-09-17 Thread Simon Riggs
On 1 July 2014 20:20, Tom Lane  wrote:
> Jeff Frost  writes:
>>> On Jun 30, 2014, at 4:04 PM, Tom Lane  wrote:
>>>> Did you check whether the locks were all on temp tables of the
>>>> ON COMMIT DROP persuasion?
>
>> And indeed it did catch up overnight and the lag increased shortly after a 
>> correlating spike in AccessExclusiveLocks that were generated by temp table 
>> creation with on commit drop.
>
> OK, so we have a pretty clear idea of where the problem is now.
>
> It seems like there are three, not mutually exclusive, ways we might
> address this:
>
> 1. Local revisions inside StandbyReleaseLocks to make it perform better in
> the presence of many locks.  This would only be likely to improve matters
> much if there's a fixable O(N^2) algorithmic issue; but there might well
> be one.
>
> 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
> the grounds that no standby should be touching them.  I'm not entirely
> sure that that argument is bulletproof though; in particular, even though
> a standby couldn't access the table's data, it's possible that it would be
> interested in seeing consistent catalog entries.
>
> 3. Avoid WAL-logging AccessExclusiveLocks associated with
> new-in-transaction tables, temp or not, on the grounds that no standby
> could even see such tables until they're committed.  We could go a bit
> further and not take out any locks on a new-in-transaction table in the
> first place, on the grounds that other transactions on the master can't
> see 'em either.
>
> It sounded like Andres had taken a preliminary look at #1 and found a
> possible avenue for improvement, which I'd encourage him to pursue.
>
> For both #2 and the conservative version of #3, the main implementation
> problem would be whether the lock WAL-logging code has cheap access to
> the necessary information.  I suspect it doesn't.
>
> The radical version of #3 might be pretty easy to do, at least to the
> extent of removing locks taken out during CREATE TABLE.  I suspect there
> are some assertions or other consistency checks that would get unhappy if
> we manipulate relations without locks, though, so those would have to be
> taught about the exception.  Also, we sometimes forget new-in-transaction
> status during relcache flush events; it's not clear if that would be a
> problem for this.
>
> I don't plan to work on this myself, but perhaps someone with more
> motivation will want to run with these ideas.

Patch implements option 2 in the above.

Skipping the locks entirely seems like it opens a can of worms.

Skipping the lock for temp tables is valid since locks don't need to
exist on the standby. Any catalog entries for them will exist, but the
rows will show them as temp and nobody would expect them to be valid
outside of the original session.

Patch implements a special case that takes the lock normally, but
skips WAL logging the lock info.
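
For reference, the workload pattern at issue (a sketch):

BEGIN;
CREATE TEMP TABLE scratch (id int) ON COMMIT DROP;
-- the CREATE and the implicit DROP at commit each take an
-- AccessExclusiveLock; without the patch each was also WAL-logged
INSERT INTO scratch VALUES (1);
COMMIT;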

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


temp_tables_skip_logging_locks.v1.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-03 Thread Simon Riggs
On 1 July 2014 20:20, Tom Lane  wrote:

> I don't plan to work on this myself, but perhaps someone with more
> motivation will want to run with these ideas.

I was planning to work on improving performance of replication apply
over the summer, mid July - Aug, so I'll add this to the list.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-06-16 Thread Simon Riggs
On 16 June 2013 16:23, Heikki Linnakangas  wrote:
> On 06.05.2013 04:51, Mark Kirkwood wrote:
>>
>> On 05/05/13 00:49, Simon Riggs wrote:
>>>
>>> On 3 May 2013 13:41, Simon Riggs  wrote:
>>>
>>>> (3) to make the check on TransactionIdIsInProgress() into a heuristic,
>>>> since we don't *need* to check that, so if we keep checking the same
>>>> xid repeatedly we can reduce the number of checks or avoid xids that
>>>> seem to be long running. That's slightly more coding than my quick
>>>> hack here but seems worth it.
>>>>
>>>> I think we need both (1) and (3) but the attached patch does just (1).
>>>>
>>>> This is a similar optimisation to the one I introduced for
>>>> TransactionIdIsKnownCompleted(), except this applies to repeated
>>>> checking of as yet-incomplete xids, and to bulk concurrent
>>>> transactions.
>>>
>>>
>>> ISTM we can improve performance of TransactionIdIsInProgress() by
>>> caching the procno of our last xid.
>>>
>>> Mark, could you retest with both these patches? Thanks.
>>>
>>
>> Thanks Simon, will do and report back.
>
>
> Did anyone ever try (3) ?

No, because my other patch meant I didn't need to. In other words, my
other patch speeded up repeated access enough I didn't care about (3)
anymore.


> I'm not sure if this the same idea as (3) above, but ISTM that
> HeapTupleSatisfiesMVCC doesn't actually need to call
> TransactionIdIsInProgress(), because it checks XidInMVCCSnapshot(). The
> comment at the top of tqual.c says:
>
>>  * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT
>> array)
>>  * before TransactionIdDidCommit/TransactionIdDidAbort (which look in
>>  * pg_clog).  Otherwise we have a race condition: we might decide that a
>>  * just-committed transaction crashed, because none of the tests succeed.
>>  * xact.c is careful to record commit/abort in pg_clog before it unsets
>>  * MyPgXact->xid in PGXACT array.  That fixes that problem, but it also
>>  * means there is a window where TransactionIdIsInProgress and
>>  * TransactionIdDidCommit will both return true.  If we check only
>>  * TransactionIdDidCommit, we could consider a tuple committed when a
>>  * later GetSnapshotData call will still think the originating transaction
>>  * is in progress, which leads to application-level inconsistency.
>> The
>>  * upshot is that we gotta check TransactionIdIsInProgress first in all
>>  * code paths, except for a few cases where we are looking at
>>  * subtransactions of our own main transaction and so there can't be any
>>  * race condition.
>
>
> If TransactionIdIsInProgress() returns true for a given XID, then surely it
> was also running when the snapshot was taken (or had not even began yet). In
> which case the XidInMVCCSnapshot() call will also return true. Am I missing
> something?
>
> There's one little problem: we currently only set the hint bits when
> TransactionIdIsInProgress() returns false. If we do that earlier, then even
> though HeapTupleSatisfiesMVCC works correctly thanks to the
> XidInMVCCSnapshot call, other HeapTupleSatisfies* functions that don't call
> XIdInMVCCSnapshot might see the tuple as committed or aborted too early, if
> they see the hint bit as set while the transaction is still in-progress
> according to the proc array. Would have to check all the callers of those
> other HeapTupleSatisfies* functions to verify if that's OK.

Well, I looked at that and it's too complex and fiddly to be worth it, IMHO.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-06-16 Thread Simon Riggs
On 16 June 2013 16:04, Heikki Linnakangas  wrote:
> On 03.05.2013 15:41, Simon Riggs wrote:
>>
>> What appears to be happening is we're spending a lot of time in
>> TransactionIdIsInProgress() so we can set hints and then when we find
>> it is still in progress we then spend more time in XidIsInSnapshot()
>> while we check that it is still invisible to us. Even if the
>> transaction we see repeatedly ends, we will still pay the cost in
>> XidIsInSnapshot repeatedly as we execute.
>>
>> Given that code path, I would expect it to suck worse on a live system
>> with many sessions, and even worse with many subtransactions.
>>
>> (1) A proposed fix is attached, but its only a partial one and barely
>> tested.
>>
>> Deeper fixes might be
>>
>> (2)  to sort the xid array if we call XidIsInSnapshot too many times
>> in a transaction. I don't think that is worth it, because a long
>> running snapshot may be examined many times, but is unlikely to see
>> multiple in-progress xids repeatedly. Whereas your case seems
>> reasonably common.
>
>
> Yeah, sorting would be a waste of time most of the time.
>
> Instead of adding a new cache field, how about just swapping the matched XID
> to the beginning of the array?

Do you think that is significantly different from what I've done?

> Did you have some simple performance test script for this?

Files attached to set up and tear down the test. Needs
max_prepared_transactions = 100

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


inprogr1.sql
Description: Binary data


inprogr2.sql
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Simon Riggs
On 7 May 2013 07:32, Mark Kirkwood  wrote:
> On 07/05/13 18:10, Simon Riggs wrote:
>>
>> On 7 May 2013 01:23,   wrote:
>>
>>> I'm thinking that a variant of (2) might be simpler to implement:
>>>
>>> (I think Matt C essentially beat me to this suggestion - he originally
>>> discovered this issue). It is probably good enough for only *new* plans
>>> to
>>> react to the increased/increasing number of in progress rows. So this
>>> would require backends doing significant numbers of row changes to either
>>> directly update pg_statistic or report their in progress numbers to the
>>> stats collector. The key change here is the partial execution numbers
>>> would need to be sent. Clearly one would need to avoid doing this too
>>> often (!) - possibly only when number of changed rows >
>>> autovacuum_analyze_scale_factor proportion of the relation concerned or
>>> similar.
>>
>>
>> Are you loading using COPY? Why not break down the load into chunks?
>>
>
> INSERT - but we could maybe workaround by chunking the INSERT. However that
> *really* breaks the idea that in SQL you just say what you want, not how the
> database engine should do it! And more practically means that the most
> obvious and clear way to add your new data has nasty side effects, and you
have to tiptoe around muttering secret incantations to make things work
> well :-)

Yes, we'd need to break up SQL statements into pieces and use external
transaction snapshots to do that.

> I'm still thinking that making postgres smarter about having current stats
> for getting the actual optimal plan is the best solution.

I agree.

The challenge now is to come up with something that actually works;
most of the ideas have been very vague and ignore the many downsides.
The hard bit is the analysis and balanced thinking, not the
developing.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread Simon Riggs
On 7 May 2013 01:23,   wrote:

> I'm thinking that a variant of (2) might be simpler to implement:
>
> (I think Matt C essentially beat me to this suggestion - he originally
> discovered this issue). It is probably good enough for only *new* plans to
> react to the increased/increasing number of in progress rows. So this
> would require backends doing significant numbers of row changes to either
> directly update pg_statistic or report their in progress numbers to the
> stats collector. The key change here is the partial execution numbers
> would need to be sent. Clearly one would need to avoid doing this too
> often (!) - possibly only when number of changed rows >
> autovacuum_analyze_scale_factor proportion of the relation concerned or
> similar.

Are you loading using COPY? Why not break down the load into chunks?

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread Simon Riggs
On 6 May 2013 02:51, Mark Kirkwood  wrote:
> On 05/05/13 00:49, Simon Riggs wrote:
>>
>> On 3 May 2013 13:41, Simon Riggs  wrote:
>>
>>> (3) to make the check on TransactionIdIsInProgress() into a heuristic,
>>> since we don't *need* to check that, so if we keep checking the same
>>> xid repeatedly we can reduce the number of checks or avoid xids that
>>> seem to be long running. That's slightly more coding than my quick
>>> hack here but seems worth it.
>>>
>>> I think we need both (1) and (3) but the attached patch does just (1).
>>>
>>> This is a similar optimisation to the one I introduced for
>>> TransactionIdIsKnownCompleted(), except this applies to repeated
>>> checking of as yet-incomplete xids, and to bulk concurrent
>>> transactions.
>>
>>
>> ISTM we can improve performance of TransactionIdIsInProgress() by
>> caching the procno of our last xid.
>>
>> Mark, could you retest with both these patches? Thanks.
>>
>
> Thanks Simon, will do and report back.

OK, here's an easily reproducible test...

Prep:
DROP TABLE IF EXISTS plan;
CREATE TABLE plan
(
  id INTEGER NOT NULL,
  typ INTEGER NOT NULL,
  dat TIMESTAMP,
  val TEXT NOT NULL
);
insert into plan select generate_series(1,10), 0,
current_timestamp, 'some texts';
CREATE UNIQUE INDEX plan_id ON plan(id);
CREATE INDEX plan_dat ON plan(dat);

testcase.pgb
select count(*) from plan where dat is null and typ = 3;

Session 1:
pgbench -n -f testcase.pgb -t 100

Session 2:
BEGIN; insert into plan select 100 + generate_series(1, 10),
3, NULL, 'b';

Transaction rate in Session 1: (in tps)
(a) before we run Session 2:
Current: 5600tps
Patched: 5600tps

(b) after Session 2 has run, yet before transaction end
Current: 56tps
Patched: 65tps

(c) after Session 2 has aborted
Current/Patched: 836, 1028, 5400tps
VACUUM improves timing again

New version of patch attached which fixes a few bugs.

Patch works and improves things, but we're still swamped by the block
accesses via the index.

Which brings me back to Mark's original point, which is that we are
x100 times slower in this case and it *is* because the choice of
IndexScan is a bad one for this situation.

After some thought on this, I do think we need to do something about
it directly, rather than by tuning infrastructure (as I just
attempted). The root cause here is that IndexScan plans are sensitive
to mistakes in data distribution, much more so than other plan types.

The two options, broadly, are to either

1. avoid IndexScans in the planner unless they have a *significantly*
better cost. At the moment we use IndexScans if cost is lowest, even
if that is only by a whisker.

2. make IndexScans adaptive so that they switch to other plan types
mid-way through execution.

(2) seems fairly hard generically, since we'd have to keep track of
the tids returned from the IndexScan to allow us to switch to a
different plan and avoid re-issuing rows that we've already returned.
But maybe if we adapted the IndexScan plan type so that it adopted a
more page oriented approach internally, it could act like a
bitmapscan. Anyway, that would need some proof that it would work and
sounds like a fair task.

(1) sounds more easily possible and plausible. At the moment we have
enable_indexscan = off. If we had something like
plan_cost_weight_indexscan = N, we could selectively increase the cost
of index scans so that they would be less likely to be selected. i.e.
plan_cost_weight_indexscan = 2 would mean an indexscan would need to
be half the cost of any other plan before it was selected. (parameter
name selected so it could apply to all parameter types). The reason to
apply this weighting would be to calculate "risk adjusted cost" not
just estimated cost.
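
Session-level usage might then look like this (hypothetical GUC;
plan_cost_weight_indexscan does not exist in PostgreSQL):

SET plan_cost_weight_indexscan = 2;
-- an IndexScan is now picked only if its estimated cost is less than
-- half that of the best alternative plan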

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


cache_TransactionIdInProgress.v2.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-04 Thread Simon Riggs
On 3 May 2013 13:41, Simon Riggs  wrote:

> (3) to make the check on TransactionIdIsInProgress() into a heuristic,
> since we don't *need* to check that, so if we keep checking the same
> xid repeatedly we can reduce the number of checks or avoid xids that
> seem to be long running. That's slightly more coding than my quick
> hack here but seems worth it.
>
> I think we need both (1) and (3) but the attached patch does just (1).
>
> This is a similar optimisation to the one I introduced for
> TransactionIdIsKnownCompleted(), except this applies to repeated
> checking of as yet-incomplete xids, and to bulk concurrent
> transactions.

ISTM we can improve performance of TransactionIdIsInProgress() by
caching the procno of our last xid.

Mark, could you retest with both these patches? Thanks.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


cache_TransactionIdInProgress.v1.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-03 Thread Simon Riggs
On 2 May 2013 23:19,   wrote:
>> On 2 May 2013 01:49, Mark Kirkwood  wrote:
>>
>> I think we need a problem statement before we attempt a solution,
>> which is what Tom is alluding to.
>>
>
> Actually no - I think Tom (quite correctly) was saying that the patch was
> not a viable solution. With which I agree.
>
> I believe the title of this thread is the problem statement.
>
>> ISTM that you've got a case where the plan is very sensitive to a
>> table load. Which is a pretty common situation and one that can be
>> solved in various ways. I don't see much that Postgres can do because
>> it can't know ahead of time you're about to load rows. We could
>> imagine an optimizer that set thresholds on plans that caused the
>> whole plan to be recalculated half way thru a run, but that would be a
>> lot of work to design and implement and even harder to test. Having
>> static plans at least allows us to discuss what it does after the fact
>> with some ease.
>>
>> The plan is set using stats that are set when there are very few
>> non-NULL rows, and those increase massively on load. The way to cope
>> is to run the ANALYZE immediately after the load and then don't allow
>> auto-ANALYZE to reset them later.
>
> No. We do run analyze immediately after the load. The surprise was that
> this was not sufficient - the (small) amount of time where non optimal
> plans were being used due to the in progress row activity was enough to
> cripple the system - that is the problem. The analysis of why not led to
> the test case included in the original email. And sure it is deliberately
> crafted to display the issue, and is therefore open to criticism for being
> artificial. However it was purely meant to make it easy to see what I was
> talking about.

I had another look at this and see that I read the second explain incorrectly.

The amount of data examined and returned is identical in both plans.
The only difference is the number of in-progress rows seen by the
second query. Looking at the numbers some more, it looks like 6000
in-progress rows are examined in addition to the data. It might be
worth an EXPLAIN patch to put instrumentation in to show that, but it's
not that interesting.

It would be useful to force the indexscan into a bitmapscan to check
that the cost isn't attributable to the plan but to other overheads.
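
With the test case's table that could be done like this (session-local;
disabling plain index scans leaves the bitmap path available):

SET enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM plan WHERE dat IS NULL AND typ = 3;
RESET enable_indexscan;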

What appears to be happening is that we're spending a lot of time in
TransactionIdIsInProgress() so we can set hint bits, and then, when we
find the transaction is still in progress, we spend more time in
XidIsInSnapshot() checking that it is still invisible to us. Even if the
transaction we see repeatedly ends, we will still pay the cost in
XidIsInSnapshot repeatedly as we execute.

Given that code path, I would expect it to suck worse on a live system
with many sessions, and even worse with many subtransactions.

(1) A proposed fix is attached, but its only a partial one and barely tested.

Deeper fixes might be

(2)  to sort the xid array if we call XidIsInSnapshot too many times
in a transaction. I don't think that is worth it, because a long-running
snapshot may be examined many times, but is unlikely to see
multiple in-progress xids repeatedly. Whereas your case seems
reasonably common.

(3) to make the check on TransactionIdIsInProgress() into a heuristic,
since we don't *need* to check that, so if we keep checking the same
xid repeatedly we can reduce the number of checks or avoid xids that
seem to be long running. That's slightly more coding than my quick
hack here but seems worth it.

I think we need both (1) and (3) but the attached patch does just (1).

This is a similar optimisation to the one I introduced for
TransactionIdIsKnownCompleted(), except this applies to repeated
checking of as-yet-incomplete xids, and to bulk concurrent
transactions.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


xid_in_snapshot_cache.v1.patch
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] "WHERE 1 = 2 OR ..." makes planner choose a very inefficient plan

2013-05-02 Thread Simon Riggs
On 18 April 2013 15:46, Tom Lane  wrote:
> dmitry potapov  writes:
>> I recently stumbled upon on what could be a planner bug or a corner case.
>> If " OR ..." is added to WHERE clause of SELECT query,
>> then the planner chooses a very inefficient plan. Consider a query:
>
>> SELECT count(k0.id)
>> FROM k0
>> WHERE 1 = 2
>> OR k0.id IN (
>> SELECT k1.k0_id
>> FROM k1
>> WHERE k1.k1k2_id IN (
>> SELECT k2.k1k2_id
>> FROM k2
>> WHERE k2.t = 2
>> AND (coalesce(k2.z, '')) LIKE '%12%'
>> )
>> );
>
> Perhaps you should fix your application to not generate such incredibly
> silly SQL.  Figuring out that 1=2 is constant false and throwing it away
> costs the server easily a thousand times as many instructions as it
> would take for the client to not emit that in the first place.
>
> The reason you don't get a nice semijoin plan when you do that is that
> conversion of IN clauses to semijoins happens before
> constant-subexpression simplification.  So the planner hasn't yet
> figured out that the OR is useless when it would need to know that to
> produce a good plan.  (And no, we can't just flip the order of those two
> steps.  Doing two rounds of const-simplification wouldn't be a good
> answer either, because it would penalize well-written queries to benefit
> badly-written ones.)

The situation shown could be the result of a SQL injection attack.

It would be nice to have a switch to do additional checks on SQL
queries, to ensure such injections don't cause long runtimes that merely
return useless answers.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query planner ignoring constraints on partitioned tables when joining

2013-05-02 Thread Simon Riggs
On 18 April 2013 22:42, Tom Lane  wrote:

> One could imagine adding planner logic that would make inferences of a
> similar sort for equalities combined with inequalities, but it would be
> vastly more complicated, and would provide useful results in vastly
> fewer queries, than the equality-propagation logic.  So don't hold your
> breath waiting for something like that to happen.

I'll take note that we need to make partitioning work for merge joins also.

On a more general note, it would be good to be able to look at the
starting value from the driving table of the join and use that as a
constraint in the scan on the second table. We rely on that mechanism
for nested loop joins, so we could do with that here also.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-02 Thread Simon Riggs
On 2 May 2013 01:49, Mark Kirkwood  wrote:
> On 02/05/13 02:06, Tom Lane wrote:
>>
>> Mark Kirkwood  writes:
>>>
>>> I am concerned that the deafening lack of any replies to my original
>>> message is a result of folk glancing at your original quick reply and
>>> thinking... incomplete problem spec...ignore... when that is not that
>>> case - yes I should have muttered "9.2" in the original email, but we
>>> have covered that now.
>>
>> No, I think it's more that we're trying to get to beta, and so anything
>> that looks like new development is getting shuffled to folks' "to
>> look at later" queues.  The proposed patch is IMO a complete nonstarter
>> anyway; but I'm not sure what a less bogus solution would look like.
>>
>
> Yeah, I did think that beta might be consuming everyone's attention (of
> course immediately *after* sending the email)!
>
> And yes, the patch was merely to illustrate the problem rather than any
> serious attempt at a solution.

I think we need a problem statement before we attempt a solution,
which is what Tom is alluding to.

ISTM that you've got a case where the plan is very sensitive to a
table load. Which is a pretty common situation and one that can be
solved in various ways. I don't see much that Postgres can do because
it can't know ahead of time you're about to load rows. We could
imagine an optimizer that set thresholds on plans that caused the
whole plan to be recalculated half way thru a run, but that would be a
lot of work to design and implement and even harder to test. Having
static plans at least allows us to discuss what it does after the fact
with some ease.

The plan is set using stats that are set when there are very few
non-NULL rows, and those increase massively on load. The way to cope
is to run the ANALYZE immediately after the load and then don't allow
auto-ANALYZE to reset them later.
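
A minimal sketch of that, assuming a table named foo (hypothetical; the
storage parameter is a per-table autovacuum setting, and the huge
threshold is just a blunt way to keep auto-ANALYZE away):

  -- run immediately after the load:
  ANALYZE foo;
  -- stop autovacuum from re-analyzing foo and resetting the stats:
  ALTER TABLE foo SET (autovacuum_analyze_threshold = 2000000000);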

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance die when COPYing to table with bigint PK

2011-08-01 Thread Simon Riggs
On Sun, Jul 31, 2011 at 2:51 PM, Robert Ayrapetyan
 wrote:

> I've found strange behavior of my pg installation (tested both 8.4 and
> 9.0 - they behave same) on FreeBSD platform.
> In short - when some table have PK on bigint field - COPY to that
> table from file becomes slower and slower as table grows. When table
> reaches ~5GB - COPY of 100k records may take up to 20 mins. I've
> experimented with all params in configs, moved indexes to separate hdd
> etc - nothing made any improvement. However, once I'm dropping 64 bit
> PK - COPY of 100k records passes in seconds. Interesting thing - same
> table has other indexes, including composite ones, but none of them
> include bigint fields, that's why I reached decision that bug
> connected with indexes on bigint fields only.
>
> In terms of IO picture is following: after copy started gstat shows
> 100% load on index partition (as I mentioned above - I've tried
> separate hdd to keep index tablespace), large queue (over 2k
> elements), and constant slow write on speed of ~2MB\s. Hdd becomes
> completely unresponsive, even ls on empty folder hangs for minute or
> so.
>
> To avoid thoughts like "your hdd is slow, you haven't tuned
> postgresql.conf etc" - all slowness dissapears with drop of bigint PK,
> same time other indexes on same table remain alive. And yes - I've
> tried drop PK \ recreate PK, vacuum full analyze and all other things
> - nothing helped, only drop helps.
>
> Is this known and expected behavior?

This is a duplicate post with one on BUGS, being discussed there.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 9.0.4 + Hot Standby + FusionIO Drive + Performance => Query failed ERROR: catalog is missing 1 attribute(s) for relid 172226

2011-05-08 Thread Simon Riggs
On Tue, May 3, 2011 at 10:02 AM, Sethu Prasad  wrote:

> I tried with the PostgreSQL 9.0.4 + Hot Standby and running the database
> from Fusion IO Drive to understand the PG Performance.
>
> While doing so I got the "Query failed ERROR: catalog is missing 1
> attribute(s) for relid 172226". Any idea on this error? Is that combination
> PG + HotSB + Fusion IO Drive is not advisable?!

Why, I wonder, do you think this might have anything to do with Hot
Standby and/or FusionIO drives?

This indicates either catalog or catalog index corruption of some kind.

Did you only get this error once?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Group commit and commit delay/siblings

2010-12-08 Thread Simon Riggs
On Mon, 2010-12-06 at 23:52 -0500, Greg Smith wrote:
> Jignesh Shah wrote:
> > On Tue, Dec 7, 2010 at 1:55 AM, Tom Lane  wrote:
> >   
> >> I could have sworn we'd refactored that to something like
> >>bool ThereAreAtLeastNActiveBackends(int n)
> >> which could drop out of the loop as soon as it'd established what we
> >> really need to know...I'd suggest that we just improve the
> >> coding so that we don't scan ProcArray at all when commit_siblings is 0.
> >>
> >> (I do agree with improving the docs to warn people away from assuming
> >> this is a knob to frob mindlessly.)
> >> 
> > In that case I propose that we support commit_siblings=0 which is not
> > currently supported. Minimal value for commit_siblings  is currently
> > 1. If we support commit_siblings=0 then it should short-circuit that
> > function call which is often what I do in my tests with commit_delay.
> >   
> 
> Everybody should be happy now:  attached patch refactors the code to 
> exit as soon as the siblings count is exceeded, short-circuits with no 
> scanning of ProcArray if the minimum is 0, and allows setting the 
> siblings to 0 to enable that shortcut:

Minor patch, no downsides. Docs checked. Committed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] partition pruning

2010-03-08 Thread Simon Riggs
On Thu, 2010-03-04 at 17:40 -0500, Robert Haas wrote:
> On Mon, Mar 1, 2010 at 2:29 PM, Anj Adu  wrote:
> > When I use intervals in my query e.g  col1 between current_timestamp -
> > interval '10 days' and current_timestamp...the optimizer checks ALL
> > partitions  whereas if I use   col1 between 2 hardcoded dates..only
> > the applicable partitions are scanned.
> 
> Yep.  This is one example of a more general principle:
> constant-folding happens before planning, but anything more complex
> has to wait until execution time.  So the plan can't take into account
> the value of current_timestamp in forming the plan.

It could, but it doesn't yet. Partition removal can take place in the
executor and this is currently targeted for 9.1.
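
A sketch of the two forms (table and column hypothetical):

  -- constants are folded before planning, so only the matching
  -- partitions are scanned:
  SELECT * FROM events
   WHERE col1 BETWEEN TIMESTAMP '2010-02-26' AND TIMESTAMP '2010-03-08';

  -- current_timestamp is evaluated at run time, so the planner must
  -- keep every partition in the plan:
  SELECT * FROM events
   WHERE col1 BETWEEN current_timestamp - interval '10 days'
                  AND current_timestamp;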

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Air-traffic benchmark

2010-02-04 Thread Simon Riggs
On Thu, 2010-01-07 at 13:38 +0100, Lefteris wrote:
> Reported query times are (in sec):
> MonetDB 7.9s
> InfoBright 12.13s
> LucidDB 54.8s

It needs to be pointed out that those databases are specifically
optimised for Data Warehousing, whereas Postgres core is optimised for
concurrent write workloads in production systems.

If you want a best-vs-best type of comparison, you should be looking at
a version of Postgres optimised for Data Warehousing. These results show
that Postgres-related options exist that clearly beat the above numbers.
http://community.greenplum.com/showthread.php?t=111
I note also that Greenplum's Single Node Edition is now free to use, so
is a reasonable product for comparison on this list.

Also, I'm unimpressed by a Data Warehouse database that requires
everything to reside in memory, e.g. MonetDB. That severely limits
real-world usability, in my experience because it implies the queries
you're running aren't ad-hoc.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Mon, 2009-10-19 at 13:58 -0400, Tom Lane wrote:
> 
> Most read-only functions are stable or even immutable.

Huh? I mean a function that only contains SELECTs. (How would those ever
be Stable or Immutable??)

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Mon, 2009-10-19 at 13:43 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > On Sat, 2009-10-17 at 21:41 -0400, Tom Lane wrote:
> >> one thing we'd have to consider
> >> is whether it is okay to suppress calculation of columns containing
> >> volatile functions.
> 
> > I think we should have a 4th class of functions,
> > volatile-without-side-effects (better name needed, obviously).
> 
> What for?  There wouldn't be that many, I think.  random() and
> clock_timestamp(), yeah, but most volatile user-defined functions
> are either volatile-with-side-effects or misdeclared.

Read only vs. read write?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Sat, 2009-10-17 at 21:41 -0400, Tom Lane wrote:

> one thing we'd have to consider
> is whether it is okay to suppress calculation of columns containing
> volatile functions.

I think we should have a 4th class of functions,
volatile-without-side-effects (better name needed, obviously).

That would allow us to optimize such calls away, if appropriate.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Speeding up a query.

2009-07-07 Thread Simon Riggs

On Tue, 2009-06-16 at 14:35 -0400, Hartman, Matthew wrote:

> The algorithm for packing appointments in respects each constraint and
> typically schedules a day of treatments (30-60) within 9-10 seconds on
> my workstation, down from 27 seconds initially. I would like to get it
> below 5 seconds if possible.
> 
> I think what's slowing is down is simply the number of rows and joins.
> The algorithm creates a scheduling matrix with one row per 5 minute
> timeslot, per unit, per nurse assigned to the unit. That translates to
> 3,280 rows for the days I have designed in development (each day can
> change). 

ISTM the efficiency of your algorithm is geometrically related to the
number of time slots into which appointments might fit. So reduce the
number of possible time slots...

Assign each slot (randomly/hash/round robin) to either the morning or the
afternoon and then run exactly the same queries, just with half the
number of time slots. That should cut each run's execution time to
roughly a quarter, without using multiple CPUs for each morning/afternoon.
Then run twice, once for morning, once for afternoon.
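
As a rough check, assuming cost grows with the square of the matrix
size: halving the 3,280 rows to ~1,640 per half-day should cut each run
to about a quarter of the 9-10 seconds, i.e. ~2.5 seconds, so both
halves run back to back come in at around 5 seconds.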

You could parallelise this and run both at same time on different CPUs,
if the extra work is worthwhile, but it seems not, judging from your
requirements.

Another way would be to arrange all appointments that need an odd number
of timeslots into pairs, so that you have at most one appointment that
needs an odd number of timeslots. Then schedule appointments on 10-minute
boundaries, rounding up their timeslot requirement. (The single
odd-timeslot appointment will always waste one timeslot.)

Hope that helps.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pointers needed on optimizing slow SQL statements

2009-06-06 Thread Simon Riggs

On Wed, 2009-06-03 at 21:21 -0400, Robert Haas wrote:

> But, we're not always real clever about selectivity.  Sometimes you
> have to fake the planner out, as discussed here.
> 
> http://archives.postgresql.org/pgsql-performance/2009-06/msg00023.php
> 
> Actually, I had to do this today on a production application.  In my
> case, the planner thought that a big OR clause was not very selective,
> so it figured it wouldn't have to scan very far through the outer side
> before it found enough rows to satisfy the LIMIT clause.  Therefore it
> materialized the inner side instead of hashing it, and when the
> selectivity estimate turned out to be wrong, it took 220 seconds to
> execute.  I added a fake join condition of the form a || b = a || b,
> where a and b were on different sides of the join, and now it hashes
> the inner side and takes < 100 ms.
> 
> Fortunately, these kinds of problems are fairly rare, but they can be
> extremely frustrating to debug.  With any kind of query debugging, the
> first question to ask yourself is "Are any of my selectivity estimates
> way off?".  If the answer to that question is no, you should then ask
> "Where is all the time going in this plan?".  If the answer to the
> first question is yes, though, your time is usually better spent
> fixing that problem, because once you do, the plan will most likely
> change to something a lot better.

The Function Index solution works, but it would be much better if we
could get the planner to remember certain selectivities.

I'm thinking a command like

ANALYZE foo [WHERE .... ]

which would specifically analyze the selectivity of the given WHERE
clause for use in queries.
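
For reference, a minimal sketch of the "Function Index solution"
mentioned above (table and columns hypothetical); the ANALYZE ... WHERE
form is only a proposal and does not exist:

  -- an expression index makes ANALYZE collect statistics on the
  -- expression itself, giving the planner a usable selectivity estimate:
  CREATE INDEX foo_ab_expr ON foo ((a || b));
  ANALYZE foo;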

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-20 Thread Simon Riggs

On Wed, 2009-05-20 at 07:17 -0400, Robert Haas wrote:
> On Wed, May 20, 2009 at 4:11 AM, Simon Riggs  wrote:
> > The Hash node is fully executed before we start pulling rows through the
> > Hash Join node. So the Hash Join node will know at execution time
> > whether or not it will continue to maintain sorted order. So we put the
> > Sort node into the plan, then the Sort node can just ask the Hash Join
> > at execution time whether it should perform a sort or just pass rows
> > through (act as a no-op).
> 
> It's not actually a full sort.  For example if the join has two
> batches, you don't need to dump all of the tuples from both batches
> into a sort.  Each of the two tapes produced by the hash join is
> sorted, but if you read tape one and then tape two, of course then it
> won't be.  What you want to do is read the first tuple from each tape
> and return whichever one is smaller, and put the other one back; then
> lather, rinse, and repeat.  Because it's such a special-case
> computation, I think you're going to want to implement it within the
> HashJoin node rather than inserting a Sort node (or any other kind).

That has wider applicability and seems sound. It will also be easier to
assess a cost for that aspect in the optimizer. I like that approach.

Code-wise, you'll need to refactor things quite a lot to make the
tuplesort code accessible to the HJ node. The sorting code is going to
get pretty hectic if we add in all the ideas for this: partial sort,
improved sorting (at least 3 other ideas). Perhaps it will be easier to
write a specific final merge routine just for HJs.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-20 Thread Simon Riggs

On Tue, 2009-05-19 at 23:54 -0400, Robert Haas wrote:

> I don't think it's a good idea to write off the idea of implementing
> this optimization at some point.  I see a lot of queries that join one
> fairly large table against a whole bunch of little tables, and then
> sorting the results by a column that is indexed in the big table.  

Agreed it's a common use case.

> The
> optimizer handles this by sequentially scanning the big table, hash
> joining against all of the little tables, and then sorting the output,
> which is pretty silly (given that all of the tables fit in RAM and are
> in fact actually cached there).  If there is a LIMIT clause, then it
> might instead index-scan the big table, do the hash joins, and then
> sort the already-ordered results.  This is better because at least
> we're not sorting the entire table unnecessarily but it's still poor.

The Hash node is fully executed before we start pulling rows through the
Hash Join node. So the Hash Join node will know at execution time
whether or not it will continue to maintain sorted order. So we put the
Sort node into the plan, then the Sort node can just ask the Hash Join
at execution time whether it should perform a sort or just pass rows
through (act as a no-op).

The cost of the Sort node can either be zero, or pro-rated down from the
normal cost based upon what we think the probability is of going
multi-batch, which would vary by work_mem available.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 13:01 +0100, Matthew Wakeling wrote:

> That leads me on to another topic. Consider the query:
> 
> SELECT * FROM table ORDER BY a, b
> 
> where the column "a" is declared UNIQUE and has an index. Does Postgres 
> eliminate "b" from the ORDER BY, and therefore allow fetching without 
> sorting from the index?

No, because we don't use unique constraints much at all to infer things.

> Or how about this query:
> 
> SELECT * FROM table1, table2 WHERE table1.fk = table2.id ORDER BY
>  table1.id, table2.id
> 
> where both "id" columns are UNIQUE with an index. Do we eliminate 
> "table2.id" from the ORDER BY in this case?

Yes, that is eliminated via equivalence classes.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 08:58 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > Both plans for this query show an IndexScan on a two-column index, with
> > an Index Condition of equality on the leading column. The ORDER BY
> > specifies a sort by the second index column, so the top-level Sort is
> > superfluous in this case.
> 
> > My understanding is that we don't currently eliminate superfluous
> > additional sorts of this kind.
> 
> Nonsense.  The planner might think some other plan is cheaper, but
> it definitely knows how to do this, and has since at least 8.1.

Please look at Dimitri's plan. If it can remove the pointless sort, why
does it not do so?

I agree that it will remove a Sort when the data already has the
exact same interesting sort order. In this case the sort order is not
exactly the same, but looks fully removable to me.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 14:00 +0200, Dimitri wrote:

> I may confirm the issue with hash join - it's repeating both with
> prepared and not prepared statements - it's curious because initially
> the response time is lowering near ~1ms (the lowest seen until now)
> and then once workload growing to 16 sessions it's jumping to 2.5ms,
> then with 32 sessions it's 18ms, etc..

Is it just bad all the time, or does it get worse over time?

Do you get the same behaviour as 32 sessions if you run 16 sessions for
twice as long?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 12:36 +0100, Simon Riggs wrote:

> Partially sorted data takes much less effort to sort (OK, not zero, I
> grant) so this seems like a high complexity, lower value feature. I
> agree it should be on the TODO, just IMHO at a lower priority than some
> other features.

Perhaps its worth looking at a hybrid merge-join/hash-join that can cope
with data only mostly-sorted rather than fully sorted. That way we can
probably skip the partial sort altogether.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 12:17 +0100, Matthew Wakeling wrote:
> Yes, Postgres has been missing the boat on this one for a while. +1 on
> requesting this feature.

That's an optimizer feature.

> Speaking of avoiding large sorts, I'd like to push again for partial 
> sorts. This is the situation where an index provides data sorted by
> column "a", and the query requests data sorted by "a, b". Currently,
> Postgres sorts the entire data set, whereas it need only group each
> set of identical "a" and sort each by "b".

This is an executor feature.

Partially sorted data takes much less effort to sort (OK, not zero, I
grant) so this seems like a high complexity, lower value feature. I
agree it should be on the TODO, just IMHO at a lower priority than some
other features.
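
A sketch of the case being described (hypothetical table, with an index
on column a only):

  -- the index returns rows already grouped by a; a partial sort would
  -- only need to sort each run of equal a values by b, instead of
  -- sorting the whole result set:
  SELECT * FROM t ORDER BY a, b;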

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Mon, 2009-05-18 at 19:00 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > In particular, running the tests repeatedly using 
> > H.REF_OBJECT = '01'
> > rather than varying the value seems likely to benefit MySQL.

One thing to note in terms of optimisation of this query is that we
perform a top-level sort at the end of the query.

Both plans for this query show an IndexScan on a two-column index, with
an Index Condition of equality on the leading column. The ORDER BY
specifies a sort by the second index column, so the top-level Sort is
superfluous in this case.

My understanding is that we don't currently eliminate superfluous
additional sorts of this kind. Now I know that is a hard subject, but it
seems straightforward to consider interesting sort order equivalence
when we have constant equality constraints.
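
A sketch of the shape in question (names hypothetical, modelled on the
query in this thread):

  -- with an index on (ref_object, horder) and ref_object fixed by
  -- equality, an index scan already returns rows ordered by horder,
  -- so the top-level Sort is redundant:
  CREATE INDEX history_ref_idx ON history (ref_object, horder);
  SELECT h.* FROM history h
   WHERE h.ref_object = '01'
   ORDER BY h.horder;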

My guess would be that MySQL does do the sort removal in the latest
version.

Dimitri's EXPLAIN ANALYZEs show differing costs for that additional
step, but at around 10% of query time it looks shaveable.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 00:33 +0200, Dimitri wrote:
> >
> > In particular, running the tests repeatedly using
> > H.REF_OBJECT = '01'
> > rather than varying the value seems likely to benefit MySQL. The
> 
> let me repeat again - the reference is *random*,
> the '01' value I've used just to show a query execution
> plan.
> 
> also, what is important - the random ID is chosen in way that no one
> user use the same to avoid deadlocks previously seen with PostgreSQL
> (see the "Deadlock mystery" note 2 years ago
> http://dimitrik.free.fr/db_STRESS_BMK_Part1.html#note_4355 )

OK, didn't pick up on that.

(Like Tom, I was thinking query cache)

Can you comment on the distribution of values for that column? If you
are picking randomly, this implies distribution is uniform and so I am
surprised we are mis-estimating the selectivity.

> I think yes (but of course I did not try to replay it several times)

If you could that would be appreciated. We don't want to go chasing
after something that is not repeatable.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-18 Thread Simon Riggs

On Mon, 2009-05-18 at 20:00 +0200, Dimitri wrote:

> From my point of view it needs first to understand where the time is
> wasted on a single query (even when the statement is prepared it runs
> still slower comparing to MySQL).

There are still a significant number of things to say about these numbers
and much tuning still to do, so I'm still confident of improving those
numbers if we needed to.

In particular, running the tests repeatedly using 
H.REF_OBJECT = '01'
rather than varying the value seems likely to benefit MySQL. The
distribution of values is clearly non-linear; while Postgres picks a
strange plan for that particular value, I would guess there are also
values for which the MySQL plan is sub-optimal. Depending upon the
distribution of selected data we might see the results go either way.

What I find worrying is your result of a scalability wall for hash
joins. Is that a repeatable issue?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-18 Thread Simon Riggs

On Wed, 2009-05-13 at 23:23 +0200, Dimitri Fontaine wrote:

> As I think I need this solution too, I've coded a PG module to
> scratch  
> that itch this morning, and just published it (BSD licenced) on  
> pgfoundry:
>http://preprepare.projects.postgresql.org/README.html
>http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/preprepare/preprepare/

Looks very cool Dimitri

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-18 Thread Simon Riggs

On Thu, 2009-05-14 at 20:25 +0200, Dimitri wrote:

> # lwlock_wait_8.4.d `pgrep -n postgres`

> Lock Id                Mode        Combined Time (ns)
>    FirstLockMgrLock     Exclusive              803700
>     BufFreelistLock     Exclusive             3001600
>    FirstLockMgrLock     Shared                4586600
> FirstBufMappingLock     Exclusive             6283900
> FirstBufMappingLock     Shared               21792900

I've published two patches to -Hackers to see if we can improve the read
only numbers on 32+ cores.

Try shared_buffer_partitions = 256

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-14 Thread Simon Riggs

On Tue, 2009-05-12 at 14:28 +0200, Dimitri wrote:

> As problem I'm considering a scalability issue on Read-Only workload -
> only selects, no disk access, and if on move from 8 to 16 cores we
> gain near 100%, on move from 16 to 32 cores it's only 10%...

Dimitri,

Will you be re-running the Read-Only tests?

Can you run the Dtrace script to assess LWlock contention during the
run?

Would you re-run the tests with a patch?

Thanks,

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 15:52 -0400, Robert Haas wrote:
> On Tue, May 12, 2009 at 12:49 PM, Tom Lane  wrote:
> > 1. There is no (portable) way to pass the connection from the postmaster
> > to another pre-existing process.
> 
> [Googles.]  It's not obvious to me that SCM_RIGHTS is non-portable,
> and Windows has an API call WSADuplicateSocket() specifically for this
> purpose.

Robert, Greg,

Tom's main point is it isn't worth doing. We have connection pooling
software that works well, very well. Why do we want to bring it into
core? (Think of the bugs we'd hit...) If we did, who would care?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 11:30 -0700, Scott Carey wrote:
> the fact is there is no evidence that a connection pooler will fix the
> scalability from 16 > 32 cores.

There has been much analysis over a number of years of the effects of
the ProcArrayLock, specifically the O(N^2) effect of increasing numbers
of connections on GetSnapshotData(). Most discussion has been on
-hackers, not -perform.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 16:00 +0100, Matthew Wakeling wrote:
> won't connect operations be all handled by a 
> single thread - the parent postmaster?

No, we spawn first, then authenticate.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote:

> What I'm trying to do now is to understand what exactly is the
> problem.

You're running with 1600 users, which is above the scalability limit
uncovered (by Sun...) during earlier benchmarking. The scalability
issues are understood but currently considered above the
reasonable-setting limit and so nobody has been inclined to improve
matters.

You should use a connection concentrator to reduce the number of
sessions down to say 400.

Your wal_buffers setting is also too low and you will be experiencing
contention on the WALWriteLock. Increase wal_buffers to about 8x its
current value.

You can move pg_xlog to its own set of drives.

Set checkpoint_completion_target to 0.95.
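
A sketch of the corresponding postgresql.conf changes (the wal_buffers
figure assumes your current setting is 1MB; scale from whatever you
actually have):

  wal_buffers = 8MB                     # ~8x the previous value
  checkpoint_completion_target = 0.95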

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 13:16 +0200, Dimitri wrote:

> Wow, Simon! :-))
> 
> yes, I'm working in Sun Benchmark Center :-))
> (I'm not using my Sun email on public lists only to avid a spam)
> 
> and as came here and asking questions it's probably proving my
> intentions to show PostgreSQL in its best light, no?.. - I never liked
> "not honest" comparisons :-))
> 
> Regarding your bet: from a very young age I learned a one thing - you
> take any 2 person who betting for any reason - you'll find in them one
> idiot and one bastard :-))   idiot - because betting while missing
> knowledge, and bastard - because knowing the truth is not honset to
> get a profit from idiots :-))  That's why I never betting in my life,
> but every time telling the same story in such situation... Did you
> like it? ;-))

No, but I asked for it, so we're even. ;-)

Let's work on the benchmark.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote:

> For my big surprise, MySQL was faster!

Ours too.

** I bet you $1000 that I can improve the performance of your benchmark
results with PostgreSQL. You give me $1000 up-front and if I can't
improve your high end numbers I'll give you $2000 back. Either way, you
name me and link to me from your blog. Assuming you re-run the tests as
requested and give me reasonable access to info and measurements. **

I note your blog identifies you as a Sun employee. Is that correct? If
you do not give us the opportunity to improve upon the results then
reasonable observers might be persuaded you did not wish to show
PostgreSQL in its best light. You up for it?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-11 Thread Simon Riggs

On Mon, 2009-05-11 at 11:23 -0400, Tom Lane wrote:
> Dimitri  writes:
> > Anyone may explain me why analyze target may have so huge negative
> > secondary effect?..
> 
> If these are simple queries, maybe what you're looking at is the
> increase in planning time caused by having to process 10x as much
> statistical data.  Cranking statistics_target to the max just because
> you can is not necessarily a good strategy.

statistics_target affects tables, so we have problems if you have a mix
of simple and complex queries. IMHO we need an explicit planner_effort
control, rather than the more arcane *_limit knobs which are effectively
the same thing, just harder to use in practice.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-11 Thread Simon Riggs

On Mon, 2009-05-11 at 17:18 +0200, Dimitri wrote:

> Yes, forget, MySQL is reaching 17.500 TPS here.

Please share your measurements of MySQL scalability also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 20:36 +0200, Dimitri wrote:

> I've simply restarted a full test with hashjoin OFF. Until 32
> concurrent users things are going well. Then since 32 users response
> time is jumping to 20ms, with 64 users it's higher again, and with 256
> users reaching 700ms, so TPS is dropping from 5.000 to ~200..
> 
> With hashjoin ON it's not happening, and I'm reaching at least 11.000
> TPS on fully busy 32 cores.

Much better to stick to the defaults. 

Sounds like a problem worth investigating further, but not pro bono.

> About scalability issue - there is one on 8.3.7, because on 32 cores
> with such kind of load it's using only 50% CPU and not outpassing
> 6.000 TPS, while 8.4 uses 90% CPU and reaching 11.000 TPS..

Yeh, small changes make a big difference. Thanks for the info.

How does MySQL perform?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 12:58 +0100, Gregory Stark wrote:

> It would be nice if we were in the same ballpark as MySQL but we would only be
> interesting in such optimizations if they don't come at the expense of
> scalability under more complex workloads.

It doesn't appear there is a scalability issue here at all.

Postgres can clearly do the same query in about the same time.

We just have a case where MySQL happens to optimise it well and Postgres
doesn't. Since we can trivially design cases that show the opposite I'm
not worried too much. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote:
> Tom Lane wrote:
> > Alvaro Herrera  writes:
> >> I think there should be a way to refer to individual partitions as
> >> objects.
> > 
> > Yeah, the individual partitions should be nameable tables, otherwise we
> > will be reinventing a *whole* lot of management stuff to little gain.
> > I don't actually think there is anything wrong with using table
> > inheritance as the basic infrastructure --- I just want more smarts
> > about one particular use pattern of inheritance.
> 
> Maybe it's worth examining and documenting existing partition setups,
> the reasoning behind them, and how they're implemented, in order to
> guide any future plans for native partitioning support?
> 
> Maybe that's already been/being done. On the off chance that it's not:
> 
> Ones I can think of:
> 
> - Partitioning an equally active dataset by ranges over a key to improve
>  scan performance, INSERT/UPDATE costs on indexes, locking issues, etc.
> 
> - The "classic" active/archive partition scheme where there's only one
> partition growing at any one time, and the others are historical data
> that's nowhere near as "hot".
> 
> - A variant on the basic active/archive structure, where query activity
> decreases slowly over time and there are many partitions of recent data.
> Partitions are merged into larger ones as they age, somewhat like a RRD
> database.
> 
> I also expect that in the future there will be demand for striping data
> across multiple partitions in different tablespaces to exploit
> in-parallel scanning (when/if supported) for better I/O utilization in
> multiple-disk-array situations. For example, partitioning on
> "MOD(id,10)" across 10 separate volumes, and firing off 10 concurrent
> scans, one per partition, to satisfy a query.

That's a good summary. It has already been documented and discussed, but
saying it again and again is the best way to get this across.

You've highlighted that partitioning is a feature with many underlying
requirements: infrequent access to data (frequently historical),
striping for parallelism and getting around RDBMS flaws (if any). We
must be careful to implement each requirement in full, yet separately,
so we don't end up with 60% functionality in each case by delivering an
average or least common denominator solution.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Simon Riggs

On Wed, 2009-05-06 at 17:55 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:
> >> The problem has been finding someone who has both the time and the
> >> ability to do the work.
> 
> > Unfortunately there has been significant debate over which parts of
> > partitioning need to be improved. My own view is that considerable
> > attention needs to be applied to both the executor and planner to
> > improve matters and that syntax improvements are largely irrelevant,
> > though seductive.
> 
> My thought about it is that what we really need is an explicit notion
> of partitioned tables built into the system, instead of trying to make
> the planner re-deduce the partitioning behavior from first principles
> every time it builds a plan for such a table.  Such a notion would
> presumably involve some new syntax to allow the partitioning rule to be
> specified at table creation time.  I agree that the syntax details are a
> minor issue, but the set of possible partitioning rules is certainly a
> topic of great interest.

Agreed. Perhaps I should say then that the syntax needs to express the
requirements of the planner/executor behaviour, rather than being the
main aspect of the feature, as some have suggested.

Hopefully, notions of partitioning won't be directly tied to chunking of
data for parallel query access. Most queries access recent data and
hence only a single partition (or stripe), so partitioning and
parallelism are frequently orthogonal.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Simon Riggs

On Wed, 2009-05-06 at 10:31 +0200, Dimitri wrote:

> I've already tried a target 1000 and the only thing it changes
> comparing to the current 100 (default) is instead of 2404 rows it says
> 240 rows, but the plan remaining the same..

Try both of these things
* REINDEX on the index being used in the query, then re-EXPLAIN
* enable_hashjoin = off, then re-EXPLAIN

You should first attempt to get the same plan, then confirm it really is
faster, before we worry about why the optimizer didn't pick that plan.
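
A sketch of that sequence (index name hypothetical, query modelled on
this thread):

  REINDEX INDEX history_ref_idx;
  EXPLAIN ANALYZE SELECT * FROM history WHERE ref_object = '01';

  SET enable_hashjoin = off;  -- session-local planner switch
  EXPLAIN ANALYZE SELECT * FROM history WHERE ref_object = '01';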

We already know that MySQL favors nested loop joins, so turning up a
plan that on this occasion is actually better that way is in no way
representative of general performance. Does MySQL support hash joins?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Simon Riggs

On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:

> The problem has been finding someone who has both the time and the
> ability to do the work.

Unfortunately there has been significant debate over which parts of
partitioning need to be improved. My own view is that considerable
attention needs to be applied to both the executor and planner to
improve matters and that syntax improvements are largely irrelevant,
though seductive.

Deep improvements will require significant analysis, agreement, effort
and skill. What we have now took approximately 20 days to implement,
with later patches adding about another 10-20 days work. I'd estimate
the required work as 60-100 days work from primary author, plus planning
and discussion time. YMMV.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Simon Riggs

On Mon, 2009-04-20 at 14:53 -0700, da...@lang.hm wrote:

> the big win is going to be in changing the core of rsyslog so that it can 
> process multiple messages at a time (bundling them into a single 
> transaction)

That isn't necessarily true as a single "big win".

The reason there is an overhead per transaction is because of commit
delays, which can be removed by executing

  SET synchronous_commit = off; 

after connecting to PostgreSQL 8.3+

You won't need to do much else. This can also be enabled for a
PostgreSQL user without even changing the rsyslog source code, so it
should be easy enough to test.
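
For example, assuming rsyslog connects as a role named rsyslog
(hypothetical), the setting can be attached to the role so every new
session picks it up automatically:

  ALTER ROLE rsyslog SET synchronous_commit = off;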

And this type of application is *exactly* what it was designed for.

Some other speedups should also be possible, but this is easiest. 

I would guess that batching inserts will be a bigger win than simply
using prepared statements because it will reduce network roundtrips to a
centralised log server. Preparing statements might show up well on tests
because people will do tests against a local database, most likely.
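
A sketch of the batching idea (table and column hypothetical): a
multi-row INSERT inside one transaction turns N round trips into one:

  BEGIN;
  INSERT INTO log_lines (msg)
  VALUES ('line 1'), ('line 2'), ('line 3');
  COMMIT;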

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] error updating a very large table

2009-04-15 Thread Simon Riggs

On Wed, 2009-04-15 at 09:51 -0400, Tom Lane wrote:
> Brian Cox  writes:
> > I changed the logic to update the table in 1M row batches. However, 
> > after 159M rows, I get:
> 
> > ERROR:  could not extend relation 1663/16385/19505: wrote only 4096 of 
> > 8192 bytes at block 7621407
> 
> You're out of disk space.
> 
> > A df run on this machine shows plenty of space:
> 
> Per-user quota restriction, perhaps?
> 
> I'm also wondering about temporary files, although I suppose 100G worth
> of temp files is a bit much for this query.  But you need to watch df
> while the query is happening, rather than suppose that an after-the-fact
> reading means anything.

Anytime we get an out of space error we will be in the same situation.

When we get this error, we should report:
* a summary of current temp file usage
* df output (if possible on the OS)

Otherwise we'll always be wondering what caused the error.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Question on pgbench output

2009-04-05 Thread Simon Riggs

On Fri, 2009-04-03 at 16:34 -0700, David Kerr wrote:
> 400 concurrent users doesn't mean that they're pulling 1.5 megs /
> second every second. Just that they could potentially pull 1.5 megs at
> any one second. most likely there is a 6 (minimum) to 45 second
> (average) gap  between each individual user's pull.

There's a world of difference between 400 connected and 400 concurrent
users. You've been testing 400 concurrent users, yet without measuring
data transfer. The think time will bring the number of users right down
again, but you really need to include the much higher than normal data
transfer into your measurements and pgbench won't help there.
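
As a rough illustration, taking the quoted figures at face value: 400
connected sessions with an average 45-second think time generate only
about 400/45 = ~9 requests per second between them, so the effective
concurrency is far below 400.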

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] plpgsql arrays

2009-04-03 Thread Simon Riggs

On Fri, 2009-04-03 at 10:04 -0400, Tom Lane wrote:
> Matthew Wakeling  writes:
> > On Fri, 3 Apr 2009, Robert Haas wrote:
> >> Why not just use SQL to do the join?
> 
> > Because the merge condition is:
> 
> > WHERE l1.start <= l2.end AND l2.start <= l1.end
> 
> > and merge joins in postgres only currently cope with the case where the 
> > merge condition is an equals relationship.

(snip)

> I don't actually believe that a standard merge join algorithm will work
> with an intransitive join condition ...

I think it's a common enough problem that having a non-standard join
algorithm written for that case would be interesting indeed.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-21 Thread Simon Riggs

On Fri, 2009-03-20 at 15:28 +, Matthew Wakeling wrote:
> On Thu, 19 Mar 2009, Scott Carey wrote:
> > In type B, the ratio of requests that must context switch is always == 
> > 1.  Every request must queue and wait!
> 
> A remarkably good point, although not completely correct. Every request 
> that arrives when the lock is held in any way already will queue and wait. 
> Requests that arrive when the lock is free will run immediately. I admit 
> it, this is a killer for this particular locking strategy.

I think the right mix of theory and test here is for people to come up
with new strategies that seem to make sense and then we'll test them
all. Trying too hard to arrive at the best strategy purely through
discussion will mean we miss a few tricks. Feels like we're on the right
track here.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 13:49 +, Matthew Wakeling wrote:
> On Wed, 18 Mar 2009, Jignesh K. Shah wrote:
> > I thought about that.. except that without putting a restriction on it, a
> > huge queue will cause a lot of time to be spent manipulating the lock
> > list every time. One more option would be to maintain two lists, shared
> > and exclusive, and round-robin through them every time you access the
> > list, so manipulation is low.. But the best thing is to allow flexibility
> > to change the algorithm, since some workloads may work fine with one and
> > others will NOT. The flexibility then allows tinkering for those already
> > reaching the limits.
> 
> Yeah, having two separate queues is the obvious way of doing this. It 
> would make most operations really trivial. Just wake everything in the 
> shared queue at once, and you can throw it away wholesale and allocate a 
> new queue. It avoids a whole lot of queue manipulation.

Yes, that sounds good.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 16:26 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
> >> One possibility would be for the locks to alternate between exclusive
> >> and 
> >> shared - that is:
> >> 
> >> 1. Take a snapshot of all shared waits, and grant them all -
> >> thundering
> >> herd style.
> >> 2. Wait until ALL of them have finished, granting no more.
> >> 3. Take a snapshot of all exclusive waits, and grant them all, one by
> >> one.
> >> 4. Wait until all of them have been finished, granting no more.
> >> 5. Back to (1)
> 
> > I agree with that, apart from the "granting no more" bit.
> 
> > Currently we queue up exclusive locks, but there is no need to since for
> > ProcArrayLock commits are all changing different data.
> 
> > The most useful behaviour is just to have two modes:
> > * exclusive-lock held - all other x locks welcome, s locks queue
> > * shared-lock held - all other s locks welcome, x locks queue
> 
> My goodness, it seems people have forgotten about the "lightweight"
> part of the LWLock design.

"Lightweight" is only useful if it fits purpose. If the LWlock design
doesn't fit all cases, especially with critical lock types, then we can
have special cases. We have both spinlocks and LWlocks, plus we split
hash tables into multiple lock partitions. If we have 3 types of
lightweight locking, why not consider having 4?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 11:45 +, Matthew Wakeling wrote:
> On Wed, 18 Mar 2009, Simon Riggs wrote:
> > I agree with that, apart from the "granting no more" bit.
> >
> > The most useful behaviour is just to have two modes:
> > * exclusive-lock held - all other x locks welcome, s locks queue
> > * shared-lock held - all other s locks welcome, x locks queue
> 
> The problem with making all other locks welcome is that there is a 
> possibility of starvation. Imagine a case where there is a constant stream 
> of shared locks - the exclusive locks may never actually get hold of the 
> lock under the "all other shared locks welcome" strategy. 

That's exactly what happens now. 

> Likewise with the reverse.

I think it depends upon how frequently requests arrive. Commits cause X
locks and we don't commit that often, so it's very unlikely that we'd see
a constant stream of X locks and prevent shared lockers.


Some comments from an earlier post on this topic (about 20 months ago):

Since shared locks are currently queued behind exclusive requests
when they cannot be immediately satisfied, it might be worth
reconsidering the way LWLockRelease works also. When we wake up the
queue we only wake the Shared requests that are adjacent to the head of
the queue. Instead we could wake *all* waiting Shared requestors.

e.g. with a lock queue like this:
(HEAD)  S<-S<-X<-S<-X<-S<-X<-S
Currently we would wake the 1st and 2nd waiters only. 

If we were to wake the 3rd, 5th and 7th waiters also, then the queue
would reduce in length very quickly, if we assume generally uniform
service times. (If the head of the queue is X, then we wake only that
one process and I'm not proposing we change that). That would mean queue
jumping, right? Well, that's what already happens in other circumstances,
so there cannot be anything intrinsically wrong with allowing it; the
only question is: would it help?
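
As a worked illustration, assume each waiter holds the lock for one uniform
service period t and ignore new arrivals: the queue above drains in roughly
7t today ({S,S} together, then X, S, X, S, X, S one group at a time), but in
about 4t if the 3rd, 5th and 7th waiters are woken too (all five S in
parallel, followed by the three X one by one).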

We need not wake the whole queue, there may be some generally more
beneficial heuristic. The reason for considering this is not to speed up
Shared requests but to reduce the queue length and thus the waiting time
for the exclusive requestors. Each time a Shared request is dequeued, we
effectively re-enable queue jumping, so a Shared request arriving during
that point will actually jump ahead of Shared requests that were unlucky
enough to arrive while an Exclusive lock was held. Worse than that, the
new incoming Shared requests exacerbate the starvation, so the more
non-adjacent groups of Shared lock requests there are in the queue, the
worse the starvation of the exclusive requestors becomes. We are
effectively randomly starving some shared locks as well as exclusive
locks in the current scheme, based upon the state of the lock when they
make their request. The situation is worst when the lock is heavily
contended and the workload has a 50/50 mix of shared/exclusive requests,
e.g. serializable transactions or transactions with lots of
subtransactions.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
> One possibility would be for the locks to alternate between exclusive
> and 
> shared - that is:
> 
> 1. Take a snapshot of all shared waits, and grant them all -
> thundering
>  herd style.
> 2. Wait until ALL of them have finished, granting no more.
> 3. Take a snapshot of all exclusive waits, and grant them all, one by
> one.
> 4. Wait until all of them have been finished, granting no more.
> 5. Back to (1)

I agree with that, apart from the "granting no more" bit.

Currently we queue up exclusive locks, but there is no need to since for
ProcArrayLock commits are all changing different data.

The most useful behaviour is just to have two modes:
* exclusive-lock held - all other x locks welcome, s locks queue
* shared-lock held - all other s locks welcome, x locks queue

This *only* works for ProcArrayLock.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Sat, 2009-03-14 at 12:09 -0400, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > WALInsertLock is also quite high on Jignesh's list. That I've seen 
> > become the bottleneck on other tests too.
> 
> Yeah, that's been seen to be an issue before.  I had the germ of an idea
> about how to fix that:
> 
>   ... with no lock, determine size of WAL record ...
>   obtain WALInsertLock
>   identify WAL start address of my record, advance insert pointer
>   past record end
>   *release* WALInsertLock
>   without lock, copy record into the space just reserved
> 
> The idea here is to allow parallelization of the copying of data into
> the buffers.  The hold time on WALInsertLock would be very short.  Maybe
> it could even become a spinlock, though I'm not sure, because the
> "advance insert pointer" bit is more complicated than it looks (you have
> to allow for the extra overhead when crossing a WAL page boundary).
> 
> Now the fly in the ointment is that there would need to be some way to
> ensure that we didn't write data out to disk until it was valid; in
> particular how do we implement a request to flush WAL up to a particular
> LSN value, when maybe some of the records before that haven't been fully
> transferred into the buffers yet?  The best idea I've thought of so far
> is shared/exclusive locks on the individual WAL buffer pages, with the
> rather unusual behavior that writers of the page would take shared lock
> and only the reader (he who has to dump to disk) would take exclusive
> lock.  But maybe there's a better way.  Currently I don't believe that
> dumping a WAL buffer (WALWriteLock) blocks insertion of new WAL data,
> and it would be nice to preserve that property.

Yeh, that's just what we'd discussed previously:
http://markmail.org/message/gectqy3yzvjs2hru#query:Reworking%20WAL%20locking+page:1+mid:gectqy3yzvjs2hru+state:results

Are you thinking of doing this for 8.4? :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Simon Riggs

On Tue, 2009-03-17 at 19:54 -0400, Jignesh K. Shah wrote:
> 
> Simon Riggs wrote:
> > On Tue, 2009-03-17 at 17:41 -0400, Jignesh K. Shah wrote:
> >
> >   
> >> I did a quick test with patch. Unfortunately it improves my number
> >> even with default setting 0 (not sure whether I should be pleased or
> >> sad - Definitely no overhead infact seems to help performance a bit.
> >> NOTE: Logic is same, implementation is slightly different for default
> >> set)
> >> 
> >
> > OK, I bite. 25% gain from doing nothing??? You're stretching my... err,
> > credulity.
> >
> > I like the train of thought for setting 1 and it is worth investigating,
> > but something feels wrong somewhere.
> >
> >   
> Actually I think I am hurting my credibility here, since I cannot
> explain the improvement with the patch while still using the default
> logic (though it works a different way: I compare sequentially using
> fields from the previous proc structure instead of comparing with a
> constant boolean). But the change was necessary to allow it to handle
> multiple algorithms and yet be sleek and not bloated.
> 
> In the next couple of weeks I plan to test the patch on a different
> x64-based system to do sanity testing on a lower number of cores and
> also try out other workloads ...

Good plan. I'm behind your ideas and will be happy to wait.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Simon Riggs

On Tue, 2009-03-17 at 17:41 -0400, Jignesh K. Shah wrote:

> I did a quick test with patch. Unfortunately it improves my number
> even with default setting 0 (not sure whether I should be pleased or
> sad - Definitely no overhead; in fact it seems to help performance a bit.
> NOTE: Logic is same, implementation is slightly different for default
> set)

OK, I bite. 25% gain from doing nothing??? You're stretching my... err,
credulity.

I like the train of thought for setting 1 and it is worth investigating,
but something feels wrong somewhere.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Simon Riggs

On Wed, 2009-03-11 at 22:20 -0400, Jignesh K. Shah wrote:

> A tunable does not impact existing behavior

Why not put the tunable parameter into the patch and then show the test
results with it in? If there is no overhead, we should then be able to
see that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-14 Thread Simon Riggs

On Wed, 2009-03-11 at 16:53 -0400, Jignesh K. Shah wrote:

> 1200: 2000: Medium Throughput: -1781969.000 Avg Medium Resp: 0.019

I think you need to iron out bugs in your test script before we put too
much stock into the results generated. Your throughput should not be
negative.

I'd be interested in knowing the number of S and X locks requested, so
we can think about this from first principles. My understanding is that
ratio of S:X is about 10:1. Do you have more exact numbers?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] caching written values?

2009-01-22 Thread Simon Riggs

On Thu, 2009-01-22 at 13:11 +0100, Thomas Finneid wrote:

> Is there any possibility of telling pg to save to disk that
> memory-cached data and state when the server is shut down, so that when
> the server starts up again, it reads it back into memory?

It's possible, but not by any directly supported mechanism.

You have to consider whether the data you saved would still be required
when the server restarts.
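
A crude substitute is to warm the caches yourself at startup by scanning
whatever is known to be hot, e.g. (a sketch; the table names are
hypothetical):

  SELECT count(*) FROM sessions;  -- pulls the table through the OS cache
  SELECT count(*) FROM prices;    -- (and, for small tables, shared_buffers)

That rebuilds the cached data, though of course not any other state.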

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] understanding postgres issues/bottlenecks

2009-01-07 Thread Simon Waters
On Wednesday 07 January 2009 04:17:10 M. Edward (Ed) Borasky wrote:
> 
> 1. The package it lives in is called "sysstat". Most Linux distros do
> *not* install "sysstat" by default. Somebody should beat up on them
> about that. :)

Hehe, although sysstat and friends did have issues on Linux for a long time. 
Nothing worse than misleading stats, so I suspect it lost a lot of friends 
back then. It is a lot better these days when most of the "Unix" software 
targets Linux first, and other kernels second.

Aside from all the advice here about system tuning, as a system admin I'd also 
ask: is the box doing the job you need? And are you looking at the Postgres 
log (with logging of slow queries) to see that queries complete in a sensible 
time? I'd assume from the current performance figures that there is an issue 
somewhere, but I've been to places where it was as simple as adding one 
index, or even modifying an index so it does what the application developer 
intended instead of what they asked for ;)
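
Slow-query logging can be tried per-session before committing it to
postgresql.conf, for instance (a sketch; the 200ms threshold is a per-site
judgement call, and the SET form needs superuser):

  SET log_min_duration_statement = 200;  -- log statements slower than 200ms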

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5

2008-12-24 Thread Simon Riggs
Hi Mark,

Good to see you producing results again.


On Sat, 2008-12-20 at 16:54 -0800, Mark Wong wrote:
> Here are links to how the throughput changes when increasing shared_buffers:
> 
> http://pugs.postgresql.org/node/505

The only strange thing here is the result at 22528MB: it's the only normal
one there. There seems to be a freeze occurring on most tests around the 30
minute mark, which delays many backends and reduces writes.

Reduction in performance as shared_buffers increases looks normal.

Increase wal_buffers, but look for something else as well. Try to get a
backtrace from when the lock-up happens. It may not even be Postgres.

> And another series of tests to show how throughput changes when
> checkpoint_segments are increased:
> 
> http://pugs.postgresql.org/node/503
> 
> The links go to a graphical summary and raw data.  Note that the
> maximum theoretical throughput at this scale factor is approximately
> 12000 notpm.
> 
> My first glance tells me that the system performance is quite
> erratic when increasing the shared_buffers.  I'm also not sure what to
> gather from increasing the checkpoint_segments.  Is it simply that the
> more checkpoint segments you have, the more time the database spends
> fsyncing when at a checkpoint?

I would ignore the checkpoint_segments tests because you aren't using a
realistic value of shared_buffers. I doubt any such effect is noticeable
when you use a realistic value determined from the 505 set of tests.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Simon Waters
On Tuesday 09 December 2008 13:08:14 Jean-David Beyer wrote:
> 
> and even if they can, I do not know if postgres uses that ability. I doubt
> it, since I believe (at least in Linux) a process can do that only if run
> as root, which I imagine few (if any) users do.

Disclaimer: I'm not a system programmer... 

I believe that at Linux kernel 2.6.8 and before, processes need the POSIX 
capability CAP_IPC_LOCK to lock memory at all; at 2.6.9 and after they need 
CAP_IPC_LOCK only to lock more than RLIMIT_MEMLOCK.

It is a capability, so a process can run as any user, assuming it was 
started with, or has gained, the capability.

No idea if Postgres uses any of this; other than protecting certain 
password operations, there is probably not much point. If key parts of 
your database are being paged out, get more RAM; if idle parts of your 
database are paged out, you could probably apply that RAM more usefully to 
something else.

The Varnish cache design is the place to look for enlightenment on relying on 
the kernel paging (using memory-mapped files) rather than trying to do it 
yourself, but then a proxy server is a lot simpler than an RDBMS. That said, 
Varnish is fast at what it does (reverse HTTP proxy)!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Simon Waters
On Monday 24 November 2008 14:49:17 Glyn Astill wrote:
> --- On Mon, 24/11/08, Steve Clark <[EMAIL PROTECTED]> wrote:
> > > Yeah the battery's on it, that and the 128Mb is
> >
> > really the only reason I thought I'd give it a whirl.
> >
> >
> > Is the battery  functioning? We found that the unit had to
> > be on and charged before write back caching
> > would work.
>
> Yeah the battery is on there, and in the BIOS it says it's "PRESENT" and
> the status is "GOOD".

Sorry I deleted the beginning of this on getting back from a week off.

Write-back is configurable. You can enable write-back caching when the unit is 
not charged, if you like. It is offered when you create the array (and can be 
changed later). It is arguably a silly thing to do, but it is an option.

I have some reasonable performance stats for this card, assuming you have a 
suitably recent version of the driver software. DELL used to ship a Linux 
kernel that had a broken driver for this card, resulting in very poor 
performance (i.e. substantially slower than software RAID). I have a note 
never to use it with Linux before 2.6.22, as the bundled LSI driver had 
issues; DELL themselves shipped (if you asked "why is performance so bad") a 
Redhat kernel with a later driver for the card than the official Linux kernel.

That said, a couple of weeks back ours corrupted a volume on replacing a dead 
hard disk, so I'm never touching these cheap and tacky LSI RAID cards ever 
again. It is supposed to just start rebuilding the array when you insert the 
replacement drive; if it doesn't "just work", schedule some downtime and 
figure out exactly why. Don't (for example) blindly follow the instructions 
in the manual on what to do if it doesn't "just work".

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Backup strategies

2008-10-17 Thread Simon Riggs

On Wed, 2008-10-15 at 16:05 +0200, Ivan Voras wrote:

> So, pg_start_backup() freezes the data at the time it's called, but
> data and xlog still change, in a different way that's safe to back up?

No, that's not how it works. pg_start_backup() records the point
that we must roll forward from. There is no freezing.

>  Why
> not run with pg_start_backup() always enabled?

It's not a mode that can be enabled/disabled. It's a starting point.

You should run pg_start_backup() each time you run a backup, just like
the fine manual describes.
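
As a sketch, one backup cycle looks like this (the label and the file-copy
tool are your choice; WAL archiving must be running, since the file copy
alone is not consistent):

  SELECT pg_start_backup('nightly');
  -- copy the data directory with tar/rsync/snapshot while the server stays up
  SELECT pg_stop_backup();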

Check your backups...

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [GENERAL] [PERFORM] Drupal and PostgreSQL - performance issues?

2008-10-13 Thread Simon Waters
On Monday 13 October 2008 15:19:07 Scott Marlowe wrote:
> 
> > shared_buffers = 24MB
> > max_fsm_pages = 153600
>
> Well, 24MB is pretty small.  See if you can increase your system's
> shared memory and postgresql's shared_buffers to somewhere around 256M
> to 512M.  It likely won't make a big difference in this scenario, but
> overall it will definitely help.

I noted, after reading earlier messages in the thread, that my distro 
documents that the value it defaults to for shared_buffers is rather small.
One of our servers is fairly pressed for memory (some of the time). Is there 
any way to measure the amount of churn in the shared_buffers, as a way of 
demonstrating that more is needed (or at this moment more would help)?

A few very small databases on this server, and one which is 768M (still pretty 
small but a lot bigger than the rest, most of which is logging information). 
The only "hot" information is the session table, ~9000 lines, one index on 
the session id. Can I ask Postgres to tell me, or estimate, how much memory 
this table would occupy if fully cached in memory?
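
For what it's worth, both can be probed directly; a sketch (pg_buffercache 
is the contrib module, installed separately):

  SELECT pg_size_pretty(pg_relation_size('sessions'));        -- heap only
  SELECT pg_size_pretty(pg_total_relation_size('sessions'));  -- plus indexes, toast

  SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
   GROUP BY c.relname
   ORDER BY buffers DESC
   LIMIT 10;

Taking the second query periodically and diffing the snapshots gives a rough 
picture of churn in shared_buffers.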

Half the problem in modern computing is knowing what is "slow". In this case, 
counting the rows of the session table takes about 100ms. Deleting expired 
session rows takes about 120ms, more if it hasn't been done for a while, 
which I guess is evidence that the table isn't being cached in memory as 
efficiently as it could be.

In this case the server thinks the system I/O is zero for half the tools in 
use, because of the RAID hardware, so most of the Linux based tools are 
useless in this context.

At the risk of thread hijacking, for the session table I wonder if we are 
handling it in the most efficient way. It is just a regular table, indexed on 
session_id. Each request of note to the server requires retrieval of the 
session record, and often updating the expiry information. Every N requests 
the application also issues a:

DELETE FROM sessions WHERE expires < ...

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] 7.4 - basic tuning question

2008-10-03 Thread Simon Waters
Hi,

we have a log table on one server with 1.9 million records.

One column "event" (type text) in that table is a string that (currently) 
takes a small number of distinct values (~43) (hmm that could have been 
normalised better).

We noted, on querying for events of a specific type, that the queries were 
slower than expected. It simply wasn't using the index (btree, default 
settings) on this column on this server (the test server, with fewer records, 
was fine).

Using "ALTER TABLE SET STATISTICS" to increase the number of buckets to 50 
resolved the issue, we went pretty much straight there on discovering there 
are no "HINTS".

However we aren't quite sure why this case was pathological, and my brain 
doesn't quite grok the documentation.

I assume that the histogram_bounds for strings are in alphabetical order, so 
that "DEMOSTART" falls between "DELETE" and "IDEMAIL". Even in a worst case 
of including both these common values, the planner ought to have assumed that 
less than 10% of records were likely covered by the value selected, so it 
seems unlikely to me that not using the index would be a good idea.

What am I missing? (and yes there is a plan to upgrade!).


=> SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
(...lots of time passes...)
 count
-------
  1432
(1 row)


=> SELECT COUNT(*), event FROM log GROUP BY event ORDER BY count;

  count |   event
--------+------------
  6 | DNRFAIL
 14 | ADMDNR
 14 | UPGRADE
 18 | FOCRENEW
 21 | AUTOCN
 25 | ADMCC
 27 | TEMPIN
 31 | DNRCANCEL
 43 | EXPIRED
128 | DIRECTBUY
130 | CANCEL
130 | CANCELQ
154 | FOCBUY
173 | EXPCCWARN
179 | OFFER
209 | DNROK
214 | TEMPRE
356 | CCWARN
429 | ADMLOGIN
719 | SUBSCRIBE
787 | CCSUCCESS
988 | CCFAILURE
   1217 | TEMPNEW
   1298 | PAYPAL
   1431 | DEMOSTART
   1776 | CCREQUEST
   2474 | ACCTUPD
  15169 | SYSMAINT
  42251 | IDEMAIL
  46964 | DELETE
  50764 | RELOGIN
  57022 | NEWUSR
  64907 | PUBREC0
  65449 | UNPUBLISH
  92843 | LOGOUT
  99018 | KILLSESS
 128900 | UPLOAD
 134994 | LOGIN
 137608 | NEWPAGE
 447556 | PUBREC1
 489572 | PUBLISH


=> EXPLAIN SELECT * FROM log WHERE event='DEMOSTART';
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on log  (cost=0.00..54317.14 rows=20436 width=93)
   Filter: (event = 'DEMOSTART'::text)
(2 rows)


=> ALTER TABLE log ALTER COLUMN event SET STATISTICS 50; ANALYSE log(event);
ALTER TABLE
ANALYZE


=> EXPLAIN SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
                                QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=5101.43..5101.43 rows=1 width=0)
   ->  Index Scan using log_event on log  (cost=0.00..5098.15 rows=1310
width=0)
 Index Cond: (event = 'DEMOSTART'::text)
(3 rows)


=> SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
(...almost no time passes...)
 count
-------
  1432
(1 row)


BEFORE
pajax=> select * from pg_stats where tablename = 'log' and attname='event';
 schemaname        | public
 tablename         | log
 attname           | event
 null_frac         | 0
 avg_width         | 10
 n_distinct        | 25
 most_common_vals  | {PUBLISH,PUBREC1,NEWPAGE,UPLOAD,LOGIN,KILLSESS,LOGOUT}
 most_common_freqs | {0.257333,0.248333,0.072,0.0696667,0.061,0.054,0.0506667}
 histogram_bounds  | {ACCTUPD,DELETE,IDEMAIL,NEWUSR,NEWUSR,PUBREC0,PUBREC0,RELOGIN,SYSMAINT,UNPUBLISH,UNPUBLISH}
 correlation       | 0.120881
(1 row)

AFTER
pajax=> select * from pg_stats where tablename='log' and attname='event';

Re: [PERFORM] Choosing a filesystem

2008-09-23 Thread Simon Riggs

On Tue, 2008-09-23 at 13:02 -0400, Bruce Momjian wrote:
> Merlin Moncure wrote:
> > > although for postgres the thing that you are doing the fsync on is the WAL
> > > log file. that is a single (usually) contiguous file. As such it is very
> > > efficiant to write large chunks of it. so while you will degrade from the
> > > battery-only mode, the fact that the controller can flush many requests
> > > worth of writes out to the WAL log at once while you fill the cache with
> > > them one at a time is still a significant win.
> > 
> > The heap files have to be synced as well during checkpoints, etc.
> 
> True, but as of 8.3 those checkpoint fsyncs are spread over the interval
> between checkpoints.

No, the fsyncs still all happen in a tight window after we have issued
the writes. There are no waits in between them at all. The delays we
introduced are all in the write phase. Whether that is important or not
depends upon OS parameter settings.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Intel's X25-M SSD

2008-09-22 Thread Simon Riggs

On Mon, 2008-09-08 at 19:12 -0400, Greg Smith wrote:
> If like me you've been reading all the flash SSD drive reviews...

Great post, thanks for the information.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] autovacuum: use case for indenpedent TOAST table autovac settings

2008-08-14 Thread Simon Riggs

On Wed, 2008-08-13 at 21:30 -0400, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> It seems like we'll want to do it somehow.  Perhaps the cleanest way is
> >> to incorporate toast-table settings in the reloptions of the parent
> >> table.  Otherwise dump/reload is gonna be a mess.
> 
> > My question is whether there is interest in actually having support for
> > this, or should we just inherit the settings from the main table.  My
> > gut feeling is that this may be needed in some cases, but perhaps I'm
> > overengineering the thing.
> 
> It seems reasonable to inherit the parent's settings by default, in any
> case.  So you could do that now and then extend the feature later if
> there's real demand.

Yeh, I can't really see a reason why you'd want to treat toast tables
differently with regard to autovacuuming. It's one more setting to get
wrong, so no thanks.
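
For reference, a sketch of the inherit-by-default behaviour with an explicit
override, in the reloptions form discussed above (8.4-era syntax; the table
name is hypothetical):

  ALTER TABLE items SET (autovacuum_vacuum_scale_factor = 0.05);
  -- the toast table follows the parent unless told otherwise:
  ALTER TABLE items SET (toast.autovacuum_vacuum_scale_factor = 0.01);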

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Outer joins and equivalence

2008-06-02 Thread Simon Riggs

On Mon, 2008-06-02 at 18:10 +0100, Simon Riggs wrote:
> On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > I have a complex query where making a small change to the SQL increases
> > > run-time by > 1000 times.
> > 
> > > The first SQL statement is of the form
> > 
> > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) 
> > 
> > > and the second is like this
> > 
> > > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id)
> > 
> > > the only difference is the substitution of a -> b
> > 
> > Please provide an actual test case.
> 
> Getting closer, but still not able to produce a moveable test case.

I've got a test case which shows something related and weird, though not
the exact case.

The queries shown here have significantly different costs, depending
upon whether we use tables a or b in the query. Since a and b are
equivalent this result isn't expected at all.

I suspect the plan variation in the original post is somehow cost
related and we are unlikely to discover the exact plan.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

drop table if exists a cascade;
create table a
(id			integer		not null primary key
,partition	integer
,filler		text
);

create table a_p1 as
select generate_series(1, 1000)::integer as id, 1::integer as partition, repeat('a',100) as filler;

alter table a_p1 add primary key (id);
alter table a_p1 inherit a; 
alter table a_p1 add check ( partition = 1);

create table a_p2 as
select generate_series(10,100100)::integer as id, 2::integer as partition, repeat('a',100) as filler;

alter table a_p2 add primary key (id);
alter table a_p2 inherit a; 
alter table a_p2 add check ( partition = 2);

drop table if exists b cascade;
create table b
(id 		integer		not null primary key
,filler		text
);

drop table if exists c cascade;
create table c
(id			integer		not null primary key
,othercol	integer
);

create index c_idx on c (othercol);

insert into c
select generate_series(1,24667), 0;
insert into c
select generate_series(25000, 27488), 1;

explain select count(*) from a join b on (a.id = b.id)
left join c on (a.id = c.id and c.othercol = 1);

explain select count(*) from a join b on (a.id = b.id)
left join c on (b.id = c.id and c.othercol = 1);

set enable_seqscan = off;

explain select count(*) from a join b on (a.id = b.id)
left join c on (a.id = c.id and c.othercol = 1);

explain select count(*) from a join b on (a.id = b.id)
left join c on (b.id = c.id and c.othercol = 1);



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Outer joins and equivalence

2008-06-02 Thread Simon Riggs

On Tue, 2008-05-27 at 17:43 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I have a complex query where making a small change to the SQL increases
> > run-time by > 1000 times.
> 
> > The first SQL statement is of the form
> 
> > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) 
> 
> > and the second is like this
> 
> > A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id)
> 
> > the only difference is the substitution of a -> b
> 
> Please provide an actual test case.

Getting closer, but still not able to produce a moveable test case.

Symptoms are

* using partitioning
* when none of the partitions are excluded
* when equivalence classes ought to be able to reconcile the join 

Still working on it

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 2GB or not 2GB

2008-06-01 Thread Simon Riggs

On Sat, 2008-05-31 at 11:53 -0700, Josh Berkus wrote:
> Simon,
> 
> > There is an optimum for each specific sort.
> 
> Well, if the optimum is something other than "as much as we can get", then we 
> still have a pretty serious issue with work_mem, no?

Depends upon your view of "serious", I suppose. I would say it is an
acceptable situation, but one that needs further optimization. I threw some
ideas around on Hackers around Dec/New Year, but I don't have time to work on
this further myself in this dev cycle. Further contributions welcome.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 2GB or not 2GB

2008-05-31 Thread Simon Riggs

On Wed, 2008-05-28 at 16:59 -0700, Josh Berkus wrote:

> sort_mem: My tests with 8.2 and DBT3 seemed to show that, due to 
> limitations of our tape sort algorithm, allocating over 2GB for a single 
> sort had no benefit.  However, Magnus and others have claimed otherwise.  
> Has this improved in 8.3?

There is an optimum for each specific sort. 

Your results cannot be used to make a global recommendation about the
setting of work_mem. So not finding any benefit in your tests *and*
Magnus seeing an improvement are not inconsistent events.
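
The optimum for a given sort is easy to probe, since 8.3's EXPLAIN ANALYZE
reports the sort method chosen; a sketch, using DBT3's lineitem as the
example:

  SET work_mem = '64MB';
  EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate;
  -- look for "Sort Method:  external merge  Disk: ...kB"
  SET work_mem = '1GB';
  EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate;
  -- versus  "Sort Method:  quicksort  Memory: ...kB"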

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] GEQO Benchmark

2008-05-28 Thread Simon Riggs

On Wed, 2008-05-28 at 13:13 -0300, Tarcizio Bini wrote:

> Of course, the geqo_threshold can be changed so that the geqo be
> performed in queries that have less than 12 tables. However, we aim to
> test the GEQO algorithm in conditions where the standard algorithm
> (dynamic programming) has a high cost to calculate the query plan.

My understanding is that GEQO cannot arrive at a better plan than the
standard optimizer, so unless you wish to measure planning time there
isn't much to test. What is the quality of the plans it generates? Well,
that varies according to the input; sometimes it gets the right plan,
other times it doesn't get close.
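
Planning time itself is easy to compare, though; a sketch (substitute your
12-plus-table join for the ellipsis, and use psql's \timing to see the
difference):

  \timing
  SET geqo = off;           -- force the standard dynamic-programming planner
  EXPLAIN SELECT ... ;
  SET geqo = on;
  SET geqo_threshold = 12;  -- hand the same query to GEQO
  EXPLAIN SELECT ... ;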

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Outer joins and equivalence

2008-05-28 Thread Simon Riggs

On Wed, 2008-05-28 at 11:45 +0100, Matthew Wakeling wrote:
> On Tue, 27 May 2008, Simon Riggs wrote:
> > I do recognise that we would *not* be able to deduce this form of SQL
> >
> > A JOIN B ON (a.id = c.id) LEFT JOIN C ON (b.id = c.id)
> 
> Surely that would not be valid SQL?

You are right, but my point was about inferences during SQL planning,
not about initial analysis of the statement.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-27 Thread Simon Riggs

On Mon, 2008-05-26 at 11:36 -0400, Tom Lane wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> > Decibel! wrote:
> >> Also, Simon and Tom had an idea at PGCon: Don't set hint-bits in the 
> >> back-end if the page isn't already dirty. 
> 
> > Or even better: set the hint-bits, but don't dirty the page.
> 
> Which in fact is what Simon suggested, not the other thing.

Just raised this on -hackers, BTW. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

