Re: [HACKERS] performance regression in 9.2/9.3

2014-06-09 Thread Linos
On 05/06/14 23:09, Linos wrote:
 On 05/06/14 19:39, Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 On Thu, Jun 5, 2014 at 9:54 AM, Linos i...@linos.es wrote:
 What I don't understand is why the statistics have this bad information, 
 all my tests are done on a database just restored and analyzed. Can I do 
 something to improve the quality of my database statistics and let the 
 planner do better choices? Maybe increase the statistics target of the 
 columns involved?
 By that I meant row count estimates coming out of the joins are way
 off.  This is pushing the planner into making bad choices.  The most
 pervasive problem I see is that the row count estimate boils down to
 '1' at some juncture causing the server to favor nestloop/index scan
 when something like a hash join would likely be more appropriate.
 There's some fairly wacko stuff going on in this example, like why
 is the inner HashAggregate costed so much higher by 9.3 than 8.4,
 when the inputs are basically the same?  And why does 9.3 fail to
 suppress the SubqueryScan on ven, when 8.4 does get rid of it?
 And why is the final output rows estimate so much higher in 9.3?
 That one is actually higher than the product of the two nestloop
 inputs, which looks like possibly a bug.

 I think what's happening is that 9.3 is picking what it knows to be a less
 than optimal join method so that it can sort the output by means of the
 ordered scan Index Scan using referencia_key on modelo mo, and thereby
 avoid an explicit sort of what it thinks would be 42512461 rows.  With a
 closer-to-reality estimate there, it would have gone for a plan more
 similar to 8.4's, ie, hash joins and then an explicit sort.

 There is a lot going on in this plan that we haven't been told about; for
 instance at least one of the query's tables seems to actually be a view,
 and some other ones appear to be inheritance trees with partitioning
 constraints, and I'm suspicious that some of the aggregates might be
 user-defined functions with higher than normal costs.

 I'd like to see a self-contained test case, by which I mean full details
 about the table/view schemas; it's not clear whether the actual data
 is very important here.

  regards, tom lane
 Query 2 doesn't use any view and you can find the schema here:
 http://pastebin.com/Nkv7FwRr


 Query 1 use 5 views: ticket_cabecera, ticket_linea, reserva_cabecera, 
 reserva_linea and tarifa_proveedor_modelo_precio, I have factored out the 
 four first with the same result as before, you can find the new query and the 
 new plan here:

 http://pastebin.com/7u2Dkyxp
 http://explain.depesz.com/s/2V9d

 Actually the execution time is worse than before.

 About the last view if I change join from tarifa_proveedor_modelo_precio to 
 tarifa_modelo_precio (a table with nearly the same structure as the view) the 
 query is executed much faster, but I get a similar time changing the 
 (MIN(cab.time_stamp_recepcion)::DATE = ) to (WHERE 
 cab.time_stamp_recepcion::date = ) in the ent subquery that never was a 
 view.

 Anyway I included tarifa_modelo_precio to the query1 schema file for 
 reference and you can find the plan using tarifa_modelo_precio instead of the 
 view tarifa_proveedor_modelo_precio here:

 http://explain.depesz.com/s/4gV

 query1 schema file:
 http://pastebin.com/JpqM87dr


 Regards,
 Miguel Angel.





Hello,

Is this information enough? I could try to assemble a complete test case but I 
have very little time right now because I am trying to meet a very difficult 
deadline.

I will do ASAP if needed.

Regards,
Miguel Angel.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-09 Thread Merlin Moncure
On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote:
 Hello,

 Is this information enough? I could try to assemble a complete test case but 
 I have very little time right now because I am trying to meet a very 
 difficult deadline.

 I will do ASAP if needed.

It is not -- it was enough to diagnose a potential problem but not the
solution.  Tom was pretty clear: I'd like to see a self-contained
test case, by which I mean full details about the table/view schemas;
it's not clear whether the actual data is very important here..

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-09 Thread Linos
On 09/06/14 16:55, Merlin Moncure wrote:
 On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote:
 Hello,

 Is this information enough? I could try to assemble a complete test case but 
 I have very little time right now because I am trying to meet a very 
 difficult deadline.

 I will do ASAP if needed.
 It is not -- it was enough to diagnose a potential problem but not the
 solution.  Tom was pretty clear: I'd like to see a self-contained
 test case, by which I mean full details about the table/view schemas;
 it's not clear whether the actual data is very important here..

 merlin

Merlin, in the email I replied to are attached the table/view schemas, I was 
referring to this information as enough or not. Tom said full details about 
the table/view schemas  and these details are attached to the original email I 
replied to.
 
Miguel Angel.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-09 Thread Merlin Moncure
On Mon, Jun 9, 2014 at 10:00 AM, Linos i...@linos.es wrote:
 On 09/06/14 16:55, Merlin Moncure wrote:
 On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote:
 Hello,

 Is this information enough? I could try to assemble a complete test case 
 but I have very little time right now because I am trying to meet a very 
 difficult deadline.

 I will do ASAP if needed.
 It is not -- it was enough to diagnose a potential problem but not the
 solution.  Tom was pretty clear: I'd like to see a self-contained
 test case, by which I mean full details about the table/view schemas;
 it's not clear whether the actual data is very important here..

 merlin

 Merlin, in the email I replied to are attached the table/view schemas, I was 
 referring to this information as enough or not. Tom said full details about 
 the table/view schemas  and these details are attached to the original email 
 I replied to.

A self contained test case would generally imply a precise sequence of
steps (possibly with supplied data, or some manipulations via
generate_series) that would reproduce the issue locally.  Since data
may not be required, you might be able to get away with a 'schema only
dump', but you'd need to make sure to include necessary statistics
(mostly what you'd need is in pg_statistic which you'd have to join
against pg_class, pg_attribute and pg_namespace).

Ideally, you'd be able to restore your schema only dump on a blank
database with autovacuum disabled, hack in your statistics, and verify
your query produced the same plan.  Then (and only then) you could tar
up your schema only file, the statistics data, and the query to update
the data, and your query with the bad plan which you've triple checked
matched your problem condition's plan, and send it to Tom.  There
might be some things I've missed but getting a blank database to
reproduce your problem with a minimum number of steps is key.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-09 Thread Linos
On 09/06/14 17:30, Merlin Moncure wrote:
 On Mon, Jun 9, 2014 at 10:00 AM, Linos i...@linos.es wrote:
 On 09/06/14 16:55, Merlin Moncure wrote:
 On Mon, Jun 9, 2014 at 9:51 AM, Linos i...@linos.es wrote:
 Hello,

 Is this information enough? I could try to assemble a complete test case 
 but I have very little time right now because I am trying to meet a very 
 difficult deadline.

 I will do ASAP if needed.
 It is not -- it was enough to diagnose a potential problem but not the
 solution.  Tom was pretty clear: I'd like to see a self-contained
 test case, by which I mean full details about the table/view schemas;
 it's not clear whether the actual data is very important here..

 merlin
 Merlin, in the email I replied to are attached the table/view schemas, I was 
 referring to this information as enough or not. Tom said full details about 
 the table/view schemas  and these details are attached to the original 
 email I replied to.
 A self contained test case would generally imply a precise sequence of
 steps (possibly with supplied data, or some manipulations via
 generate_series) that would reproduce the issue locally.  Since data
 may not be required, you might be able to get away with a 'schema only
 dump', but you'd need to make sure to include necessary statistics
 (mostly what you'd need is in pg_statistic which you'd have to join
 against pg_class, pg_attribute and pg_namespace).

 Ideally, you'd be able to restore your schema only dump on a blank
 database with autovacuum disabled, hack in your statistics, and verify
 your query produced the same plan.  Then (and only then) you could tar
 up your schema only file, the statistics data, and the query to update
 the data, and your query with the bad plan which you've triple checked
 matched your problem condition's plan, and send it to Tom.  There
 might be some things I've missed but getting a blank database to
 reproduce your problem with a minimum number of steps is key.

 merlin

oh I understand now, sorry for the misunderstanding,  I will prepare the 
complete test case ASAP, thank you for the explanation Merlin.

Miguel Angel.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-09 Thread Tom Lane
Linos i...@linos.es writes:
 On 05/06/14 19:39, Tom Lane wrote:
 I'd like to see a self-contained test case, by which I mean full details
 about the table/view schemas; it's not clear whether the actual data
 is very important here.

 query1 schema file:
 http://pastebin.com/JpqM87dr

Sorry about the delay on getting back to this.  I downloaded the above
schema file and tried to run the originally given query with it, and it
failed because the query refers to a couple of tienda columns that
don't exist anywhere in this schema.  When you submit an updated version,
please make sure that all the moving parts match ;-).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-09 Thread Linos
On 09/06/14 18:31, Tom Lane wrote:
 Linos i...@linos.es writes:
 On 05/06/14 19:39, Tom Lane wrote:
 I'd like to see a self-contained test case, by which I mean full details
 about the table/view schemas; it's not clear whether the actual data
 is very important here.
 query1 schema file:
 http://pastebin.com/JpqM87dr
 Sorry about the delay on getting back to this.  I downloaded the above
 schema file and tried to run the originally given query with it, and it
 failed because the query refers to a couple of tienda columns that
 don't exist anywhere in this schema.  When you submit an updated version,
 please make sure that all the moving parts match ;-).

   regards, tom lane

Tom are you trying with the modified query 1 I posted in the email you found 
the schema link? I changed a little bit to remove 4 views, these views were 
where tienda columns were.

Here you can find the modified query and the new explain without these views.

http://pastebin.com/7u2Dkyxp
http://explain.depesz.com/s/2V9d

Anyway Merlin told me how to create a more complete self-contained case without 
data, I will try to do it ASAP, I am really busy right now trying to meet a 
deadline but I will try to search for a while to create this test-case.

Thank you Tom.

Regards,
Miguel Angel.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-05 Thread Linos
On 05/06/14 13:32, Linos wrote:
 Hello all,

 This is a continuation of the thread found here:
 http://www.postgresql.org/message-id/538f2578.9080...@linos.es

 Considering this seems to be a problem with the planner I thought that maybe 
 would be a better idea to post this problem here.

 To summarize the original thread I upgraded a medium (17Gb) database from 
 PostgreSQL 8.4 to 9.3 and many of the queries my application uses started 
 performing a lot slower, Merlin advised me to try disabling nestloop, this 
 helped out for the particular query I was asking about but it is not a 
 solution that I can/would like to use in the general case.

 I simplified a little bit the original query and I have added another one 
 with same problem.

 query 1:
 http://pastebin.com/32QxbNqW

 query 1 postgres 9.3 nestloop enabled:
 http://explain.depesz.com/s/6WX

 query 1 postgres 8.4:
 http://explain.depesz.com/s/Q7V

 query 1 postgres 9.3 nestloop disabled:
 http://explain.depesz.com/s/w1n

 query 1 postgres 9.3 changed having min(ts_recepcion) = for where 
 ts_recepcion = 
 http://explain.depesz.com/s/H5V


 query 2:
 http://pastebin.com/JmfPcRg8

 query 2 postgres 9.3 nestloop enabled:
 http://explain.depesz.com/s/EY7

 query 2 postgres 8.4:
 http://explain.depesz.com/s/Xc4

 query 2 postgres 9.3 nestloop disabled:
 http://explain.depesz.com/s/oO6O

 query 2 postgres 9.3 changed between to equal for date filter:
 http://explain.depesz.com/s/cP2H


 As you can see in this links the problem disappears when I disable nestloop, 
 another thing I discovered making different combinations of changes is that 
 it seems to be related with date/timestamp fields, small changes to the 
 queries fix the problem without disabling nestloop.

 For example in query 1 changing this:
   WHERE cab.id_almacen_destino = 109
   GROUP BY mo.modelo_id
   HAVING MIN(cab.time_stamp_recepcion)::date = (current_date - interval '30 
 days')::date

 to this:
   WHERE cab.id_almacen_destino = 109
 AND cab.time_stamp_recepcion::date = (current_date - interval '30 
 days')::date
   GROUP BY mo.modelo_id

 in the first subquery fixed the execution time problem, I know the result is 
 not the same, the second change is a better example:

 In query2 changing this:
 WHERE fecha BETWEEN '2014-05-19' AND '2014-05-19'
 to this:
 WHERE fecha = '2014-05-19'

 fixes the problem, as you can see in the different explains.

 This changes are not needed to make PostgreSQL 8.4 take the correct plan but 
 they are in 9.2/9.3, I haven't tried 9.1 or 9.0 yet.

 Merlin advised me to create a small test case, the thing is that the tables 
 involved can be pretty large. The best way to create a good test case would 
 be to use generate_series or something alike to try to replicate this problem 
 from zero without any dump, no?


 Regards,
 Miguel Angel.



Hi, to put a little more of data on the table, on 9.1 I can reproduce the query 
1 problem but not the query 2 problem.

Regards,
Miguel Angel.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-05 Thread Merlin Moncure
On Thu, Jun 5, 2014 at 6:32 AM, Linos i...@linos.es wrote:
 Hello all,

 This is a continuation of the thread found here:
 http://www.postgresql.org/message-id/538f2578.9080...@linos.es

 Considering this seems to be a problem with the planner I thought that maybe 
 would be a better idea to post this problem here.

 To summarize the original thread I upgraded a medium (17Gb) database from 
 PostgreSQL 8.4 to 9.3 and many of the queries my application uses started 
 performing a lot slower, Merlin advised me to try disabling nestloop, this 
 helped out for the particular query I was asking about but it is not a 
 solution that I can/would like to use in the general case.

 I simplified a little bit the original query and I have added another one 
 with same problem.

I believe the basic problem (this is just one example; I've
anecdotally seen this myself) is that changes in the query planner
(which I don't follow and fully understand) in recent versions seem to
be such that the planner makes better decisions in the presence of
good information but in certain cases makes worse choices when dealing
with bad information.  Statistics errors tend to accumulate and
magnify in complicated plans, especially when the SQL is not optimally
written.

I have no clue what the right solution is.  There's been several
discussions about 'plan risk' and trying to get the server to pick
plans with better worse case behavior in cases where statistics are
demonstrably suspicious.  Maybe that would work but ISTM is a huge
research item that won't get solved quickly or even necessarily pan
out in the end.  Nevertheless, user supplied test cases demonstrating
performance regressions (bonus if it can be scripted out of
generate_series) are going to be key drivers in finding a solution.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-05 Thread Linos
On 05/06/14 16:40, Merlin Moncure wrote:
 On Thu, Jun 5, 2014 at 6:32 AM, Linos i...@linos.es wrote:
 Hello all,

 This is a continuation of the thread found here:
 http://www.postgresql.org/message-id/538f2578.9080...@linos.es

 Considering this seems to be a problem with the planner I thought that maybe 
 would be a better idea to post this problem here.

 To summarize the original thread I upgraded a medium (17Gb) database from 
 PostgreSQL 8.4 to 9.3 and many of the queries my application uses started 
 performing a lot slower, Merlin advised me to try disabling nestloop, this 
 helped out for the particular query I was asking about but it is not a 
 solution that I can/would like to use in the general case.

 I simplified a little bit the original query and I have added another one 
 with same problem.
 I believe the basic problem (this is just one example; I've
 anecdotally seen this myself) is that changes in the query planner
 (which I don't follow and fully understand) in recent versions seem to
 be such that the planner makes better decisions in the presence of
 good information but in certain cases makes worse choices when dealing
 with bad information.  Statistics errors tend to accumulate and
 magnify in complicated plans, especially when the SQL is not optimally
 written.

 I have no clue what the right solution is.  There's been several
 discussions about 'plan risk' and trying to get the server to pick
 plans with better worse case behavior in cases where statistics are
 demonstrably suspicious.  Maybe that would work but ISTM is a huge
 research item that won't get solved quickly or even necessarily pan
 out in the end.  Nevertheless, user supplied test cases demonstrating
 performance regressions (bonus if it can be scripted out of
 generate_series) are going to be key drivers in finding a solution.

 merlin



What I don't understand is why the statistics have this bad information, all my 
tests are done on a database just restored and analyzed. Can I do something to 
improve the quality of my database statistics and let the planner do better 
choices? Maybe increase the statistics target of the columns involved?

Regards,
Miguel Angel.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-05 Thread Linos
On 05/06/14 16:40, Merlin Moncure wrote:
 On Thu, Jun 5, 2014 at 6:32 AM, Linos i...@linos.es wrote:
 Hello all,

 This is a continuation of the thread found here:
 http://www.postgresql.org/message-id/538f2578.9080...@linos.es

 Considering this seems to be a problem with the planner I thought that maybe 
 would be a better idea to post this problem here.

 To summarize the original thread I upgraded a medium (17Gb) database from 
 PostgreSQL 8.4 to 9.3 and many of the queries my application uses started 
 performing a lot slower, Merlin advised me to try disabling nestloop, this 
 helped out for the particular query I was asking about but it is not a 
 solution that I can/would like to use in the general case.

 I simplified a little bit the original query and I have added another one 
 with same problem.
 I believe the basic problem (this is just one example; I've
 anecdotally seen this myself) is that changes in the query planner
 (which I don't follow and fully understand) in recent versions seem to
 be such that the planner makes better decisions in the presence of
 good information but in certain cases makes worse choices when dealing
 with bad information.  Statistics errors tend to accumulate and
 magnify in complicated plans, especially when the SQL is not optimally
 written.

 I have no clue what the right solution is.  There's been several
 discussions about 'plan risk' and trying to get the server to pick
 plans with better worse case behavior in cases where statistics are
 demonstrably suspicious.  Maybe that would work but ISTM is a huge
 research item that won't get solved quickly or even necessarily pan
 out in the end.  Nevertheless, user supplied test cases demonstrating
 performance regressions (bonus if it can be scripted out of
 generate_series) are going to be key drivers in finding a solution.

 merlin

I tried setting statistics to 1 on 
albaran_entrada_cabecera.time_stamp_recepcion (query 1) and 
ticket_cabecera.fecha (query 2), query 2 is fixed after analyze with the new 
statistics target (with 5000 as target is fixed too) but query 1 doesn't 
improve.

Regards,
Miguel Angel.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-05 Thread Merlin Moncure
On Thu, Jun 5, 2014 at 9:54 AM, Linos i...@linos.es wrote:
 What I don't understand is why the statistics have this bad information, all 
 my tests are done on a database just restored and analyzed. Can I do 
 something to improve the quality of my database statistics and let the 
 planner do better choices? Maybe increase the statistics target of the 
 columns involved?

By that I meant row count estimates coming out of the joins are way
off.  This is pushing the planner into making bad choices.  The most
pervasive problem I see is that the row count estimate boils down to
'1' at some juncture causing the server to favor nestloop/index scan
when something like a hash join would likely be more appropriate.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-05 Thread Greg Stark
On Thu, Jun 5, 2014 at 3:54 PM, Linos i...@linos.es wrote:
 What I don't understand is why the statistics have this bad information, all 
 my tests are done on a database just restored and analyzed. Can I do 
 something to improve the quality of my database statistics and let the 
 planner do better choices? Maybe increase the statistics target of the 
 columns involved?

The statistics don't seem different at all in this case. The planner
is predicting more or less the same results right up to the top level
join where it think it'll be joining 200 rows by 92,000 rows. In 8.4
it predicted the join will produce 200 rows but in 9.4 it's predicting
the join will produce 42 million rows. That's a pretty big difference.
The actual number of rows it's seeing are about 2000x68 in both
versions. I think in this case part of the answer is just that if your
estimates are wrong then the planner will make bad deductions and
it'll just be luck whether one set of bad deductions will produce
better or worse plans than another set of bad deductions.

The particular bad deductions here are that 9.3 is better able to
deduce the ordering of the aggregates and avoid the extra sort. In 8.4
it probably wasn't aware of any plans that would produce rows in the
right order.

But why is it guessing the join will produce 42 million in 9.4 and
only 200 in 8.4?

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-05 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 On Thu, Jun 5, 2014 at 9:54 AM, Linos i...@linos.es wrote:
 What I don't understand is why the statistics have this bad information, all 
 my tests are done on a database just restored and analyzed. Can I do 
 something to improve the quality of my database statistics and let the 
 planner do better choices? Maybe increase the statistics target of the 
 columns involved?

 By that I meant row count estimates coming out of the joins are way
 off.  This is pushing the planner into making bad choices.  The most
 pervasive problem I see is that the row count estimate boils down to
 '1' at some juncture causing the server to favor nestloop/index scan
 when something like a hash join would likely be more appropriate.

There's some fairly wacko stuff going on in this example, like why
is the inner HashAggregate costed so much higher by 9.3 than 8.4,
when the inputs are basically the same?  And why does 9.3 fail to
suppress the SubqueryScan on ven, when 8.4 does get rid of it?
And why is the final output rows estimate so much higher in 9.3?
That one is actually higher than the product of the two nestloop
inputs, which looks like possibly a bug.

I think what's happening is that 9.3 is picking what it knows to be a less
than optimal join method so that it can sort the output by means of the
ordered scan Index Scan using referencia_key on modelo mo, and thereby
avoid an explicit sort of what it thinks would be 42512461 rows.  With a
closer-to-reality estimate there, it would have gone for a plan more
similar to 8.4's, ie, hash joins and then an explicit sort.

There is a lot going on in this plan that we haven't been told about; for
instance at least one of the query's tables seems to actually be a view,
and some other ones appear to be inheritance trees with partitioning
constraints, and I'm suspicious that some of the aggregates might be
user-defined functions with higher than normal costs.

I'd like to see a self-contained test case, by which I mean full details
about the table/view schemas; it's not clear whether the actual data
is very important here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] performance regression in 9.2/9.3

2014-06-05 Thread Linos
On 05/06/14 19:39, Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 On Thu, Jun 5, 2014 at 9:54 AM, Linos i...@linos.es wrote:
 What I don't understand is why the statistics have this bad information, 
 all my tests are done on a database just restored and analyzed. Can I do 
 something to improve the quality of my database statistics and let the 
 planner do better choices? Maybe increase the statistics target of the 
 columns involved?
 By that I meant row count estimates coming out of the joins are way
 off.  This is pushing the planner into making bad choices.  The most
 pervasive problem I see is that the row count estimate boils down to
 '1' at some juncture causing the server to favor nestloop/index scan
 when something like a hash join would likely be more appropriate.
 There's some fairly wacko stuff going on in this example, like why
 is the inner HashAggregate costed so much higher by 9.3 than 8.4,
 when the inputs are basically the same?  And why does 9.3 fail to
 suppress the SubqueryScan on ven, when 8.4 does get rid of it?
 And why is the final output rows estimate so much higher in 9.3?
 That one is actually higher than the product of the two nestloop
 inputs, which looks like possibly a bug.

 I think what's happening is that 9.3 is picking what it knows to be a less
 than optimal join method so that it can sort the output by means of the
 ordered scan Index Scan using referencia_key on modelo mo, and thereby
 avoid an explicit sort of what it thinks would be 42512461 rows.  With a
 closer-to-reality estimate there, it would have gone for a plan more
 similar to 8.4's, ie, hash joins and then an explicit sort.

 There is a lot going on in this plan that we haven't been told about; for
 instance at least one of the query's tables seems to actually be a view,
 and some other ones appear to be inheritance trees with partitioning
 constraints, and I'm suspicious that some of the aggregates might be
 user-defined functions with higher than normal costs.

 I'd like to see a self-contained test case, by which I mean full details
 about the table/view schemas; it's not clear whether the actual data
 is very important here.

   regards, tom lane

Query 2 doesn't use any view and you can find the schema here:
http://pastebin.com/Nkv7FwRr


Query 1 use 5 views: ticket_cabecera, ticket_linea, reserva_cabecera, 
reserva_linea and tarifa_proveedor_modelo_precio, I have factored out the four 
first with the same result as before, you can find the new query and the new 
plan here:

http://pastebin.com/7u2Dkyxp
http://explain.depesz.com/s/2V9d

Actually the execution time is worse than before.

About the last view if I change join from tarifa_proveedor_modelo_precio to 
tarifa_modelo_precio (a table with nearly the same structure as the view) the 
query is executed much faster, but I get a similar time changing the 
(MIN(cab.time_stamp_recepcion)::DATE = ) to (WHERE 
cab.time_stamp_recepcion::date = ) in the ent subquery that never was a 
view.

Anyway I included tarifa_modelo_precio to the query1 schema file for reference 
and you can find the plan using tarifa_modelo_precio instead of the view 
tarifa_proveedor_modelo_precio here:

http://explain.depesz.com/s/4gV

query1 schema file:
http://pastebin.com/JpqM87dr


Regards,
Miguel Angel.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers