Re: Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-10 Thread Pavel Stehule
út 11. 10. 2022 v 7:08 odesílatel gzh  napsal:

> Hi, Pavel
>
>
> > The LIMIT clause changes total cost.  This is a very aggressive clause.
> And
>
> > although it is absolutely useless in this case, Postgres does not have
> any
>
> > logic for removing it. Postgres doesn't try to fix developer's mistakes.
>
> Sorry,I didn't understand what you mean.
>
> Couldn't the LIMIT clause be used like the SQL statement below?
>
>
> >> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>
> >> 94) limit 1;
>

there was query

SELECT aggregate() FROM xx LIMIT 1


> This SQL statement is no problem under PostgreSQL 8.4, the index works
> well.
>
>
>
The optimizer is under nonstop change. And you can expect from any new
release

75% queries are without change, 20% queries are faster, and 5% queries are
slower

The optimization is based on statistics and estimations, and searching for
the optimal solution in space of all solutions. In any version there are
smaller or bigger changes of estimation methods, and between old 8.4 and 12
there are big changes in possibilities of how the query can be executed. So
there is a higher possibility to find some really fast queries, but there
is a higher possibility to find some local optimum or slow query too.
Usually the optimizer is smarter (what is the benefit), but more sensitive
too (what is the cost). You cannot expect the same result, when the data
and algorithm is changed in any version. Postgres doesn't garant the
immutability of execution plans.

The clause LIMIT with low LIMIT value can be problematic in more cases. The
model in Postgres expects data are uniformly stored in the table (heap),
but the reality can be different. The common trick in these cases is using
OFFSET 0 clause like

SELECT * FROM (SELECT * FROM foo LIMIT 1000 OFFSET 0) s LIMIT 10.







>
>
>
>
> At 2022-10-11 12:13:47, "Pavel Stehule"  wrote:
>
>
>
> út 11. 10. 2022 v 6:05 odesílatel gzh  napsal:
>
>>
>> Hi, Pavel
>>
>> Thank you for your reply.
>>
>>
>> > the LIMIT clause is in this case totally useless and messy, and maybe
>> can
>>
>> > negative impacts optimizer
>>
>> Yes. After removing the LIMIT clause, the performance is improved.
>>
>> The execution plan shows that the index worked.
>>
>> We've noticed it, but I don't want to fix the problem by modifying the
>> SQL until I find the cause.
>>
>
> The LIMIT clause changes total cost.  This is a very aggressive clause.
> And although it is absolutely useless in this case, Postgres does not have
> any logic for removing it. Postgres doesn't try to fix developer's mistakes.
>
>
>
>>
>>
>>
>>
>> At 2022-10-11 11:32:48, "Pavel Stehule"  wrote:
>>
>>
>>
>> út 11. 10. 2022 v 5:13 odesílatel gzh  napsal:
>>
>>> Hi, Tom
>>> Thank you for your reply.
>>>
>>> > When you're asking for help, please don't give us vague statements
>>>
>>> > like "doesn't seem to work".
>>>
>>> I understand.
>>>
>>>
>>> > Did the plan (including rowcount
>>>
>>> > estimates) change at all?  To what?  How far off is that rowcount
>>>
>>> > estimate, anyway --- that is, how many rows actually have cseid = 94?
>>>
>>> Please refer to the new execution plan (PostgreSQL 12.11) below.
>>>
>>>
>>> new=# show enable_seqscan;
>>>
>>>  enable_seqscan
>>>
>>> 
>>>
>>>  on
>>>
>>> (1 行)
>>>
>>>
>>> new=# select count(*) from analyze_word_reports;
>>>
>>>   count
>>>
>>> --
>>>
>>>  21331980
>>>
>>> (1 行)
>>>
>>>
>>> new=# select count(*) from analyze_word_reports where (cseid = 94);
>>>
>>>   count
>>>
>>> -
>>>
>>>  1287156
>>>
>>> (1 行)
>>>
>>>
>>> new=# explain analyze select count(2) from analyze_word_reports where
>>> (cseid = 94) limit 1;
>>>
>>>
>>>  QUERY PLAN
>>>
>>>
>>>
>>>
>>
>> the LIMIT clause is in this case totally useless and messy, and maybe can
>> negative impacts optimizer
>>
>> Regards
>>
>> Pavel
>>
>>
>>
>>>
>>> --
>>>
>>>
>>> --
>>>
>>>  Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual
>>> time=123.713..133.035 rows=1 loops=1)
>>>
>>>->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8)
>>> (actual time=123.712..133.033 rows=1 loops=1)
>>>
>>>  ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual
>>> time=123.548..133.024 rows=3 loops=1)
>>>
>>>Workers Planned: 2
>>>
>>>Workers Launched: 2
>>>
>>>->  Partial Aggregate  (cost=64183.85..64183.86 rows=1
>>> width=8) (actual time=119.495..119.496 rows=1 loops=3)
>>>
>>>  ->  Parallel Index Only Scan using
>>> analyze_word_reports_index_cseid on analyze_word_reports  (cost=0.56..6290
>>>
>>> 9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
>>>
>>>Index Cond: (cseid = 94)
>>>
>>>

Re:Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-10 Thread gzh
Hi, Pavel




> The LIMIT clause changes total cost.  This is a very aggressive clause. And

> although it is absolutely useless in this case, Postgres does not have any

> logic for removing it. Postgres doesn't try to fix developer's mistakes.

Sorry,I didn't understand what you mean.

Couldn't the LIMIT clause be used like the SQL statement below?




>> new=# explain analyze select 2 from analyze_word_reports where (cseid =

>> 94) limit 1;




This SQL statement is no problem under PostgreSQL 8.4, the index works well.
















At 2022-10-11 12:13:47, "Pavel Stehule"  wrote:





út 11. 10. 2022 v 6:05 odesílatel gzh  napsal:





Hi, Pavel

Thank you for your reply.




> the LIMIT clause is in this case totally useless and messy, and maybe can

> negative impacts optimizer

Yes. After removing the LIMIT clause, the performance is improved. 

The execution plan shows that the index worked.

We've noticed it, but I don't want to fix the problem by modifying the SQL 
until I find the cause.



The LIMIT clause changes total cost.  This is a very aggressive clause. And 
although it is absolutely useless in this case, Postgres does not have any 
logic for removing it. Postgres doesn't try to fix developer's mistakes.



 














At 2022-10-11 11:32:48, "Pavel Stehule"  wrote:





út 11. 10. 2022 v 5:13 odesílatel gzh  napsal:

Hi, Tom
Thank you for your reply.


> When you're asking for help, please don't give us vague statements

> like "doesn't seem to work". 

I understand.




> Did the plan (including rowcount

> estimates) change at all?  To what?  How far off is that rowcount

> estimate, anyway --- that is, how many rows actually have cseid = 94?

Please refer to the new execution plan (PostgreSQL 12.11) below.




new=# show enable_seqscan;

 enable_seqscan



 on

(1 行)




new=# select count(*) from analyze_word_reports;

  count   

--

 21331980

(1 行)




new=# select count(*) from analyze_word_reports where (cseid = 94);

  count

-

 1287156

(1 行)




new=# explain analyze select count(2) from analyze_word_reports where (cseid = 
94) limit 1;


 QUERY PLAN   

  



the LIMIT clause is in this case totally useless and messy, and maybe can 
negative impacts optimizer


Regards


Pavel


 

--

--

 Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 
rows=1 loops=1)

   ->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8) (actual 
time=123.712..133.033 rows=1 loops=1)

 ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual 
time=123.548..133.024 rows=3 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   ->  Partial Aggregate  (cost=64183.85..64183.86 rows=1 width=8) 
(actual time=119.495..119.496 rows=1 loops=3)

 ->  Parallel Index Only Scan using 
analyze_word_reports_index_cseid on analyze_word_reports  (cost=0.56..6290

9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)

   Index Cond: (cseid = 94)

   Heap Fetches: 1287156  Planning Time: 0.122 ms  
Execution Time: 133.069 ms

(11 行)




new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) 
limit 1;

 QUERY PLAN 
  

   

--

---

 Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 
loops=1)

   ->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75 rows=1223533 
width=4) (actual time=2156.962..2156.964 rows=1 loops=

1)

 Filter: (cseid = 94)

 Rows Removed by Filter: 18320180  Planning Time: 0.086 ms  Execution 
Time: 2156.985 ms

(6 行)







> If the estimate is far off, then increasing the table's statistics

> target might help.

Thank you for your advice. 

Please tell me how to set the table's statistics up to improve performance.




new=#  select oid from pg_class where relname = 'analyze_word_reports';

  oid  

---

16429

(1 行)




new=# select attrelid,attname,attstattarget from pg_attribute where 
attrelid=16429 and attname='cseid';

attrelid | attname | attstattarget 

--+-+---

16429 | cseid   |-1

(1 行)




> Another thing that would be worth checking is whether

> "set enable_seqscan = off" prods it to choose the plan you want.

> If not, then 

Re: Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-10 Thread Pavel Stehule
út 11. 10. 2022 v 6:05 odesílatel gzh  napsal:

>
> Hi, Pavel
>
> Thank you for your reply.
>
>
> > the LIMIT clause is in this case totally useless and messy, and maybe can
>
> > negative impacts optimizer
>
> Yes. After removing the LIMIT clause, the performance is improved.
>
> The execution plan shows that the index worked.
>
> We've noticed it, but I don't want to fix the problem by modifying the SQL
> until I find the cause.
>

The LIMIT clause changes total cost.  This is a very aggressive clause. And
although it is absolutely useless in this case, Postgres does not have any
logic for removing it. Postgres doesn't try to fix developer's mistakes.



>
>
>
>
> At 2022-10-11 11:32:48, "Pavel Stehule"  wrote:
>
>
>
> út 11. 10. 2022 v 5:13 odesílatel gzh  napsal:
>
>> Hi, Tom
>> Thank you for your reply.
>>
>> > When you're asking for help, please don't give us vague statements
>>
>> > like "doesn't seem to work".
>>
>> I understand.
>>
>>
>> > Did the plan (including rowcount
>>
>> > estimates) change at all?  To what?  How far off is that rowcount
>>
>> > estimate, anyway --- that is, how many rows actually have cseid = 94?
>>
>> Please refer to the new execution plan (PostgreSQL 12.11) below.
>>
>>
>> new=# show enable_seqscan;
>>
>>  enable_seqscan
>>
>> 
>>
>>  on
>>
>> (1 行)
>>
>>
>> new=# select count(*) from analyze_word_reports;
>>
>>   count
>>
>> --
>>
>>  21331980
>>
>> (1 行)
>>
>>
>> new=# select count(*) from analyze_word_reports where (cseid = 94);
>>
>>   count
>>
>> -
>>
>>  1287156
>>
>> (1 行)
>>
>>
>> new=# explain analyze select count(2) from analyze_word_reports where
>> (cseid = 94) limit 1;
>>
>>
>>QUERY PLAN
>>
>>
>>
>
> the LIMIT clause is in this case totally useless and messy, and maybe can
> negative impacts optimizer
>
> Regards
>
> Pavel
>
>
>
>>
>> --
>>
>> --
>>
>>  Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual
>> time=123.713..133.035 rows=1 loops=1)
>>
>>->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8)
>> (actual time=123.712..133.033 rows=1 loops=1)
>>
>>  ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual
>> time=123.548..133.024 rows=3 loops=1)
>>
>>Workers Planned: 2
>>
>>Workers Launched: 2
>>
>>->  Partial Aggregate  (cost=64183.85..64183.86 rows=1
>> width=8) (actual time=119.495..119.496 rows=1 loops=3)
>>
>>  ->  Parallel Index Only Scan using
>> analyze_word_reports_index_cseid on analyze_word_reports  (cost=0.56..6290
>>
>> 9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
>>
>>Index Cond: (cseid = 94)
>>
>>Heap Fetches: 1287156  Planning Time: 0.122
>> ms  Execution Time: 133.069 ms
>>
>> (11 行)
>>
>>
>> new=# explain analyze select 2 from analyze_word_reports where (cseid =
>> 94) limit 1;
>>
>>  QUERY PLAN
>>
>>
>>
>>
>>
>> --
>>
>> ---
>>
>>  Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966
>> rows=1 loops=1)
>>
>>->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75
>> rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
>>
>> 1)
>>
>>  Filter: (cseid = 94)
>>
>>  Rows Removed by Filter: 18320180  Planning Time: 0.086 ms
>> Execution Time: 2156.985 ms
>>
>> (6 行)
>>
>>
>>
>> > If the estimate is far off, then increasing the table's statistics
>>
>> > target might help.
>>
>> Thank you for your advice.
>>
>> Please tell me how to set the table's statistics up to improve
>> performance.
>>
>>
>> new=#  select oid from pg_class where relname = 'analyze_word_reports';
>>
>>   oid
>>
>> ---
>>
>> 16429
>>
>> (1 行)
>>
>>
>> new=# select attrelid,attname,attstattarget from pg_attribute where
>> attrelid=16429 and attname='cseid';
>>
>> attrelid | attname | attstattarget
>>
>> --+-+---
>>
>> 16429 | cseid   |-1
>>
>> (1 行)
>>
>>
>> > Another thing that would be worth checking is whether
>>
>> > "set enable_seqscan = off" prods it to choose the plan you want.
>>
>> > If not, then there's something else going on besides poor estimates.
>>
>> "set enable_seqscan = off" works, and the performance is greatly
>> improved, which is almost the same as PostgreSQL 8.4.
>>
>> The enable_seqscan(PostgreSQL 8.4) is on, will this change have an
>> unknown effect on other queries?
>>
>>
>>
>>
>>
>> At 2022-10-10 10:45:54, "Tom Lane"  wrote:
>> >gzh   writes:
>> >> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>> >
>> 

Re:Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-10 Thread gzh



Hi, Pavel

Thank you for your reply.




> the LIMIT clause is in this case totally useless and messy, and maybe can

> negative impacts optimizer

Yes. After removing the LIMIT clause, the performance is improved. 

The execution plan shows that the index worked.

We've noticed it, but I don't want to fix the problem by modifying the SQL 
until I find the cause.













At 2022-10-11 11:32:48, "Pavel Stehule"  wrote:





út 11. 10. 2022 v 5:13 odesílatel gzh  napsal:

Hi, Tom
Thank you for your reply.


> When you're asking for help, please don't give us vague statements

> like "doesn't seem to work". 

I understand.




> Did the plan (including rowcount

> estimates) change at all?  To what?  How far off is that rowcount

> estimate, anyway --- that is, how many rows actually have cseid = 94?

Please refer to the new execution plan (PostgreSQL 12.11) below.




new=# show enable_seqscan;

 enable_seqscan



 on

(1 行)




new=# select count(*) from analyze_word_reports;

  count   

--

 21331980

(1 行)




new=# select count(*) from analyze_word_reports where (cseid = 94);

  count

-

 1287156

(1 行)




new=# explain analyze select count(2) from analyze_word_reports where (cseid = 
94) limit 1;


 QUERY PLAN   

  



the LIMIT clause is in this case totally useless and messy, and maybe can 
negative impacts optimizer


Regards


Pavel


 

--

--

 Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 
rows=1 loops=1)

   ->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8) (actual 
time=123.712..133.033 rows=1 loops=1)

 ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual 
time=123.548..133.024 rows=3 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   ->  Partial Aggregate  (cost=64183.85..64183.86 rows=1 width=8) 
(actual time=119.495..119.496 rows=1 loops=3)

 ->  Parallel Index Only Scan using 
analyze_word_reports_index_cseid on analyze_word_reports  (cost=0.56..6290

9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)

   Index Cond: (cseid = 94)

   Heap Fetches: 1287156  Planning Time: 0.122 ms  
Execution Time: 133.069 ms

(11 行)




new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) 
limit 1;

 QUERY PLAN 
  

   

--

---

 Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 
loops=1)

   ->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75 rows=1223533 
width=4) (actual time=2156.962..2156.964 rows=1 loops=

1)

 Filter: (cseid = 94)

 Rows Removed by Filter: 18320180  Planning Time: 0.086 ms  Execution 
Time: 2156.985 ms

(6 行)







> If the estimate is far off, then increasing the table's statistics

> target might help.

Thank you for your advice. 

Please tell me how to set the table's statistics up to improve performance.




new=#  select oid from pg_class where relname = 'analyze_word_reports';

  oid  

---

16429

(1 行)




new=# select attrelid,attname,attstattarget from pg_attribute where 
attrelid=16429 and attname='cseid';

attrelid | attname | attstattarget 

--+-+---

16429 | cseid   |-1

(1 行)




> Another thing that would be worth checking is whether

> "set enable_seqscan = off" prods it to choose the plan you want.

> If not, then there's something else going on besides poor estimates.

"set enable_seqscan = off" works, and the performance is greatly improved, 
which is almost the same as PostgreSQL 8.4.

The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown 
effect on other queries?











At 2022-10-10 10:45:54, "Tom Lane"  wrote:
>gzh   writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose 

Re: Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-10 Thread Pavel Stehule
út 11. 10. 2022 v 5:13 odesílatel gzh  napsal:

> Hi, Tom
> Thank you for your reply.
>
> > When you're asking for help, please don't give us vague statements
>
> > like "doesn't seem to work".
>
> I understand.
>
>
> > Did the plan (including rowcount
>
> > estimates) change at all?  To what?  How far off is that rowcount
>
> > estimate, anyway --- that is, how many rows actually have cseid = 94?
>
> Please refer to the new execution plan (PostgreSQL 12.11) below.
>
>
> new=# show enable_seqscan;
>
>  enable_seqscan
>
> 
>
>  on
>
> (1 行)
>
>
> new=# select count(*) from analyze_word_reports;
>
>   count
>
> --
>
>  21331980
>
> (1 行)
>
>
> new=# select count(*) from analyze_word_reports where (cseid = 94);
>
>   count
>
> -
>
>  1287156
>
> (1 行)
>
>
> new=# explain analyze select count(2) from analyze_word_reports where
> (cseid = 94) limit 1;
>
>
>QUERY PLAN
>
>
>

the LIMIT clause is in this case totally useless and messy, and maybe can
negative impacts optimizer

Regards

Pavel



>
> --
>
> --
>
>  Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual
> time=123.713..133.035 rows=1 loops=1)
>
>->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8)
> (actual time=123.712..133.033 rows=1 loops=1)
>
>  ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual
> time=123.548..133.024 rows=3 loops=1)
>
>Workers Planned: 2
>
>Workers Launched: 2
>
>->  Partial Aggregate  (cost=64183.85..64183.86 rows=1
> width=8) (actual time=119.495..119.496 rows=1 loops=3)
>
>  ->  Parallel Index Only Scan using
> analyze_word_reports_index_cseid on analyze_word_reports  (cost=0.56..6290
>
> 9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)
>
>Index Cond: (cseid = 94)
>
>Heap Fetches: 1287156  Planning Time: 0.122 ms
> Execution Time: 133.069 ms
>
> (11 行)
>
>
> new=# explain analyze select 2 from analyze_word_reports where (cseid =
> 94) limit 1;
>
>  QUERY PLAN
>
>
>
>
>
> --
>
> ---
>
>  Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966
> rows=1 loops=1)
>
>->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75
> rows=1223533 width=4) (actual time=2156.962..2156.964 rows=1 loops=
>
> 1)
>
>  Filter: (cseid = 94)
>
>  Rows Removed by Filter: 18320180  Planning Time: 0.086 ms
> Execution Time: 2156.985 ms
>
> (6 行)
>
>
>
> > If the estimate is far off, then increasing the table's statistics
>
> > target might help.
>
> Thank you for your advice.
>
> Please tell me how to set the table's statistics up to improve performance.
>
>
> new=#  select oid from pg_class where relname = 'analyze_word_reports';
>
>   oid
>
> ---
>
> 16429
>
> (1 行)
>
>
> new=# select attrelid,attname,attstattarget from pg_attribute where
> attrelid=16429 and attname='cseid';
>
> attrelid | attname | attstattarget
>
> --+-+---
>
> 16429 | cseid   |-1
>
> (1 行)
>
>
> > Another thing that would be worth checking is whether
>
> > "set enable_seqscan = off" prods it to choose the plan you want.
>
> > If not, then there's something else going on besides poor estimates.
>
> "set enable_seqscan = off" works, and the performance is greatly improved,
> which is almost the same as PostgreSQL 8.4.
>
> The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown
> effect on other queries?
>
>
>
>
>
> At 2022-10-10 10:45:54, "Tom Lane"  wrote:
> >gzh   writes:
> >> I've run analyze(not vacuum analyze), but it doesn't seem to work.
> >
> >When you're asking for help, please don't give us vague statements
> >like "doesn't seem to work".  Did the plan (including rowcount
> >estimates) change at all?  To what?  How far off is that rowcount
> >estimate, anyway --- that is, how many rows actually have cseid = 94?
> >
> >If the estimate is far off, then increasing the table's statistics
> >target might help.
> >
> >Another thing that would be worth checking is whether
> >"set enable_seqscan = off" prods it to choose the plan you want.
> >If not, then there's something else going on besides poor estimates.
> >
> > regards, tom lane
>
>


Re:Re: Different execution plan between PostgreSQL 8.4 and 12.11

2022-10-10 Thread gzh
Hi, Tom
Thank you for your reply.


> When you're asking for help, please don't give us vague statements

> like "doesn't seem to work". 

I understand.




> Did the plan (including rowcount

> estimates) change at all?  To what?  How far off is that rowcount

> estimate, anyway --- that is, how many rows actually have cseid = 94?

Please refer to the new execution plan (PostgreSQL 12.11) below.




new=# show enable_seqscan;

 enable_seqscan



 on

(1 行)




new=# select count(*) from analyze_word_reports;

  count   

--

 21331980

(1 行)




new=# select count(*) from analyze_word_reports where (cseid = 94);

  count

-

 1287156

(1 行)




new=# explain analyze select count(2) from analyze_word_reports where (cseid = 
94) limit 1;


 QUERY PLAN   

  

--

--

 Limit  (cost=65184.06..65184.07 rows=1 width=8) (actual time=123.713..133.035 
rows=1 loops=1)

   ->  Finalize Aggregate  (cost=65184.06..65184.07 rows=1 width=8) (actual 
time=123.712..133.033 rows=1 loops=1)

 ->  Gather  (cost=65183.85..65184.06 rows=2 width=8) (actual 
time=123.548..133.024 rows=3 loops=1)

   Workers Planned: 2

   Workers Launched: 2

   ->  Partial Aggregate  (cost=64183.85..64183.86 rows=1 width=8) 
(actual time=119.495..119.496 rows=1 loops=3)

 ->  Parallel Index Only Scan using 
analyze_word_reports_index_cseid on analyze_word_reports  (cost=0.56..6290

9.34 rows=509805 width=0) (actual time=0.031..98.706 rows=429052 loops=3)

   Index Cond: (cseid = 94)

   Heap Fetches: 1287156  Planning Time: 0.122 ms  
Execution Time: 133.069 ms

(11 行)




new=# explain analyze select 2 from analyze_word_reports where (cseid = 94) 
limit 1;

 QUERY PLAN 
  

   

--

---

 Limit  (cost=0.00..0.43 rows=1 width=4) (actual time=2156.964..2156.966 rows=1 
loops=1)

   ->  Seq Scan on analyze_word_reports  (cost=0.00..528550.75 rows=1223533 
width=4) (actual time=2156.962..2156.964 rows=1 loops=

1)

 Filter: (cseid = 94)

 Rows Removed by Filter: 18320180  Planning Time: 0.086 ms  Execution 
Time: 2156.985 ms

(6 行)







> If the estimate is far off, then increasing the table's statistics

> target might help.

Thank you for your advice. 

Please tell me how to set the table's statistics up to improve performance.




new=#  select oid from pg_class where relname = 'analyze_word_reports';

  oid  

---

16429

(1 行)




new=# select attrelid,attname,attstattarget from pg_attribute where 
attrelid=16429 and attname='cseid';

attrelid | attname | attstattarget 

--+-+---

16429 | cseid   |-1

(1 行)




> Another thing that would be worth checking is whether

> "set enable_seqscan = off" prods it to choose the plan you want.

> If not, then there's something else going on besides poor estimates.

"set enable_seqscan = off" works, and the performance is greatly improved, 
which is almost the same as PostgreSQL 8.4.

The enable_seqscan(PostgreSQL 8.4) is on, will this change have an unknown 
effect on other queries?











At 2022-10-10 10:45:54, "Tom Lane"  wrote:
>gzh   writes:
>> I've run analyze(not vacuum analyze), but it doesn't seem to work.
>
>When you're asking for help, please don't give us vague statements
>like "doesn't seem to work".  Did the plan (including rowcount
>estimates) change at all?  To what?  How far off is that rowcount
>estimate, anyway --- that is, how many rows actually have cseid = 94?
>
>If the estimate is far off, then increasing the table's statistics
>target might help.
>
>Another thing that would be worth checking is whether
>"set enable_seqscan = off" prods it to choose the plan you want.
>If not, then there's something else going on besides poor estimates.
>
>   regards, tom lane


Re: Playing with pgcrypto

2022-10-10 Thread howardnews



pgp_sym_encrypt uses a random salt each time, so you cannot compare the 
output
to p1 like you would do with crypt to verify a given plaintext.  
Instead, use

pgp_sym_decrypt with p1 as input to get the plaintext.

--
Erik


Ah! That makes sense. Thanks Erik!




Re: Playing with pgcrypto

2022-10-10 Thread Erik Wienhold
Hi Howard,

> On 11/10/2022 00:25 CEST howardn...@selestial.com wrote:
> 
> I am trying out a few pgcrypto functions. I was expecting the final 
> select statement to return the row I just inserted - Can anyone tell me 
> what I am not understanding here?
> 
> create table test (p1 bytea);
> insert into test (pgp_sym_encrypt('123', 'secret'));

Your INSERT is malformed.  It's missing a VALUES clause or SELECT.

> select * from test where pgp_sym_encrypt('123', 'secret') = p1;

pgp_sym_encrypt uses a random salt each time, so you cannot compare the output
to p1 like you would do with crypt to verify a given plaintext.  Instead, use
pgp_sym_decrypt with p1 as input to get the plaintext.

--
Erik




Playing with pgcrypto

2022-10-10 Thread howardnews

Hi all,

I am trying out a few pgcrypto functions. I was expecting the final 
select statement to return the row I just inserted - Can anyone tell me 
what I am not understanding here?


create table test (p1 bytea);
insert into test (pgp_sym_encrypt('123', 'secret'));
select * from test where pgp_sym_encrypt('123', 'secret') = p1;

Thanks.

Howard




Re: Same query, same data different plan

2022-10-10 Thread Adrian Klaver

On 10/10/22 8:12 AM, Kostas Papadopoulos wrote:


On 10/10/2022 17:53, Tom Lane wrote:

Kostas Papadopoulos  writes:
I cannot see how it can be configuration since the two databases are 
in the same

Postgres instance.


There is such a thing as ALTER DATABASE ... SET to install different
settings at the per-database level.


I understand, but I created the databases to be the same. Our original 
problem was that developers' workstations (Debian and Windows) were 
running a specific query different from a test db (Ubuntu). After 
eliminating everything we thought of (data, versions, configurations, 
OS, etc) we ended up with the scenario I described here.


So there is more to the story.

Information needed:

1) The query and its EXPLAIN ANALYZE for both slow/fast cases.

2) Postgres version.

3) What database are the developers workstation pointing at?

4) What is the test db and is it the same as 3)?

5) What clients are you using to run the query?



In general, the answer to your question is that the databases are
*not* identical.  You just haven't figured out how yet.  I'm wondering
if it has something to do with the dump/reload having compacted out
bloat in the tables or indexes, causing cost estimates to change.


I will look into that and a couple of other ideas I got from this list.



    regards, tom lane


Thanks
kostas





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Same query, same data different plan

2022-10-10 Thread Kostas Papadopoulos



On 10/10/2022 17:53, Tom Lane wrote:

Kostas Papadopoulos  writes:

I cannot see how it can be configuration since the two databases are in the same
Postgres instance.


There is such a thing as ALTER DATABASE ... SET to install different
settings at the per-database level.


I understand, but I created the databases to be the same. Our original problem was 
that developers' workstations (Debian and Windows) were running a specific query 
different from a test db (Ubuntu). After eliminating everything we thought of (data, 
versions, configurations, OS, etc) we ended up with the scenario I described here.


In general, the answer to your question is that the databases are
*not* identical.  You just haven't figured out how yet.  I'm wondering
if it has something to do with the dump/reload having compacted out
bloat in the tables or indexes, causing cost estimates to change.


I will look into that and a couple of other ideas I got from this list.



regards, tom lane


Thanks
kostas




Re: Same query, same data different plan

2022-10-10 Thread Tom Lane
Kostas Papadopoulos  writes:
> I cannot see how it can be configuration since the two databases are in the 
> same 
> Postgres instance.

There is such a thing as ALTER DATABASE ... SET to install different
settings at the per-database level.

In general, the answer to your question is that the databases are
*not* identical.  You just haven't figured out how yet.  I'm wondering
if it has something to do with the dump/reload having compacted out
bloat in the tables or indexes, causing cost estimates to change.

regards, tom lane




Re: Same query, same data different plan

2022-10-10 Thread Adrian Klaver

On 10/10/22 06:12, Julien Rouhaud wrote:

On Mon, Oct 10, 2022 at 04:05:42PM +0300, Kostas Papadopoulos wrote:

Hi,

Yes, I ran ANALYZE in both databases.


Please look at https://wiki.postgresql.org/wiki/Slow_Query_Questions to provide
more information.



Without the information, as detailed at above link, an answer to this 
question will be nothing more then guesses.


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Same query, same data different plan

2022-10-10 Thread Kostas Papadopoulos



On 10/10/2022 16:44, Ron wrote:

How identical is "identical"?

For example, does diff says that "pg_dump --schema-only" of DB1 and DB2 are perfectly 
identical?


And are the table counts identical?


I created the second database using pg_dump from the first, so they should be exactly 
the same.




On 10/10/22 08:15, Kostas Papadopoulos wrote:

Hi,

Thank you for responding. My question is not about the performance of a specific 
query. As I wrote, that is already solved.


My question is "how can it be that the same query run in two exactly the same 
databases can have different plans."



Kostas Papadopoulos

On 10/10/2022 16:12, Julien Rouhaud wrote:

On Mon, Oct 10, 2022 at 04:05:42PM +0300, Kostas Papadopoulos wrote:

Hi,

Yes, I ran ANALYZE in both databases.


Please look at https://wiki.postgresql.org/wiki/Slow_Query_Questions to provide
more information.










Re: Same query, same data different plan

2022-10-10 Thread Ron

How identical is "identical"?

For example, does diff says that "pg_dump --schema-only" of DB1 and DB2 are 
perfectly identical?


And are the table counts identical?

On 10/10/22 08:15, Kostas Papadopoulos wrote:

Hi,

Thank you for responding. My question is not about the performance of a 
specific query. As I wrote, that is already solved.


My question is "how can it be that the same query run in two exactly the 
same databases can have different plans."



Kostas Papadopoulos

On 10/10/2022 16:12, Julien Rouhaud wrote:

On Mon, Oct 10, 2022 at 04:05:42PM +0300, Kostas Papadopoulos wrote:

Hi,

Yes, I ran ANALYZE in both databases.


Please look at https://wiki.postgresql.org/wiki/Slow_Query_Questions to 
provide

more information.





--
Angular momentum makes the world go 'round.




Re: Same query, same data different plan

2022-10-10 Thread Imre Samu
> Ran analyze on both. Running the same query I'm getting different plans,
one x10 slower.

theory:

the "statistics target" is too low ?
THEN
different random sample  --> different statistics ---> different plan,.

*"For large tables, ANALYZE takes a random sample of the table contents,
rather than examining every row. "*
https://www.postgresql.org/docs/current/sql-analyze.html

IMHO:  Try to increase the statistics target   (
"default_statistics_target" , .. )

regards,
  Imre

Kostas Papadopoulos  ezt írta (időpont: 2022.
okt. 10., H, 14:56):

>
> I have two identical databases running in the same instance of Postgresql.
> Ran
> analyze on both. Running the same query I'm getting different plans, one
> x10 slower.
> Although I have solved my problem by re-writing the query, I want to
> understand why
> this is happening. If the configuration, Postgresql version, schema and
> data are the
> same, what other factors is the planner considering?
>
> --
>
> Kostas Papadopoulos
> KE MethodosIT
>
>
>
>


Re: Same query, same data different plan

2022-10-10 Thread Kostas Papadopoulos

Hi,

I cannot see how it can be configuration since the two databases are in the same 
Postgres instance.



Kostas Papadopoulos

On 10/10/2022 16:16, Pavel Stehule wrote:

po 10. 10. 2022 v 15:12 odesílatel Julien Rouhaud 
napsal:


On Mon, Oct 10, 2022 at 04:05:42PM +0300, Kostas Papadopoulos wrote:

Hi,

Yes, I ran ANALYZE in both databases.




This can be a common case. Check your configuration: work_mem,
shared_buffers, effective_cache_size, random_page_cost, seq_page_cost, ...



Please look at https://wiki.postgresql.org/wiki/Slow_Query_Questions to
provide
more information.










Re: Same query, same data different plan

2022-10-10 Thread Pavel Stehule
po 10. 10. 2022 v 15:12 odesílatel Julien Rouhaud 
napsal:

> On Mon, Oct 10, 2022 at 04:05:42PM +0300, Kostas Papadopoulos wrote:
> > Hi,
> >
> > Yes, I ran ANALYZE in both databases.
>

This can be a common case. Check your configuration: work_mem,
shared_buffers, effective_cache_size, random_page_cost, seq_page_cost, ...


> Please look at https://wiki.postgresql.org/wiki/Slow_Query_Questions to
> provide
> more information.
>
>
>


Re: Same query, same data different plan

2022-10-10 Thread Kostas Papadopoulos

Hi,

Thank you for responding. My question is not about the performance of a specific 
query. As I wrote, that is already solved.


My question is "how can it be that the same query run in two exactly the same 
databases can have different plans."



Kostas Papadopoulos

On 10/10/2022 16:12, Julien Rouhaud wrote:

On Mon, Oct 10, 2022 at 04:05:42PM +0300, Kostas Papadopoulos wrote:

Hi,

Yes, I ran ANALYZE in both databases.


Please look at https://wiki.postgresql.org/wiki/Slow_Query_Questions to provide
more information.





Re: Same query, same data different plan

2022-10-10 Thread Julien Rouhaud
On Mon, Oct 10, 2022 at 04:05:42PM +0300, Kostas Papadopoulos wrote:
> Hi,
>
> Yes, I ran ANALYZE in both databases.

Please look at https://wiki.postgresql.org/wiki/Slow_Query_Questions to provide
more information.




Re: Same query, same data different plan

2022-10-10 Thread Kostas Papadopoulos

Hi,

Yes, I ran ANALYZE in both databases.


Kostas

On 10/10/2022 16:03, Daevor The Devoted wrote:

Hi

Is the table stats up to date on both?

https://www.postgresql.org/docs/current/planner-stats.html

Best regards,
Na-iem Dollie

On Mon, Oct 10, 2022 at 2:56 PM Kostas Papadopoulos <
kos...@methodosit.com.cy> wrote:



I have two identical databases running in the same instance of Postgresql.
Ran
analyze on both. Running the same query I'm getting different plans, one
x10 slower.
Although I have solved my problem by re-writing the query, I want to
understand why
this is happening. If the configuration, Postgresql version, schema and
data are the
same, what other factors is the planner considering?

--

Kostas Papadopoulos
KE MethodosIT











Re: Same query, same data different plan

2022-10-10 Thread Daevor The Devoted
Hi

Is the table stats up to date on both?

https://www.postgresql.org/docs/current/planner-stats.html

Best regards,
Na-iem Dollie

On Mon, Oct 10, 2022 at 2:56 PM Kostas Papadopoulos <
kos...@methodosit.com.cy> wrote:

>
> I have two identical databases running in the same instance of Postgresql.
> Ran
> analyze on both. Running the same query I'm getting different plans, one
> x10 slower.
> Although I have solved my problem by re-writing the query, I want to
> understand why
> this is happening. If the configuration, Postgresql version, schema and
> data are the
> same, what other factors is the planner considering?
>
> --
>
> Kostas Papadopoulos
> KE MethodosIT
>
>
>
>


Same query, same data different plan

2022-10-10 Thread Kostas Papadopoulos



I have two identical databases running in the same instance of Postgresql. Ran 
analyze on both. Running the same query I'm getting different plans, one x10 slower. 
Although I have solved my problem by re-writing the query, I want to understand why 
this is happening. If the configuration, Postgresql version, schema and data are the 
same, what other factors is the planner considering?


--

Kostas Papadopoulos
KE MethodosIT