Re: [PERFORM] Slow query in JDBC

2017-09-29 Thread Dave Cramer
Good catch, Jeff.

As for which version: we always recommend the latest version, currently
42.1.4.

Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 29 September 2017 at 06:44, Subramaniam C 
wrote:

> Yes, you are right: the timestamp the application was providing was in
> seconds, whereas the query that was using the index had a timestamp in
> milliseconds. That is why the query was taking so long in the application.
>
> On Fri, Sep 29, 2017 at 12:19 PM, Jeff Janes  wrote:
>
>> On Thu, Sep 28, 2017 at 2:59 AM, Subramaniam C <
>> subramaniam31...@gmail.com> wrote:
>>
>>> The first output shows the plan when the query is executed from the SQL
>>> command line; the second shows the plan when it is executed from the
>>> application. As per the output, it is clear that when the query is
>>> executed through JDBC it is not using the index (health_index) but is
>>> instead doing a sequential scan. Please let us know how this issue can
>>> be resolved from JDBC.
>>>
>>> 1.)
>>>
>>>
>>> * ->  Index Only Scan
>>> using health_index on health_timeseries_table  (cost=0.56..421644.56
>>> rows=1558800 width=24)*
>>>
>>> *   Index Cond: (("timestamp" >=
>>> '1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))*
>>>
>>>
>>
>>> 2.)
>>>
>>>
>>>   ->  Seq Scan on
>>> health_timeseries_table  (cost=0.00..267171.00 rows=1005634 width=24)
>>>
>>> Filter: (("timestamp" >=
>>> '150598950'::bigint) AND ("timestamp" <= '150599040'::bigint))
>>>
>>
>>
>> Those are different queries, so it is not terribly surprising that the
>> planner might choose a different plan.
>>
>> For this type of comparison, you need to compare identical queries,
>> including parameters.
>>
>> Cheers,
>>
>> Jeff
>>
>
>


Re: [PERFORM] Slow query in JDBC

2017-09-29 Thread Subramaniam C
Yes, you are right: the timestamp the application was providing was in
seconds, whereas the query that was using the index had a timestamp in
milliseconds. That is why the query was taking so long in the application.
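For anyone hitting the same trap, here is a minimal illustration of the
unit mismatch (the instant below is arbitrary, not taken from the
application):

-- The same instant expressed as epoch seconds vs. epoch milliseconds:
SELECT extract(epoch FROM timestamptz '2017-09-21 13:39:46+00')::bigint
           AS epoch_seconds,   -- 1506001186
       (extract(epoch FROM timestamptz '2017-09-21 13:39:46+00') * 1000)::bigint
           AS epoch_millis;    -- 1506001186000

-- Against a milliseconds-keyed column, a range computed in seconds selects
-- a completely different slice of the table, so the planner may quite
-- reasonably pick a seq scan for one range and the index for the other.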

On Fri, Sep 29, 2017 at 12:19 PM, Jeff Janes  wrote:

> On Thu, Sep 28, 2017 at 2:59 AM, Subramaniam C wrote:
>
>> The first output shows the plan when the query is executed from the SQL
>> command line; the second shows the plan when it is executed from the
>> application. As per the output, it is clear that when the query is
>> executed through JDBC it is not using the index (health_index) but is
>> instead doing a sequential scan. Please let us know how this issue can
>> be resolved from JDBC.
>>
>> 1.)
>>
>>
>> * ->  Index Only Scan
>> using health_index on health_timeseries_table  (cost=0.56..421644.56
>> rows=1558800 width=24)*
>>
>> *   Index Cond: (("timestamp" >=
>> '1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))*
>>
>>
>
>> 2.)
>>
>>
>>   ->  Seq Scan on
>> health_timeseries_table  (cost=0.00..267171.00 rows=1005634 width=24)
>>
>> Filter: (("timestamp" >=
>> '150598950'::bigint) AND ("timestamp" <= '150599040'::bigint))
>>
>
>
> Those are different queries, so it is not terribly surprising that the
> planner might choose a different plan.
>
> For this type of comparison, you need to compare identical queries,
> including parameters.
>
> Cheers,
>
> Jeff
>


Re: [PERFORM] Slow query in JDBC

2017-09-29 Thread Jeff Janes
On Thu, Sep 28, 2017 at 2:59 AM, Subramaniam C 
wrote:

> The first output shows the plan when the query is executed from the SQL
> command line; the second shows the plan when it is executed from the
> application. As per the output, it is clear that when the query is
> executed through JDBC it is not using the index (health_index) but is
> instead doing a sequential scan. Please let us know how this issue can
> be resolved from JDBC.
>
> 1.)
>
>
> * ->  Index Only Scan
> using health_index on health_timeseries_table  (cost=0.56..421644.56
> rows=1558800 width=24)*
>
> *   Index Cond: (("timestamp" >=
> '1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))*
>
>

> 2.)
>
>
>   ->  Seq Scan on
> health_timeseries_table  (cost=0.00..267171.00 rows=1005634 width=24)
>
> Filter: (("timestamp" >=
> '150598950'::bigint) AND ("timestamp" <= '150599040'::bigint))
>


Those are different queries, so it is not terribly surprising that the
planner might choose a different plan.

For this type of comparison, you need to compare identical queries,
including parameters.
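One way to make the comparison truly identical is to go through the same
prepared-statement machinery from psql. A sketch, using a simplified
stand-in for the real query (only the timestamp filter from the plans
quoted above; the full query was not posted):

-- PREPARE exercises the same plan logic as a server-side prepared
-- statement coming from JDBC:
PREPARE health_range(bigint, bigint) AS
    SELECT count(*)
    FROM health_timeseries_table
    WHERE "timestamp" >= $1 AND "timestamp" <= $2;

-- Use the exact parameter values the application sends:
EXPLAIN ANALYZE EXECUTE health_range(1505989186834, 1505990086834);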

Cheers,

Jeff


Re: [PERFORM] Slow query in JDBC

2017-09-28 Thread Dave Cramer
What version of the driver are you using?

The driver does not automatically use a cursor, but it does use prepared
statements, which can be slower.
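One thing worth checking is the switch to a generic plan: once a
server-side prepared statement has been planned with concrete values a few
times, the server may fall back to a generic plan that ignores the actual
parameters. A sketch of how to observe this from psql (table and values
borrowed from the plans quoted below):

PREPARE q(bigint, bigint) AS
    SELECT count(*)
    FROM health_timeseries_table
    WHERE "timestamp" BETWEEN $1 AND $2;

-- After the fifth execution the server may switch from custom plans
-- (built for the supplied values) to a cached generic plan:
EXPLAIN EXECUTE q(1505989186834, 1505990086834);  -- repeat six times and
                                                  -- watch the plan change

The JDBC driver has a related knob, prepareThreshold (default 5), which
controls how many executions happen before it switches to a named
server-side prepared statement.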


Can you provide the query and the JDBC query?



Dave Cramer

da...@postgresintl.com
www.postgresintl.com

On 28 September 2017 at 05:59, Subramaniam C <subramaniam31...@gmail.com>
wrote:

> The first output shows the plan when the query is executed from the SQL
> command line; the second shows the plan when it is executed from the
> application. As per the output, it is clear that when the query is
> executed through JDBC it is not using the index (health_index) but is
> instead doing a sequential scan. Please let us know how this issue can
> be resolved from JDBC.
>
> 1.)
>
>
> *Limit  (cost=510711.53..510711.58 rows=20 width=72)*
>
> *   ->  Sort  (cost=510711.53..511961.53 rows=50 width=72)*
>
> * Sort Key: health_timeseries_table.health*
>
> * ->  WindowAgg  (cost=0.98..497406.71 rows=50 width=72)*
>
> *   ->  Merge Left Join  (cost=0.98..491156.71 rows=50
> width=64)*
>
> * Merge Cond: (object_table.uuid =
> health_timeseries_table.mobid)*
>
> * ->  Unique  (cost=0.42..57977.00 rows=50
> width=64)*
>
> *   ->  Index Scan Backward using
> object_table_pkey on object_table  (cost=0.42..56727.00 rows=50
> width=64)*
>
> * Index Cond: (("timestamp" >= 0) AND
> ("timestamp" <= '1505990086834'::bigint))*
>
> * Filter: (tenantid = 'perspica'::text)*
>
> * ->  Materialize  (cost=0.56..426235.64 rows=55526
> width=16)*
>
> *   ->  Unique  (cost=0.56..425541.56 rows=55526
> width=24)*
>
> * ->  Index Only Scan
> using health_index on health_timeseries_table  (cost=0.56..421644.56
> rows=1558800 width=24)*
>
> *   Index Cond: (("timestamp" >=
> '1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))*
>
> *LOG:  duration: 1971.697 ms*
>
>
>
>
>
> 2.)
>
>
> Limit  (cost=457629.21..457629.26 rows=20 width=72)
>
>   ->  Sort  (cost=457629.21..458879.21 rows=50 width=72)
>
> Sort Key: health_timeseries_table.health
>
> ->  WindowAgg  (cost=367431.49..444324.39 rows=50 width=72)
>
>   ->  Merge Left Join  (cost=367431.49..438074.39 rows=50
> width=64)
>
> Merge Cond: (object_table.uuid =
> health_timeseries_table.mobid)
>
> ->  Unique  (cost=0.42..57977.00 rows=50 width=64)
>
>   ->  Index Scan Backward using object_table_pkey
> on object_table  (cost=0.42..56727.00 rows=50 width=64)
>
> Index Cond: (("timestamp" >= '0'::bigint)
> AND ("timestamp" <= '150599040'::bigint))
>
> Filter: (tenantid = 'perspica'::text)
>
> ->  Materialize  (cost=367431.07..373153.32 rows=55526
> width=16)
>
>   ->  Unique  (cost=367431.07..372459.24
> rows=55526 width=24)
>
> ->  Sort  (cost=367431.07..369945.16
> rows=1005634 width=24)
>
>   Sort Key:
> health_timeseries_table.mobid DESC, health_timeseries_table."timestamp"
> DESC, health_timeseries_table.health
>
>   ->  Seq Scan on
> health_timeseries_table  (cost=0.00..267171.00 rows=1005634 width=24)
>
>
> Filter: (("timestamp" >=
> '150598950'::bigint) AND ("timestamp" <= '150599040'::bigint))
>
> On Thu, Sep 28, 2017 at 2:56 PM, Pavy Philippe <
> philippe.p...@worldline.com> wrote:
>
>> https://www.postgresql.org/docs/current/static/auto-explain.html
>>
>>
>> -Original Message-
>> From: pgsql-performance-ow...@postgresql.org [mailto:
>> pgsql-performance-ow...@postgresql.org] On Behalf Of Julien Rouhaud
>> Sent: Thursday, 28 September 2017 11:21
>> To: Subramaniam C
>> Cc: pgsql-performance@postgresql.org
>> Subject: Re: [PERFORM] Slow query in JDBC
>>
>> On Thu, Sep 28, 2017 at 10:58 AM, Subramaniam C <
>> subramaniam31...@gmail.com> wrote:
>> > I configured cursor_tuple_fraction to 1 but I am still facing the same
>> > issue.
>>
>> Can you show explain (analyze, buffers) of the query when run from psql
>> and run from application (you can use auto_explain for that if needed, see
>> https://www.postgresql.org/docs/current/static/auto-explain.html).

Re: [PERFORM] Slow query in JDBC

2017-09-28 Thread Subramaniam C
The first output shows the plan when the query is executed from the SQL
command line; the second shows the plan when it is executed from the
application. As per the output, it is clear that when the query is
executed through JDBC it is not using the index (health_index) but is
instead doing a sequential scan. Please let us know how this issue can
be resolved from JDBC.

1.)


*Limit  (cost=510711.53..510711.58 rows=20 width=72)*

*   ->  Sort  (cost=510711.53..511961.53 rows=50 width=72)*

* Sort Key: health_timeseries_table.health*

* ->  WindowAgg  (cost=0.98..497406.71 rows=50 width=72)*

*   ->  Merge Left Join  (cost=0.98..491156.71 rows=50
width=64)*

* Merge Cond: (object_table.uuid =
health_timeseries_table.mobid)*

* ->  Unique  (cost=0.42..57977.00 rows=50
width=64)*

*   ->  Index Scan Backward using object_table_pkey
on object_table  (cost=0.42..56727.00 rows=50 width=64)*

* Index Cond: (("timestamp" >= 0) AND
("timestamp" <= '1505990086834'::bigint))*

* Filter: (tenantid = 'perspica'::text)*

* ->  Materialize  (cost=0.56..426235.64 rows=55526
width=16)*

*   ->  Unique  (cost=0.56..425541.56 rows=55526
width=24)*

* ->  Index Only Scan using health_index on
health_timeseries_table  (cost=0.56..421644.56 rows=1558800 width=24)*

*   Index Cond: (("timestamp" >=
'1505989186834'::bigint) AND ("timestamp" <= '1505990086834'::bigint))*

*LOG:  duration: 1971.697 ms*





2.)


Limit  (cost=457629.21..457629.26 rows=20 width=72)

  ->  Sort  (cost=457629.21..458879.21 rows=50 width=72)

Sort Key: health_timeseries_table.health

->  WindowAgg  (cost=367431.49..444324.39 rows=50 width=72)

  ->  Merge Left Join  (cost=367431.49..438074.39 rows=50
width=64)

Merge Cond: (object_table.uuid =
health_timeseries_table.mobid)

->  Unique  (cost=0.42..57977.00 rows=50 width=64)

  ->  Index Scan Backward using object_table_pkey
on object_table  (cost=0.42..56727.00 rows=50 width=64)

Index Cond: (("timestamp" >= '0'::bigint)
AND ("timestamp" <= '150599040'::bigint))

Filter: (tenantid = 'perspica'::text)

->  Materialize  (cost=367431.07..373153.32 rows=55526
width=16)

  ->  Unique  (cost=367431.07..372459.24 rows=55526
width=24)

->  Sort  (cost=367431.07..369945.16
rows=1005634 width=24)

  Sort Key:
health_timeseries_table.mobid DESC, health_timeseries_table."timestamp"
DESC, health_timeseries_table.health

  ->  Seq Scan on
health_timeseries_table  (cost=0.00..267171.00 rows=1005634 width=24)


Filter: (("timestamp" >=
'150598950'::bigint) AND ("timestamp" <= '150599040'::bigint))

On Thu, Sep 28, 2017 at 2:56 PM, Pavy Philippe <philippe.p...@worldline.com>
wrote:

> https://www.postgresql.org/docs/current/static/auto-explain.html
>
>
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] On Behalf Of Julien Rouhaud
> Sent: Thursday, 28 September 2017 11:21
> To: Subramaniam C
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Slow query in JDBC
>
> On Thu, Sep 28, 2017 at 10:58 AM, Subramaniam C <
> subramaniam31...@gmail.com> wrote:
> > I configured cursor_tuple_fraction to 1 but I am still facing the same
> > issue.
>
> Can you show explain (analyze, buffers) of the query when run from psql
> and run from application (you can use auto_explain for that if needed, see
> https://www.postgresql.org/docs/current/static/auto-explain.html).

Re: [PERFORM] Slow query in JDBC

2017-09-28 Thread Pavy Philippe
https://www.postgresql.org/docs/current/static/auto-explain.html


-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Julien Rouhaud
Sent: Thursday, 28 September 2017 11:21
To: Subramaniam C
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query in JDBC

On Thu, Sep 28, 2017 at 10:58 AM, Subramaniam C <subramaniam31...@gmail.com> 
wrote:
> I configured cursor_tuple_fraction to 1 but I am still facing the same
> issue.

Can you show explain (analyze, buffers) of the query when run from psql and run 
from application (you can use auto_explain for that if needed, see 
https://www.postgresql.org/docs/current/static/auto-explain.html).




Re: [PERFORM] Slow query in JDBC

2017-09-28 Thread Julien Rouhaud
On Thu, Sep 28, 2017 at 10:58 AM, Subramaniam C
 wrote:
> I configured cursor_tuple_fraction to 1 but I am still facing the same
> issue.

Can you show explain (analyze, buffers) of the query when run from
psql and when run from the application? (You can use auto_explain for that
if needed; see https://www.postgresql.org/docs/current/static/auto-explain.html.)
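For reference, a minimal way to switch it on for a single session (the
module can also be preloaded server-wide; the settings below are the
standard auto_explain options):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log the plan of every statement
SET auto_explain.log_analyze = on;      -- include actual rows and timings
SET auto_explain.log_buffers = on;      -- include buffer usage
-- Plans now appear in the server log for everything this session runs.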




Re: [PERFORM] Slow query in JDBC

2017-09-28 Thread Subramaniam C
I configured cursor_tuple_fraction to 1 but I am still facing the same
issue.
Please help.

On Thu, Sep 28, 2017 at 2:18 PM, Julien Rouhaud  wrote:

> On Thu, Sep 28, 2017 at 10:19 AM, Subramaniam C
>  wrote:
> > Hi
> >
> > When I execute the query from the SQL command line, it takes only
> > around 1 sec. But when I execute the same query from JDBC (Java) using
> > a PreparedStatement, it takes around 10 secs.
> >
> > Can you please let us know the reason and how to fix this issue?
>
>
> I think JDBC always uses a cursor, which can be problematic with the
> default configuration, because Postgres will try to generate plans that
> return the first rows quickly rather than all the rows. Can you try
> setting cursor_tuple_fraction to 1 and see if that fixes your issue?
>


Re: [PERFORM] Slow query in JDBC

2017-09-28 Thread Julien Rouhaud
On Thu, Sep 28, 2017 at 10:19 AM, Subramaniam C
 wrote:
> Hi
>
> When I execute the query from the SQL command line, it takes only around
> 1 sec. But when I execute the same query from JDBC (Java) using a
> PreparedStatement, it takes around 10 secs.
>
> Can you please let us know the reason and how to fix this issue?


I think JDBC always uses a cursor, which can be problematic with the
default configuration, because Postgres will try to generate plans that
return the first rows quickly rather than all the rows. Can you try
setting cursor_tuple_fraction to 1 and see if that fixes your issue?
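For reference, the knob and its default (cursor_tuple_fraction is the
planner's assumption about what fraction of a cursor's rows will actually
be fetched):

SHOW cursor_tuple_fraction;        -- 0.1 by default
SET cursor_tuple_fraction = 1.0;   -- plan cursors as if all rows are fetched
-- At 1.0 a cursor gets the same plan the bare query would get.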




[PERFORM] Slow query in JDBC

2017-09-28 Thread Subramaniam C
Hi

When I execute the query from the SQL command line, it takes only around
1 sec. But when I execute the same query from JDBC (Java) using a
PreparedStatement, it takes around 10 secs.

Can you please let us know the reason and how to fix this issue?

Thanks and Regards
Subramaniam


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-07-27 Thread Scott Marlowe
On Tue, Dec 27, 2016 at 3:50 PM, Flávio Henrique  wrote:
> Hi there, fellow experts!
>
> I need advice on a query that became slower after the 9.3 to 9.6 migration.
>
> First of all, I'm from the dev team.
>
> Before the migration, we (programmers) made some modifications to the
> query, bringing its average time from 8s down to 2-3s.
>
> As this query is the most executed on our system (it builds the user
> panel), every bit that we can squeeze out of it will be nice.
>
> Now, after the server migration to 9.6, we're experiencing bad times with
> this query again.
>
> Unfortunately, I don't have the old query plan (9.3 version) to show you,
> but in the current version (9.6) I can see some buffers written, which
> tells me that something is wrong.
>
> Our server has 250GB of memory available, but the database team says that
> they can't do anything to make this query better. I'm not sure about
> that, as some buffers are written to disk.

The second sorts etc. start spilling to disk, your performance is gonna
tank. Try increasing work_mem to something moderate like 256MB to 1GB.
Note that work_mem is per sort / action, so if you've got 100 users
running queries with 2 or 3 sorts at a time you can exhaust memory
real fast. OTOH, a db with proper pooling on connections etc. (aka 10
to 20 live connections at a time) can easily handle 1GB work_mem if
it's got 256GB RAM.
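A way to get the benefit without the global risk is to raise it only
around the expensive query, e.g. (a sketch):

BEGIN;
SET LOCAL work_mem = '512MB';  -- reverts automatically at COMMIT/ROLLBACK
-- run the user-panel query here
COMMIT;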




Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-07-26 Thread Peter Geoghegan
On Thu, Jan 5, 2017 at 9:51 AM, Daniel Blanch Bataller
 wrote:
> If the index is used now that it has just been recreated, it might mean
> that the index was bloated, that is, it grew so big that a seq scan was
> cheaper.
>
> I've seen another case recently where Postgres 9.6 wasn't using the right
> index in a query; I was able to reproduce the issue by crafting an index
> bigger, much bigger than it should be.
>
> Can you record the index size as it is now? Keep this info, and if the
> problem happens again check the index sizes and see if they have grown
> too much.
>
> i.e. SELECT relname, relpages, reltuples FROM pg_class WHERE relname =
> 'index_name'
>
> This might help to see if this is the problem, that indexes are growing
> too much for some reason.

Are these unique indexes or not? Did Flavio have a workload with many UPDATEs?

I ask these questions because I think it's possible that this is
explained by a regression in 9.5's handling of index bloat, described
here:

http://postgr.es/m/CAH2-Wz=sfakvmv1x9jh19ej8am8tzn9f-yecips9hrrrqss...@mail.gmail.com

I'm trying to track down cases where this could be an issue, to get a
better sense of the problem.

-- 
Peter Geoghegan




Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Matthew Bellew
This looks like the same optimizer problem that occasionally plagues our
customers.  Whenever the estimated row count of a join is 1 but the actual
count is higher, the optimizer may choose very poor plans.  I made some
attempts to fix this.  The very simple fix is to never estimate 1 for a
join result.  Even using 2 works remarkably well as a defense against this
problem.

https://github.com/labkey-matthewb/postgres/commit/b1fd99f4deffbbf3db2172ccaba51a34f18d1b1a

I also made a much more correct but complicated patch to track both
uniqueness and selectivity through the optimizer, but I didn't quite push
that over the finish line (I made a mistake in the hash join code, and got
distracted by my day job before finishing it).

  https://github.com/labkey-matthewb/postgres/commits/struct_selectivity

The second patch is certainly the better approach, but it needs someone to
pick up the mission.

Matt

On Wed, Apr 26, 2017 at 8:00 AM, Gerardo Herzig <gher...@fmed.uba.ar> wrote:

> Some other approaches you could try:
>
> 1) What about a hashed index? You could make
> CREATE INDEX ON FIELD (unit_id, hashtext(field_name))
>
> and change your query accordingly
>
> "where hashtext(FIELD.FIELD_NAME)=hashtext('SHEETS_PRESENT') "
>
> 2) Partitioning (not native yet, but can be simulated through
> inheritance), like in
> https://www.postgresql.org/docs/current/static/ddl-partitioning.html
> This could work well if you have a limited set of distinct values in
> FIELD.FIELD_NAME
>
> Gerardo
>
> - Original Message -
> > From: "Alessandro Ferrucci" <alessandroferru...@gmail.com>
> > To: pgsql-performance@postgresql.org
> > Sent: Wednesday, 26 April 2017 0:19:37
> > Subject: Re: [PERFORM] Slow query with 3 table joins
> >
> >
> >
> > After about 40 minutes the slow query finally finished and the result
> > of the EXPLAIN plan can be found here:
> >
> >
> > https://explain.depesz.com/s/BX22
> >
> >
> > Thanks,
> > Alessandro Ferrucci
> >
> >
> > On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci <
> > alessandroferru...@gmail.com > wrote:
> >
> >
> >
> >
> > Hello - I am migrating a current system to PostgreSQL and I am having
> > an issue with a relatively straightforward query being extremely
> > slow.
> >
> >
> > The following are the definitions of the tables:
> >
> >
> > CREATE TABLE popt_2017.unit
> > (
> > id serial NOT NULL,
> > unit_id text,
> > batch_id text,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT unit_pkey PRIMARY KEY (id)
> > )
> > WITH (
> > OIDS=FALSE
> > );
> >
> >
> > CREATE TABLE popt_2017.field
> > (
> > id serial NOT NULL,
> > unit_id integer,
> > subunit_data_id integer,
> > field_name character varying(50),
> > page_id character varying(20),
> > page_type character varying(20),
> > batch_id character varying(20),
> > file_name character varying(20),
> > data_concept integer,
> > "GROUP" integer,
> > omr_group integer,
> > pres integer,
> > reg_data text,
> > ocr_conf text,
> > ocr_dict text,
> > ocr_phon text,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT field_pkey PRIMARY KEY (id),
> > CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
> > REFERENCES popt_2017.subunit (id) MATCH SIMPLE
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
> > REFERENCES popt_2017.unit (id) MATCH FULL
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
> > REFERENCES popt_2017.unit (id) MATCH SIMPLE
> > ON UPDATE NO ACTION ON DELETE NO ACTION
> > )
> > WITH (
> > OIDS=FALSE
> > );
> >
> >
> > CREATE TABLE popt_2017.answer
> > (
> > id serial NOT NULL,
> > field_id integer,
> > ans_status integer,
> > ans text,
> > luggage text,
> > arec text,
> > kfi_partition integer,
> > final boolean,
> > length integer,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT answer_pkey PRIMARY KEY (id),
> > CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
> > REFERENCES popt_2017.field (id) MATCH FULL
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)

Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Alessandro Ferrucci
Dave -

I re-ran ANALYZE and VACUUM on all the tables, and that did not seem to
fix the issue (the query still took a long time, though I did not let it
finish to produce a full EXPLAIN plan).

However, after creating an index on FIELD(FIELD_NAME, UNIT_ID), the query
now runs very fast (I changed the FIELD_NAME value so I would not run into
any caching).
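For the record, the index described above would look something like this
(the index name is my own invention; use whatever fits your conventions):

CREATE INDEX field_field_name_unit_id_idx
    ON field (field_name, unit_id);
ANALYZE field;  -- refresh the column statistics, as recommended earlier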

The new query is (notice the new FIELD_NAME value to avoid caching).

EXPLAIN (ANALYZE,BUFFERS) SELECT
   UNIT.ID AS UNIT_ID,
   UNIT.UNIT_ID AS UNIT_UNIT_ID,
   UNIT.BATCH_ID AS UNIT_BATCH_ID,
   UNIT.CREATE_DATE AS UNIT_CREATE_DATE,
   UNIT.UPDATE_DATE AS UNIT_UPDATE_DATE
FROM
   UNIT, FIELD, ANSWER
WHERE
   UNIT.ID=FIELD.UNIT_ID AND
   FIELD.ID=ANSWER.FIELD_ID AND
   FIELD.FIELD_NAME='RESP_PH_PREFIX_ID' AND
   ANSWER.ANS='2';

You can find the EXPLAIN plan here:

https://explain.depesz.com/s/apYR

I believe this fixes the issue as far as I can see.  I'm going to play
around with it more and see how it goes.

I also wanted to answer your question as to how many different values
there are for FIELD_NAME (and how many rows there are of each value).
Here it is:

SELECT FIELD_NAME, COUNT(*) FROM FIELD GROUP BY FIELD_NAME ORDER BY COUNT(*);

"PAGE_SERIAL";10315
"SHEETS_PRESENT";10315
"RESP_PH_AREA_ID";10556
"RESP_PH_PREFIX_ID";10559
"RESP_PH_SUFFIX_ID";10560
"H_LOC_ADD_NO_IND";10587
"H_TENURE_RENTED_IND";11162
"H_TENURE_OWNED_MORT_IND";11199
"H_TENURE_OWNED_FREE_IND";11208
"PAPER_JIC_2_TEXT";11413
"PAPER_JIC_1_TEXT";11413
"H_TENURE_OCC_NOPAY_IND";11478
"H_LOC_CHILDREN_IND";11496
"H_LOC_RELATIVES_IND";11496
"H_LOC_TEMPORARY_IND";11500
"H_LOC_NONRELATIVES_IND";11510
"PSEUDO_FIELD_MARGINALIA";87744
"H_SIZE_STATED_INT";207918
"P_REL_NO_IND";825240
"P_REL_YES_IND";825240
"P_REL_OTHER_NONREL_IND";1239894
"P_REL_CHILD_ADOPTED_IND";1239894
"P_REL_CHILD_BIO_IND";1239894
"P_REL_CHILD_FOSTER_IND";1239894
"P_REL_CHILD_STEP_IND";1239894
"P_REL_GRANDCHILD_IND";1239894
"P_REL_HOUSEROOMMATE_IND";1239894
"P_REL_INLAW_CHILD_IND";1239894
"P_REL_INLAW_PARENT_IND";1239894
"P_REL_OTHER_REL_IND";1239894
"P_REL_PARENT_IND";1239894
"P_REL_PARTNER_OPP_IND";1239894
"P_REL_PARTNER_SAME_IND";1239894
"P_REL_SIBLING_IND";1239894
"P_REL_SPOUSE_OPP_IND";1239894
"P_REL_SPOUSE_SAME_IND";1239894
"P_TRBSHR_CORP_NAME";1446204
"P_TRBSHR_YES_IND";1446204
"P_TRBSHR_NO_IND";1446204
"P_LOC_ELSE_COLLEGE_IND";1446204
"P_LOC_ELSE_JAIL_IND";1446204
"P_LOC_ELSE_JOB_IND";1446204
"P_LOC_ELSE_MILITARY_IND";1446204
"P_LOC_ELSE_NO_IND";1446204
"P_TRBENR_YES_IND";1446204
"P_LOC_ELSE_SEASONAL_IND";1446204
"P_LOC_ELSE_NURSINGHOME_IND";1446204
"P_TRBENR_TRIBE_NAME";1446204
"P_TRBENR_NO_IND";1446204
"P_LOC_ELSE_RELATIVES_IND";1446204
"P_LOC_ELSE_OTHER_IND";1446204
"P_RACE_WHITE_IND";1447812
"P_RACE2_TONGAN_IND";1447812
"P_RACE2_AFAM_IND";1447812
"P_RACE2_AIAN_TEXT";1447812
"P_RACE2_ASIANINDIAN_IND";1447812
"P_RACE2_ASIAN_TEXT";1447812
"P_RACE2_BLACK_TEXT";1447812
"P_RACE2_CHAMORRO_IND";1447812
"P_RACE2_CHINESE_IND";1447812
"P_RACE2_COLOMBIAN_IND";1447812
"P_RACE2_CUBAN_IND";1447812
"P_RACE2_DOMINICAN_IND";1447812
"P_RACE2_EGYPTIAN_IND";1447812
"P_RACE2_ENGLISH_IND";1447812
"P_RACE2_ETHIOPIAN_IND";1447812
"P_RACE2_FIJIAN_IND";1447812
"P_RACE2_FILIPINO_IND";1447812
"P_RACE2_FRENCH_IND";1447812
"P_RACE2_GERMAN_IND";1447812
"P_RACE2_HAITIAN_IND";1447812
"P_RACE2_HISP_TEXT";1447812
"P_RACE2_IRANIAN_IND";1447812
"P_RACE2_IRISH_IND";1447812
"P_RACE2_ISRAELI_IND";1447812
"P_RACE2_ITALIAN_IND";1447812
"P_RACE2_JAMAICAN_IND";1447812
"P_RACE2_JAPANESE_IND";1447812
"P_RACE2_KOREAN_IND";1447812
"P_RACE2_LEBANESE_IND";1447812
"P_RACE2_MARSHALLESE_IND";1447812
"P_RACE2_MENA_TEXT";1447812
"P_RACE2_MEXICAN_IND";1447812
"P_RACE2_MOROCCAN_IND";1447812
"P_RACE2_NATHAWAIIAN_IND";1447812
"P_RACE2_NHPI_TEXT";1447812
"P_RACE2_NIGERIAN_IND";1447812
"P_RACE2_POLISH_IND";1447812
"P_RACE2_PUERTORICAN_IND";1447812
"P_RACE2_SALVADORAN_IND";1447812
"P_RACE2_SAMOAN_IND";1447812
"P_RACE2_SOMALI_IND";1447812
"P_RACE2_SOR_TEXT";1447812
"P_RACE2_SYRIAN_IND";1447812
"P_RACE2_VIETNAMESE_IND";1447812
"P_RACE2_WHITE_TEXT";1447812
"P_RACE_AIAN_IND";1447812
"P_RACE_ASIAN_IND";1447812
"P_RACE_BLACK_IND";1447812
"P_RACE_HISP_IND";1447812
"P_RACE_MENA_IND";1447812
"P_RACE_NHPI_IND";1447812
"P_RACE_SOR_IND";1447812
"P_SEX_MALE_IND";2273052
"P_SEX_FEMALE_IND";2273052
"P_MIDDLE_NAME";2273052
"P_LAST_NAME";2273052
"P_FIRST_NAME";2273052
"P_BIRTH_YEAR_INT";2273052
"P_BIRTH_MONTH_INT";2273052
"P_BIRTH_DAY_INT";2273052
"P_AGE_INT";2273052


I want to give a HUGE thanks to everyone who took the time to look at my
issue and provide insight and assistance, you folks are truly awesome!



On Wed, Apr 26, 2017 at 12:12 AM, David Rowley  wrote:

> On 26 April 2017 at 15:19, Alessandro Ferrucci
>  wrote:
> > After about 40 minutes the slow query finally finished and the result of
> > the EXPLAIN plan can be found here:
> >
> > https://explain.depesz.com/s/BX22
>
> > Index Scan using field_unit_id_idx on field  (cost=0.00..8746678.52
> > rows=850149 width=8) (actual time=0.030..2414345.998 rows=10315 loops=1)

Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Gerardo Herzig
Some other approaches you could try:

1) What about a hashed index? You could make
CREATE INDEX ON FIELD (unit_id, hashtext(field_name))

and change your query accordingly

"where hashtext(FIELD.FIELD_NAME)=hashtext('SHEETS_PRESENT') "

2) Partitioning (not native yet, but can be simulated through inheritance), 
like in
https://www.postgresql.org/docs/current/static/ddl-partitioning.html
This could work well if you have a limited set of distinct values in
FIELD.FIELD_NAME

Gerardo

- Original Message -
> From: "Alessandro Ferrucci" <alessandroferru...@gmail.com>
> To: pgsql-performance@postgresql.org
> Sent: Wednesday, 26 April 2017 0:19:37
> Subject: Re: [PERFORM] Slow query with 3 table joins
> 
> 
> 
> After about 40 minutes the slow query finally finished and the result
> of the EXPLAIN plan can be found here:
> 
> 
> https://explain.depesz.com/s/BX22
> 
> 
> Thanks,
> Alessandro Ferrucci
> 
> 
> On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci <
> alessandroferru...@gmail.com > wrote:
> 
> 
> 
> 
> Hello - I am migrating a current system to PostgreSQL and I am having
> an issue with a relatively straightforward query being extremely
> slow.
> 
> 
> The following are the definitions of the tables:
> 
> 
> CREATE TABLE popt_2017.unit
> (
> id serial NOT NULL,
> unit_id text,
> batch_id text,
> create_date timestamp without time zone DEFAULT now(),
> update_date timestamp without time zone,
> CONSTRAINT unit_pkey PRIMARY KEY (id)
> )
> WITH (
> OIDS=FALSE
> );
> 
> 
> CREATE TABLE popt_2017.field
> (
> id serial NOT NULL,
> unit_id integer,
> subunit_data_id integer,
> field_name character varying(50),
> page_id character varying(20),
> page_type character varying(20),
> batch_id character varying(20),
> file_name character varying(20),
> data_concept integer,
> "GROUP" integer,
> omr_group integer,
> pres integer,
> reg_data text,
> ocr_conf text,
> ocr_dict text,
> ocr_phon text,
> create_date timestamp without time zone DEFAULT now(),
> update_date timestamp without time zone,
> CONSTRAINT field_pkey PRIMARY KEY (id),
> CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
> REFERENCES popt_2017.subunit (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
> REFERENCES popt_2017.unit (id) MATCH FULL
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
> REFERENCES popt_2017.unit (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
> OIDS=FALSE
> );
> 
> 
> CREATE TABLE popt_2017.answer
> (
> id serial NOT NULL,
> field_id integer,
> ans_status integer,
> ans text,
> luggage text,
> arec text,
> kfi_partition integer,
> final boolean,
> length integer,
> create_date timestamp without time zone DEFAULT now(),
> update_date timestamp without time zone,
> CONSTRAINT answer_pkey PRIMARY KEY (id),
> CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
> REFERENCES popt_2017.field (id) MATCH FULL
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)
> REFERENCES popt_2017.field (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
> OIDS=FALSE
> );
> 
> 
> Below are the index definitions for those tables:
> 
> 
> UNIT:
> CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id);
> CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id);
> 
> 
> FIELD:
> CREATE UNIQUE INDEX field_pkey ON field USING btree (id)
> CREATE INDEX field_unit_id_idx ON field USING btree (unit_id)
> CREATE INDEX field_subunit_id_idx ON field USING btree
> (subunit_data_id)
> CREATE INDEX field_field_name_idx ON field USING btree (field_name)
> 
> 
> ANSWER:
> CREATE UNIQUE INDEX answer_pkey ON answer USING btree (id)
> CREATE INDEX answer_field_id_idx ON answer USING btree (field_id)
> CREATE INDEX answer_ans_idx ON answer USING btree (ans)
> 
> 
> The tables each have the following number of rows:
> 
> 
> UNIT: 10,315
> FIELD: 139,397,965
> ANSWER: 3,463,300
> 
> 
> The query in question is:
> 
> 
> SELECT
> UNIT.ID AS UNIT_ID,
> UNIT.UNIT_ID AS UNIT_UNIT_ID,
> UNIT.BATCH_ID AS UNIT_BATCH_ID,
> UNIT.CREATE_DATE AS UNIT_CREATE_DATE,
> UNIT.UPDATE_DATE AS UNIT_UPDATE_DATE
> FROM
> UNIT, FIELD, ANSWER
> WHERE
> UNIT.ID =FIELD.UNIT_ID AND
> FIELD.ID =ANSWER.FIELD_ID AND
> FIELD.FIELD_NAME='SHEETS_PRESENT' AND
> ANSWER.ANS='2';
> 
> 
> I attempted to run an EXPLAIN (ANALYZE, BUFFERS) and the query has been
> running for 32 minutes now, so I won't be able to post the results (I've
> never been able to get the query to actually finish).

Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Gerardo Herzig


- Original Message -
> From: "Alessandro Ferrucci" <alessandroferru...@gmail.com>
> To: pgsql-performance@postgresql.org
> Sent: Wednesday, 26 April 2017 0:19:37
> Subject: Re: [PERFORM] Slow query with 3 table joins
> 
> 
> 
> After about 40 minutes the slow query finally finished and the result
> of the EXPLAIN plan can be found here:
> 
> 
> https://explain.depesz.com/s/BX22
> 
> 
> Thanks,
> Alessandro Ferrucci

1) Looking at the "Rows removed by filter" in that explain, this looks like
a selectivity issue: many (many many) rows are fetched, just to be rejected
later.
I think you can try a partial index on field (unit_id) WHERE field_name =
'SHEETS_PRESENT', if that is practical for you (see the sketch below, after
point 2).
See https://www.postgresql.org/docs/current/static/indexes-partial.html for
a good read about partial indexes.

2) 9.2 is a pretty old version of PG. If you are still migrating, you
should consider a more recent version.
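Spelling out the partial index from point 1 against the tables in this
thread (a sketch; the index name is mine):

CREATE INDEX field_unit_id_sheets_present_idx
    ON field (unit_id)
    WHERE field_name = 'SHEETS_PRESENT';
-- The index stores only the rows matching the WHERE clause, so it stays
-- small, and the planner can use it whenever a query repeats that exact
-- predicate.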

HTH

Gerardo




Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Johan Fredriksson

> Hi Eskil - 
> 
> 
> I believe the id-field you're referring to is UNIT.UNIT_ID. I could
> change this to a varchar, but that column is not used in the query in
> question, so that wouldn't have any effect on the query's performance.

Sorry, I did not notice that the column "unit_id" existed in both "unit"
and "field" tables.

/ Eskil







Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Alessandro Ferrucci
Hi Dave -

thank you very much for all this advice!  I will try each of these and post
back results (some of this stuff, like creating the index, which is
happening now, takes a very long time).

Thanks again for all these pointers.

Cheers,
Alessandro

On Wed, Apr 26, 2017 at 12:12 AM, David Rowley  wrote:

> On 26 April 2017 at 15:19, Alessandro Ferrucci
>  wrote:
> > After about 40 minutes the slow query finally finished and the result of
> > the EXPLAIN plan can be found here:
> >
> > https://explain.depesz.com/s/BX22
>
> > Index Scan using field_unit_id_idx on field  (cost=0.00..8746678.52
> > rows=850149 width=8) (actual time=0.030..2414345.998 rows=10315 loops=1)
>
> This estimate seems a long way off. Are the stats up-to-date on the
> table? Try again after running: ANALYZE field;
>
> It might also be a good idea to ANALYZE all the tables. Is auto-vacuum
> switched on?
>
> The plan in question would work better if you create an index on field
> (field_name, unit_id);
>
> but I think if you update the stats the plan will switch.
>
> A HashJoin, hashing "unit"  and index scanning on field_field_name_idx
> would have been a much smarter plan choice for the planner to make.
>
> Also how many distinct field_names are there? SELECT COUNT(DISTINCT
> field_name) FROM field;
>
> You may want to increase the histogram buckets on that column if
> there are more than 100 field names, and the number of rows with each
> field name is highly variable. ALTER TABLE field ALTER COLUMN
> field_name SET STATISTICS <target>; 100 is the default, and 10000
> is the maximum.
>
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Signed,
Alessandro Ferrucci


Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Alessandro Ferrucci
Hi Eskil -

I believe the id-field you're referring to is UNIT.UNIT_ID. I could change
this to a varchar, but that column is not used in the query in question, so
that wouldn't have any effect on the query's performance.

Just for curiosity - I have changed the ANSWER.ANS datatype to a
varchar(250), but that did not affect the performance of the query.



On Wed, Apr 26, 2017 at 2:24 AM, Johan Fredriksson  wrote:

> Tue 2017-04-25 at 23:19 -0400, Alessandro Ferrucci wrote:
> > After about 40 minutes the slow query finally finished and the result
> > of the EXPLAIN plan can be found here:
> >
> >
> > https://explain.depesz.com/s/BX22
> >
> >
> > Thanks,
> > Alessandro Ferrucci
>
> I'm not so familiar with the index implementation in Postgres, but I
> don't think it is very efficient to index a text field. It also looks a
> bit strange that an id-field has the datatype "text" rather than integer
> or varchar.
>
> / Eskil
>
>
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



-- 
Signed,
Alessandro Ferrucci


Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Johan Fredriksson
Tue 2017-04-25 at 23:19 -0400, Alessandro Ferrucci wrote:
> After about 40 minutes the slow query finally finished and the result
> of the EXPLAIN plan can be found here:
> 
> 
> https://explain.depesz.com/s/BX22
> 
> 
> Thanks,
> Alessandro Ferrucci

I'm not so familiar with the index implementation in Postgres, but I
don't think it is very efficient to index a text field. It also looks a
bit strange that an id-field has the datatype "text" rather than integer
or varchar.

/ Eskil







Re: [PERFORM] Slow query with 3 table joins

2017-04-25 Thread David Rowley
On 26 April 2017 at 15:19, Alessandro Ferrucci
 wrote:
> After about 40 minutes the slow query finally finished and the result of
> the EXPLAIN plan can be found here:
>
> https://explain.depesz.com/s/BX22

> Index Scan using field_unit_id_idx on field  (cost=0.00..8746678.52 
> rows=850149 width=8) (actual time=0.030..2414345.998 rows=10315 loops=1)

This estimate seems a long way off. Are the stats up-to-date on the
table? Try again after running: ANALYZE field;

It might also be a good idea to ANALYZE all the tables. Is auto-vacuum
switched on?

The plan in question would work better if you create an index on field
(field_name, unit_id);

but I think if you update the stats the plan will switch.

A HashJoin, hashing "unit"  and index scanning on field_field_name_idx
would have been a much smarter plan choice for the planner to make.

Also how many distinct field_names are there? SELECT COUNT(DISTINCT
field_name) FROM field;

You may want to increase the histogram buckets on that column if
there are more than 100 field names, and the number of rows with each
field name is highly variable. ALTER TABLE field ALTER COLUMN
field_name SET STATISTICS <target>; 100 is the default, and 10000
is the maximum.
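Spelled out for this case (1000 is an arbitrary middle ground between the
default of 100 and the maximum of 10000):

ALTER TABLE field ALTER COLUMN field_name SET STATISTICS 1000;
ANALYZE field;  -- the new target only takes effect at the next ANALYZE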


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[PERFORM] Slow query with 3 table joins

2017-04-25 Thread Alessandro Ferrucci
Hello - I am migrating a current system to PostgreSQL and I am having an
issue with a relatively straightforward query being extremely slow.

The following are the definitions of the tables:

CREATE TABLE popt_2017.unit
(
  id serial NOT NULL,
  unit_id text,
  batch_id text,
  create_date timestamp without time zone DEFAULT now(),
  update_date timestamp without time zone,
  CONSTRAINT unit_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE TABLE popt_2017.field
(
  id serial NOT NULL,
  unit_id integer,
  subunit_data_id integer,
  field_name character varying(50),
  page_id character varying(20),
  page_type character varying(20),
  batch_id character varying(20),
  file_name character varying(20),
  data_concept integer,
  "GROUP" integer,
  omr_group integer,
  pres integer,
  reg_data text,
  ocr_conf text,
  ocr_dict text,
  ocr_phon text,
  create_date timestamp without time zone DEFAULT now(),
  update_date timestamp without time zone,
  CONSTRAINT field_pkey PRIMARY KEY (id),
  CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
  REFERENCES popt_2017.subunit (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
  REFERENCES popt_2017.unit (id) MATCH FULL
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
  REFERENCES popt_2017.unit (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE TABLE popt_2017.answer
(
  id serial NOT NULL,
  field_id integer,
  ans_status integer,
  ans text,
  luggage text,
  arec text,
  kfi_partition integer,
  final boolean,
  length integer,
  create_date timestamp without time zone DEFAULT now(),
  update_date timestamp without time zone,
  CONSTRAINT answer_pkey PRIMARY KEY (id),
  CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
  REFERENCES popt_2017.field (id) MATCH FULL
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)
  REFERENCES popt_2017.field (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

Below are the index definitions for those tables:

UNIT:
CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id);
CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id);

FIELD:
CREATE UNIQUE INDEX field_pkey ON field USING btree (id)
CREATE INDEX field_unit_id_idx ON field USING btree (unit_id)
CREATE INDEX field_subunit_id_idx ON field USING btree (subunit_data_id)
CREATE INDEX field_field_name_idx ON field USING btree (field_name)

ANSWER:
CREATE UNIQUE INDEX answer_pkey ON answer USING btree (id)
CREATE INDEX answer_field_id_idx ON answer USING btree (field_id)
CREATE INDEX answer_ans_idx ON answer USING btree (ans)

The tables each have the following number of rows:

UNIT: 10,315
FIELD: 139,397,965
ANSWER: 3,463,300

The query in question is:

SELECT
   UNIT.ID AS UNIT_ID,
   UNIT.UNIT_ID AS UNIT_UNIT_ID,
   UNIT.BATCH_ID AS UNIT_BATCH_ID,
   UNIT.CREATE_DATE AS UNIT_CREATE_DATE,
   UNIT.UPDATE_DATE AS UNIT_UPDATE_DATE
FROM
   UNIT, FIELD, ANSWER
WHERE
   UNIT.ID=FIELD.UNIT_ID AND
   FIELD.ID=ANSWER.FIELD_ID AND
   FIELD.FIELD_NAME='SHEETS_PRESENT' AND
   ANSWER.ANS='2';

I attempted to run an EXPLAIN (ANALYZE, BUFFERS) and the query has been
running for 32 minutes now, so I won't be able to post the results (I've
never been able to get the query to actually finish).

But if I remove the join to UNIT (and just join FIELD and ANSWER) the
resulting query is sufficiently fast (the first time it ran in roughly 3
seconds). The query is:

SELECT * FROM
   ANSWER, FIELD
WHERE
   FIELD.ID=ANSWER.FIELD_ID AND
   FIELD.FIELD_NAME='SHEETS_PRESENT' AND
   ANSWER.ANS='2';

The EXPLAIN ( ANALYZE, BUFFERS ) output of that query can be found here
https://explain.depesz.com/s/ueJq

These tables are static for now, so they do not get DELETEs or INSERTS at
all and I have run VACUUM ANALYZE on all the affected tables.

I'm running PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit.

I'm running this on RHEL 6.9

On a server with 32 GB of ram, 2 CPUs.

The following are the changes to postgresql.conf that I have made:

shared_buffers = 7871MB
effective_cache_size = 23611MB
work_mem = 1000MB
maintenance_work_mem = 2048MB

I have not changed the autovacuum settings, but since the tables are static
for now and I've already run VACUUM, that should not have any effect.

Any assistance that could be provided is greatly appreciated.

Thank you,
Alessandro Ferrucci


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-06 Thread Filipe Oliveira
Thank you for the reply. I had been trying to find that option for awhile
now.

On Fri, Jan 6, 2017 at 12:51 PM, Michael Paquier 
wrote:

> On Fri, Jan 6, 2017 at 6:14 AM, Filipe Oliveira 
> wrote:
> > Can you remove me from your mailing list?
>
> There is an unsubscribe action here:
> https://www.postgresql.org/community/lists/subscribe/
> --
> Michael
>


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-06 Thread Michael Paquier
On Fri, Jan 6, 2017 at 6:14 AM, Filipe Oliveira  wrote:
> Can you remove me from your mailing list?

There is an unsubscribe action here:
https://www.postgresql.org/community/lists/subscribe/
-- 
Michael




Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-05 Thread Filipe Oliveira
Can you remove me from your mailing list?

Thanks.


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2017 at 10:51 AM, Flávio Henrique  wrote:
> @Merlin Moncure
>>
>> Big gains (if any) are likely due to indexing strategy.
>> I do see some suspicious casting, for example:
>> Join Filter: ((four_charlie.delta_tango)::integer =
>> (six_quebec.golf_bravo)::integer)
>> Are you casting in the query or joining through dissimilar data types?
>
> No casts in query. The joins are on same data types.

well, something is going on.

create table t(i int);
create table t2(i int);
set enable_hashjoin to false;
set enable_mergejoin to false;

yields:

postgres=# explain select * from t join t2 on t.i = t2.i;
QUERY PLAN
──
 Nested Loop  (cost=0.00..97614.88 rows=32512 width=8)
   Join Filter: (t.i = t2.i)
   ->  Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
   ->  Materialize  (cost=0.00..48.25 rows=2550 width=4)
 ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4)

please note the non-casted join filter.

however,

postgres=# explain select * from t join t2 on t.i::bigint = t2.i::bigint;
QUERY PLAN
──
 Nested Loop  (cost=0.00..130127.38 rows=32512 width=8)
   Join Filter: ((t.i)::bigint = (t2.i)::bigint)
   ->  Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
   ->  Materialize  (cost=0.00..48.25 rows=2550 width=4)
 ->  Seq Scan on t2  (cost=0.00..35.50 rows=2550 width=4)

notice the casts in the join filter.  Furthermore, please note the
higher query cost due to the server accounting for the casting
involved in the join.  Any kind of non-equality-based operation in a
join, or on the predicate side of a WHERE condition, can get very
expensive very quickly.  (It remains difficult to see whether there's
any way to improve the join operation, due to the lack of visibility
on the query string.)
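
One way to double-check where the casts come from is to look at the declared
types of the join columns; a minimal sketch using the obfuscated names from
the posted plan (substitute the real table/column names):

SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE column_name IN ('delta_tango', 'golf_bravo');

If both columns really are the same type, the casts must be coming from the
query text itself or from a view definition.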

merlin



Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-05 Thread Daniel Blanch Bataller
Hi,

If just recreating the index makes it get used again, it might mean that the
index was bloated, that is, it grew so big that a seq scan was cheaper.

I’ve seen another case recently where postgres 9.6 wasn’t using the right index
in a query; I was able to reproduce the issue by crafting an index bigger, much
bigger, than it should be.

Can you record the index sizes as they are now? Keep this info, and if the
problem happens again, check the index sizes and see whether they have grown
too much.

i.e. SELECT relname, relpages, reltuples FROM pg_class WHERE relname =
'index_name';

This might help to confirm whether the problem is that indexes are growing too
much for some reason.
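
A size comparison against the parent table is another quick check; a minimal
sketch with hypothetical relation names:

SELECT pg_size_pretty(pg_relation_size('some_index')) AS index_size,
       pg_size_pretty(pg_relation_size('some_table')) AS table_size;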

Regards.

P.S. The other parameters don't seem to be the cause of the problem to me.


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-05 Thread Kevin Grittner
On Thu, Jan 5, 2017 at 10:51 AM, Flávio Henrique  wrote:

> Replying your comment, I think they tunned the server:
> effective_cache_size = 196GB
> shared_buffers = 24GB (this shouldn't be higher?)

Probably not, although it may be a good idea to try settings either
side of that (say, 16GB and 32GB) and monitor performance compared
to the current setting.

> autovacuum_max_workers  |3

If you ever see all workers busy at the same time for 30 minutes or
more, you should probably consider raising that so that small,
frequently updated tables are not neglected for too long.

> autovacuum_vacuum_cost_limit|-1

That is going to default to vacuum_cost_limit, which is usually
200.  If the server is actually big enough to merit
"effective_cache_size = 196GB" then you should probably bump this
setting to something like 2000.

> autovacuum_work_mem |-1

That is going to default to maintenance_work_mem.  On a big
machine, you probably want that set to somewhere between 1GB and
2GB.
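
A minimal sketch of applying those two suggestions (illustrative values; both
settings take effect on a config reload, no restart needed):

ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_work_mem = '1GB';
SELECT pg_reload_conf();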

Some other tuning to the cost parameters might be helpful, but
there's not enough data on the thread to know what else to suggest.
If you hit some other slow query, you might want to report it in
the manner suggested here:

https://wiki.postgresql.org/wiki/SlowQueryQuestions

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-05 Thread Flávio Henrique
Hi all!
Sorry for the delay (holidays).

Well, the most expensive sequential scan was solved.
I asked the db team to drop the index and recreate it and guess what: now
PostgreSQL is using it and the time dropped.
(thank you, @Gerardo Herzig!)

I think there's still room for improvement, but the problem is not so
crucial right now.
I'll try to investigate every suggestion mentioned here. Thank you all.

@Daniel Blanch
I'll make some tests with a materialized view. Thank you.

> On systems side: ask them if they have not changed anything in
> effective_cache_size and shared_buffers parameters, I presume they haven’t
> change anything related to costs.

Replying to your comment: I think they tuned the server:
effective_cache_size = 196GB
shared_buffers = 24GB (shouldn't this be higher?)

@Kevin Grittner
sorry, but I'm not sure whether autovacuum is aggressive enough; here are
my related settings:
autovacuum  |on
autovacuum_analyze_scale_factor |0.05
autovacuum_analyze_threshold|10
autovacuum_freeze_max_age   |2
autovacuum_max_workers  |3
autovacuum_multixact_freeze_max_age |4
autovacuum_naptime  |15s
autovacuum_vacuum_cost_delay|10ms
autovacuum_vacuum_cost_limit|-1
autovacuum_vacuum_scale_factor  |0.1
autovacuum_vacuum_threshold |10
autovacuum_work_mem |-1

@Merlin Moncure

> Big gains (if any) are likely due to indexing strategy.
> I do see some suspicious casting, for example:
> Join Filter: ((four_charlie.delta_tango)::integer =
> (six_quebec.golf_bravo)::integer)
> Are you casting in the query or joining through dissimilar data types?

No casts in the query. The joins are on the same data types.

Thank you all for the answers. Happy 2017!

Flávio Henrique

"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


On Thu, Jan 5, 2017 at 12:40 PM, Merlin Moncure  wrote:

> On Tue, Dec 27, 2016 at 5:50 PM, Flávio Henrique 
> wrote:
> > Hi there, fellow experts!
> >
> > I need an advice with query that became slower after 9.3 to 9.6
> migration.
> >
> > First of all, I'm from the dev team.
> >
> > Before migration, we (programmers) made some modifications on query bring
> > it's average time from 8s to 2-3s.
> >
> > As this query is the most executed on our system (it builds the user
> panel
> > to work), every bit that we can squeeze from it will be nice.
> >
> > Now, after server migration to 9.6 we're experiencing bad times with this
> > query again.
> >
> > Unfortunately, I don't have the old query plan (9.3 version) to show
> you,
> > but in the actual version (9.6) I can see some buffers written that
> tells me
> > that something is wrong.
> >
> > Our server has 250GB of memory available, but the database team says that
> > they can't do nothing to make this query better. I'm not sure, as some
> > buffers are written on disk.
> >
> > Any tip/help will be much appreciated (even from the query side).
> >
> > Thank you!
> >
> > The query plan: https://explain.depesz.com/s/5KMn
> >
> > Note: I tried to add index on kilo_victor table already, but Postgresql
> > still thinks that is better to do a seq scan.
>
> Hard to provide more without the query or the 'old' plan.   Here are
> some things you can try:
> *) Set effective_io_concurrency high.  You have some heap scanning
> going on and this can sometimes help (but it should be marginal).
> *) See if you can get any juice out of parallel query
> *) try playing with enable_nestloop and enable_seqscan.   these are
> hail mary passes but worth a shot.
>
> Run the query back to back with same arguments in the same database
> session. Does performance improve?
>
> Big gains (if any) are likely due to indexing strategy.
> I do see some suspicious casting, for example:
>
> Join Filter: ((four_charlie.delta_tango)::integer =
> (six_quebec.golf_bravo)::integer)
>
> Are you casting in the query or joining through dissimilar data types?
>  I suspect your database team might be incorrect.
>
> merlin
>


Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-05 Thread Merlin Moncure
On Tue, Dec 27, 2016 at 5:50 PM, Flávio Henrique  wrote:
> Hi there, fellow experts!
>
> I need an advice with query that became slower after 9.3 to 9.6 migration.
>
> First of all, I'm from the dev team.
>
> Before migration, we (programmers) made some modifications on query bring
> it's average time from 8s to 2-3s.
>
> As this query is the most executed on our system (it builds the user panel
> to work), every bit that we can squeeze from it will be nice.
>
> Now, after server migration to 9.6 we're experiencing bad times with this
> query again.
>
> Unfortunately, I don't have the old query plan (9.3 version) to show you,
> but in the actual version (9.6) I can see some buffers written that tells me
> that something is wrong.
>
> Our server has 250GB of memory available, but the database team says that
> they can't do nothing to make this query better. I'm not sure, as some
> buffers are written on disk.
>
> Any tip/help will be much appreciated (even from the query side).
>
> Thank you!
>
> The query plan: https://explain.depesz.com/s/5KMn
>
> Note: I tried to add index on kilo_victor table already, but Postgresql
> still thinks that is better to do a seq scan.

Hard to provide more without the query or the 'old' plan.  Here are
some things you can try:
*) Set effective_io_concurrency high.  You have some heap scanning
going on and this can sometimes help (but it should be marginal).
*) See if you can get any juice out of parallel query
*) try playing with enable_nestloop and enable_seqscan.  These are
Hail Mary passes, but worth a shot.
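
For the enable_* experiment, a minimal sketch (these are session-local
planner toggles, so other sessions are unaffected):

SET enable_nestloop = off;
-- re-run EXPLAIN (ANALYZE, BUFFERS) on the slow query and compare plans
SET enable_seqscan = off;
-- ...and again; then put everything back:
RESET ALL;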

Run the query back to back with same arguments in the same database
session. Does performance improve?

Big gains (if any) are likely due to indexing strategy.
I do see some suspicious casting, for example:

Join Filter: ((four_charlie.delta_tango)::integer =
(six_quebec.golf_bravo)::integer)

Are you casting in the query or joining through dissimilar data types?
I suspect your database team might be incorrect.

merlin




Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2017-01-04 Thread Kevin Grittner
On Tue, Dec 27, 2016 at 5:50 PM, Flávio Henrique  wrote:

> I can see some buffers written, which tells me
> that something is wrong.

Try running VACUUM FREEZE ANALYZE on all tables involved in the
query (or just run it as a superuser on the whole database).  Do
*not* use the FULL option.  Among other things, this will ensure
that you have somewhat current statistics, and that all hint bits
are set.  (I remember my surprise the first time I converted a
table to PostgreSQL, ran SELECT count(*) on it to make sure all
rows made it, and saw a very long run time with disk writes as the
bottleneck.  That's when I learned about hint bits.)
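
A minimal sketch of both forms (the table name is hypothetical):

VACUUM (FREEZE, ANALYZE);              -- whole database, run as a superuser
VACUUM (FREEZE, ANALYZE) some_table;   -- or one table at a time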

You should also make sure that autovacuum is aggressive enough on
the new cluster.  Without that, any performance benefit from the
above will slowly disappear.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2016-12-28 Thread Daniel Blanch Bataller
The biggest impact on performance you can achieve is by using a materialized
view. If it's as heavily used as you said, even 2-3 seconds in a multiuser OLTP
environment is still unacceptable from my point of view. I don't know if this
is the case, but if you have 1000 users connecting at 8 am all at the same
time … it will freeze the app for a while.

Ask yourself: how old can the data be? And take into account that you can
refresh the materialized view as often as you want, even every 10 secs.
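
A minimal sketch of the pattern, with hypothetical names:

CREATE MATERIALIZED VIEW user_panel_mv AS
  SELECT user_id, count(*) AS pending_items
  FROM panel_items                      -- hypothetical source table/query
  GROUP BY user_id;

CREATE UNIQUE INDEX ON user_panel_mv (user_id);  -- required for CONCURRENTLY

-- run as often as the data is allowed to age, e.g. from cron:
REFRESH MATERIALIZED VIEW CONCURRENTLY user_panel_mv;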

Besides this, there's still some room for improvement. Perhaps you have
not created the right index to avoid seq scans. Have a look at indexes on
expressions.

On the systems side: ask them whether they have changed anything in the
effective_cache_size and shared_buffers parameters; I presume they haven't
changed anything related to costs.

Regards.

Daniel Blanch.


> On 28 Dec 2016, at 0:50, Flávio Henrique wrote:
> 
> Hi there, fellow experts!
> 
> I need an advice with query that became slower after 9.3 to 9.6 migration.
> 
> First of all, I'm from the dev team.
> 
> Before migration, we (programmers) made some modifications on query bring 
> it's average time from 8s to 2-3s.
> 
> As this query is the most executed on our system (it builds the user panel to 
> work), every bit that we can squeeze from it will be nice.
> 
> Now, after server migration to 9.6 we're experiencing bad times with this 
> query again.
> 
> Unfortunately, I don't have the old query plan (9.3 version) to show you,
> but in the actual version (9.6) I can see some buffers written that tells me 
> that something is wrong.
> 
> Our server has 250GB of memory available, but the database team says that 
> they can't do nothing to make this query better. I'm not sure, as some 
> buffers are written on disk.
> 
> Any tip/help will be much appreciated (even from the query side).
> 
> Thank you!
> 
> The query plan: https://explain.depesz.com/s/5KMn
> 
> Note: I tried to add index on kilo_victor table already, but Postgresql still 
> thinks that is better to do a seq scan.
> 
> 
> Flávio Henrique



Re: [PERFORM] Slow query after 9.3 to 9.6 migration

2016-12-27 Thread Gerardo Herzig
> 
> Hi there, fellow experts!
> 
> 
> I need an advice with query that became slower after 9.3 to 9.6
> migration.
> 
> 
> First of all, I'm from the dev team.
> 
> 
> Before migration, we (programmers) made some modifications on query
> bring it's average time from 8s to 2-3s.
> 
> 
> As this query is the most executed on our system (it builds the user
> panel to work), every bit that we can squeeze from it will be nice.
> 
> 
> Now, after server migration to 9.6 we're experiencing bad times with
> this query again.
> 
> 
> Unfortunately, I don't have the old query plan (9.3 version) to show
> you, but in the actual version (9.6) I can see some buffers written
> that tells me that something is wrong.
> 
> 
> Our server has 250GB of memory available, but the database team says
> that they can't do nothing to make this query better. I'm not sure,
> as some buffers are written on disk.
> 
> 
> Any tip/help will be much appreciated (even from the query side).
> 
> 
> Thank you!
> 
> 
> The query plan: https://explain.depesz.com/s/5KMn
> 
> 
> Note: I tried to add index on kilo_victor table already, but
> Postgresql still thinks that is better to do a seq scan.
> 
> 

I don't know about the data distribution in kilo_victor, but maybe a partial
index?

CREATE INDEX ON kilo_victor (juliet_romeo) WHERE NOT xray_seven;

Gerardo




[PERFORM] Slow query after 9.3 to 9.6 migration

2016-12-27 Thread Flávio Henrique
Hi there, fellow experts!

I need advice on a query that became slower after a 9.3 to 9.6 migration.

First of all, I'm from the dev team.

Before the migration, we (programmers) made some modifications to the query,
bringing its average time from 8s down to 2-3s.

As this query is the most executed on our system (it builds the user panel
for work), every bit that we can squeeze out of it will be nice.

Now, after the server migration to 9.6, we're experiencing bad times with
this query again.

Unfortunately, I don't have the old query plan (9.3 version) to show you,
but in the current version (9.6) I can see some buffers written, which tells
me that something is wrong.

Our server has 250GB of memory available, but the database team says that
they can't do anything to make this query better. I'm not sure about that,
as some buffers are written to disk.

Any tip/help will be much appreciated (even from the query side).

Thank you!

The query plan: https://explain.depesz.com/s/5KMn

Note: I already tried to add an index on the kilo_victor table, but
PostgreSQL still thinks it is better to do a seq scan.


Flávio Henrique


Re: [PERFORM] Slow query question

2016-12-07 Thread Andrey Povazhnyi
Tom,

Thank you for a thorough answer. We’ll try the 2-column index.

Regards,
Andrey Povazhnyi

> On Dec 6, 2016, at 6:33 PM, Tom Lane  wrote:
> 
> Andrey Povazhnyi  writes:
>> We’ve got a strange planner behavior on a query to one of our bigger tables 
>> after we upgraded to postgres 9.6.1 recently.
> 
> The basic problem with this query is that there are no good alternatives.
> The planner believes there are about 53K rows matching the WHERE
> condition.  (I assume this estimate is roughly in line with reality,
> else we have different problems to talk about.)  It can either scan down
> the "id" index and stop when it finds the 30th row matching WHERE, or
> it can use the "symbol" index to read all 53K rows matching WHERE and
> then sort them by "id".  Neither one of those is going to be speedy;
> but the more rows there are matching WHERE, the better the first way
> is going to look.
> 
> If you're worried about doing this a lot, it might be worth your while
> to provide a 2-column index on (source, id) --- in that order --- which
> would allow a query plan that directly finds the required 30 rows as
> consecutive index entries.  Possibly this could replace your index on
> "source" alone, depending on how much bigger the 2-col index is and
> how many queries have no use for the second column.  See
> https://www.postgresql.org/docs/current/static/indexes.html
> particularly 11.3 - 11.5.
> 
>   regards, tom lane





Re: [PERFORM] Slow query question

2016-12-06 Thread Tom Lane
Andrey Povazhnyi  writes:
> We’ve got a strange planner behavior on a query to one of our bigger tables 
> after we upgraded to postgres 9.6.1 recently.

The basic problem with this query is that there are no good alternatives.
The planner believes there are about 53K rows matching the WHERE
condition.  (I assume this estimate is roughly in line with reality,
else we have different problems to talk about.)  It can either scan down
the "id" index and stop when it finds the 30th row matching WHERE, or
it can use the "symbol" index to read all 53K rows matching WHERE and
then sort them by "id".  Neither one of those is going to be speedy;
but the more rows there are matching WHERE, the better the first way
is going to look.

If you're worried about doing this a lot, it might be worth your while
to provide a 2-column index on (source, id) --- in that order --- which
would allow a query plan that directly finds the required 30 rows as
consecutive index entries.  Possibly this could replace your index on
"source" alone, depending on how much bigger the 2-col index is and
how many queries have no use for the second column.  See
https://www.postgresql.org/docs/current/static/indexes.html
particularly 11.3 - 11.5.
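
A minimal sketch of that index against the posted table (the posted WHERE
clause filters on symbol, which appears to be the column referred to as
"source" above; adjust if the schema differs):

CREATE INDEX tv_site_chathistory_symbol_id_idx
    ON tv_site_chathistory (symbol, id);

The planner can then read the 30 matching rows as consecutive index entries
from the high-id end of that symbol's range.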

regards, tom lane




[PERFORM] Slow query question

2016-12-06 Thread Andrey Povazhnyi
Hello, List.

We’ve got a strange planner behavior on a query to one of our bigger tables
after we upgraded to postgres 9.6.1 recently.
The table (schema here: http://pastebin.com/nRAny4bw) has 28m+ rows and is
used to store chat messages for different chat rooms (symbol is the room id).
The query is as follows:
SELECT "tv_site_chathistory"."source" FROM "tv_site_chathistory" WHERE
"tv_site_chathistory"."symbol" = 'pm_OmoGVzBdyPnpYkXD' ORDER BY
"tv_site_chathistory"."id" DESC LIMIT 30;
(explain analyze is here: https://explain.depesz.com/s/iyT)

For some reason the planner chooses to scan using the pkey index instead of
the index on the symbol column. Most times it uses the right index, but for
this particular ‘symbol’ value it resorts to a pkey scan. One possible clue
could be that the last 30 rows with this particular symbol span a relatively
large range of creation times.

Any advice would be greatly appreciated!

Re: [PERFORM] Slow query with big tables

2016-08-29 Thread Tommi Kaksonen
On Sat, Aug 27, 2016 at 18:33 GMT+03:00, Jeff Janes
 wrote:

> Partitioning the Feature and Point tables on measurement_time (or
> measurement_start_time,
> you are not consistent on what it is called) might be helpful.  However,
> measurement_time does not exist in those tables, so you would first have
to
> de-normalize by introducing it into them.
>
> More likely to be helpful would be precomputing the aggregates and storing
> them in a materialized view (not available in 9.2).   Also, more RAM and
> better hard-drives can't hurt.

Thanks a lot for the help and all the suggestions. Before this I tried to partition
by measurement_id (Feature table) and by result_feature_id (Point table)
but the performance was worse than without partitioning. Using
measurement_time in partitioning might be a better idea
(measurement_start_time was meant to be measurement_time).

I think I will update to a newer version, use better hardware, and try
materialized views for better performance.

Best Regards,
Tommi Kaksonen


2016-08-27 21:33 GMT+03:00 Jeff Janes :

> On Fri, Aug 26, 2016 at 6:17 AM, Tommi K  wrote:
>
>> Hello,
>> thanks for the response. I did not get the response to my email even
>> though I am subscribed to the pgsql-performance mail list. Let's hope that
>> I get the next one :)
>>
>> Increasing work_mem did not have great impact on the performance. But I
>> will try to update the PostgreSQL version to see if it speeds up things.
>>
>> However is there way to keep query time constant as the database size
>> grows.
>>
>
> Not likely.  If the number of rows you are aggregating grows, it will take
> more work to do those aggregations.
>
> If the number of rows being aggregated doesn't grow, because all the
> growth occurs outside of the measurement_time range, even then the new
> data will still make it harder to keep the stuff you want cached in
> memory.  If you really want more-constant query time, you could approach
> that by giving the machine as little RAM as possible.  This works not by
> making the large database case faster, but by making the small database
> case slower.  That usually is not what people want.
>
>
>
>> Should I use partitioning or partial indexes?
>>
>
> Partitioning the Feature and Point tables on measurement_time (or 
> measurement_start_time,
> you are not consistent on what it is called) might be helpful.  However,
> measurement_time does not exist in those tables, so you would first have
> to de-normalize by introducing it into them.
>
> More likely to be helpful would be precomputing the aggregates and storing
> them in a materialized view (not available in 9.2).   Also, more RAM and
> better hard-drives can't hurt.
>
> Cheers,
>
> Jeff
>


Re: [PERFORM] Slow query with big tables

2016-08-27 Thread Jeff Janes
On Fri, Aug 26, 2016 at 6:17 AM, Tommi K  wrote:

> Hello,
> thanks for the response. I did not get the response to my email even
> though I am subscribed to the pgsql-performance mail list. Let's hope that
> I get the next one :)
>
> Increasing work_mem did not have great impact on the performance. But I
> will try to update the PostgreSQL version to see if it speeds up things.
>
> However is there way to keep query time constant as the database size
> grows.
>

Not likely.  If the number of rows you are aggregating grows, it will take
more work to do those aggregations.

If the number of rows being aggregated doesn't grow, because all the growth
occurs outside of the measurement_time range, even then the new data will
still make it harder to keep the stuff you want cached in memory.  If you
really want more-constant query time, you could approach that by giving the
machine as little RAM as possible.  This works not by making the large
database case faster, but by making the small database case slower.  That
usually is not what people want.



> Should I use partitioning or partial indexes?
>

Partitioning the Feature and Point tables on measurement_time (or
measurement_start_time,
you are not consistent on what it is called) might be helpful.  However,
measurement_time does not exist in those tables, so you would first have to
de-normalize by introducing it into them.

More likely to be helpful would be precomputing the aggregates and storing
them in a materialized view (not available in 9.2).   Also, more RAM and
better hard-drives can't hurt.

Cheers,

Jeff


Re: [PERFORM] Slow query with big tables

2016-08-27 Thread Jeff Janes
On Sat, Aug 27, 2016 at 7:13 AM, Craig James  wrote:

> On Fri, Aug 26, 2016 at 9:11 PM, Jim Nasby 
> wrote:
>
>> On 8/26/16 3:26 PM, Mike Sofen wrote:
>>
>>> Is there way to keep query time constant as the database size grows.
>>>
>>
>> No. More data == more time. Unless you find a way to break the laws of
>> physics.
>>
>
> Straight hash-table indexes (which Postgres doesn't use) have O(1) access
> time.
>

But he isn't doing single-row lookups, he is doing large aggregations.  If
you have to aggregate N rows, doing a O(1) operation on different N
occasions is still O(N).

Not that big-O is useful here anyway.  It assumes that either everything
fits in RAM (and is already there), or that nothing fits in RAM and it all
has to be fetched from disk, even the index root pages, every time it is
needed.  Tommi is not operating under an environment where the first
assumption holds, and no one operates in an environment where the second
assumption holds.

As N increases beyond available RAM, your actual time for a single look-up
is going to be a weighted average of two different constant-time
operations, one with a small constant and one with a large constant.  Big-O
notation ignores this nicety and assumes all operations are at the slower
speed, because that is what the limit of the weighted average will be as N
gets very large. But real world systems do not operate at the infinite
limit.

So his run time could easily be proportional to N^2, if he aggregates more
rows and each one of them is less likely to be a cache hit.

Cheers,

Jeff


Re: [PERFORM] Slow query with big tables

2016-08-27 Thread Tom Lane
Craig James  writes:
> Straight hash-table indexes (which Postgres doesn't use) have O(1) access
> time. The amount of data has no effect on the access time.

This is wishful thinking --- once you have enough data, O(1) goes out the
window.  For example, a hash index is certainly not going to continue to
scale linearly once you reach its maximum possible number of buckets
(2^N for N-bit hashes, and remember you can't get very many useful hash
bits out of small objects like integers).  But even before that, large
numbers of buckets put enough stress on your storage system that you will
see some not very O(1)-ish behavior, just because too little of the index
fits in whatever cache and RAM you have.  Any storage hierarchy is
ultimately going to impose O(log N) access costs, that's the way they're
built.

I think it's fairly pointless to discuss such matters in the abstract.
If you want to make useful engineering tradeoffs you have to talk about
specific data sets and available hardware.

regards, tom lane




Re: [PERFORM] Slow query with big tables

2016-08-27 Thread Craig James
On Fri, Aug 26, 2016 at 9:11 PM, Jim Nasby  wrote:

> On 8/26/16 3:26 PM, Mike Sofen wrote:
>
>> Is there way to keep query time constant as the database size grows.
>>
>
> No. More data == more time. Unless you find a way to break the laws of
> physics.
>

Straight hash-table indexes (which Postgres doesn't use) have O(1) access
time. The amount of data has no effect on the access time. Postgres uses
B-trees which have O(logN) performance. There is no "law of physics" that
says Postgres couldn't employ pure hash indexes.

Craig


> Should I use partitioning or partial indexes?
>>
>
> Neither technique is a magic bullet. I doubt either would help here.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



-- 
-
Craig A. James
Chief Technology Officer
eMolecules, Inc.
-


Re: [PERFORM] Slow query with big tables

2016-08-26 Thread Pavel Stehule
2016-08-26 22:26 GMT+02:00 Mike Sofen <mso...@runbox.com>:

>
>
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] On Behalf Of Tommi K
> Sent: Friday, August 26, 2016 7:25 AM
> To: Craig James <cja...@emolecules.com>
> Cc: andreas kretschmer <akretsch...@spamfence.net>;
> pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Slow query with big tables
>
>
>
> Ok, sorry that I did not add the original message. I thought that it would
> be automatically added to the message thread.
>
>
>
> Here is the question again:
>
>
>
> Is there way to keep query time constant as the database size grows.
> Should I use partitioning or partial indexes?
>

try to disable nested loops (enable_nestloop) - there are bad estimations.

This query should not be fast - there are two ILIKE filters with negative
impact on estimations.
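
When the pattern contains no wildcards, as in the posted query's
(product.name ILIKE 'Part 1'), one workaround is a case-insensitive equality
over an expression index; a minimal sketch (plain equality is estimated much
better than ILIKE):

CREATE INDEX product_lower_name_idx ON "Product" (lower(name));

-- and filter with:
-- WHERE lower(product.name) = lower('Part 1')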

Regards

Pavel



Re: [PERFORM] Slow query with big tables

2016-08-26 Thread Jim Nasby

On 8/26/16 3:26 PM, Mike Sofen wrote:

Is there way to keep query time constant as the database size grows.


No. More data == more time. Unless you find a way to break the laws of 
physics.



Should I use partitioning or partial indexes?


Neither technique is a magic bullet. I doubt either would help here.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [PERFORM] Slow query with big tables

2016-08-26 Thread Mike Sofen
 

From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tommi K
Sent: Friday, August 26, 2016 7:25 AM
To: Craig James <cja...@emolecules.com>
Cc: andreas kretschmer <akretsch...@spamfence.net>;
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query with big tables

Ok, sorry that I did not add the original message. I thought that it would be
automatically added to the message thread.

Here is the question again:

Is there a way to keep query time constant as the database size grows? Should I
use partitioning or partial indexes?

Thanks,
Tommi Kaksonen

Re: [PERFORM] Slow query with big tables

2016-08-26 Thread Tommi K
Ok, sorry that I did not add the original message. I thought that it would
be automatically added to the message thread.

Here is the question again:

Is there a way to keep query time constant as the database size grows? Should
I use partitioning or partial indexes?

Thanks,
Tommi Kaksonen



> Hello,
>
> I have the following tables and query. I would like to get some help to
find out why it is slow and how its performance could be improved.
>
> Thanks,
> Tommi K.
>
>
> --Table definitions---
> CREATE TABLE "Measurement"
> (
>   id bigserial NOT NULL,
>   product_id bigserial NOT NULL,
>   nominal_data_id bigserial NOT NULL,
>   description text,
>   serial text,
>   measurement_time timestamp without time zone,
>   status smallint,
>   system_description text,
>   CONSTRAINT "Measurement_pkey" PRIMARY KEY (id),
>   CONSTRAINT "Measurement_nominal_data_id_fkey" FOREIGN KEY
(nominal_data_id)
>   REFERENCES "Nominal_data" (id) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT "Measurement_product_id_fkey" FOREIGN KEY (product_id)
>   REFERENCES "Product" (id) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
>   OIDS=FALSE
> );
>
> CREATE INDEX measurement_time_index
>   ON "Measurement"
>   USING btree
>   (measurement_time);
> ALTER TABLE "Measurement" CLUSTER ON measurement_time_index;
>
> CREATE TABLE "Product"
> (
>   id bigserial NOT NULL,
>   name text,
>   description text,
>   info text,
>   system_name text,
>   CONSTRAINT "Product_pkey" PRIMARY KEY (id)
> )
> WITH (
>   OIDS=FALSE
> );
>
>
> CREATE TABLE "Extra_info"
> (
>   id bigserial NOT NULL,
>   measurement_id bigserial NOT NULL,
>   name text,
>   description text,
>   info text,
>   type text,
>   value_string text,
>   value_double double precision,
>   value_integer bigint,
>   value_bool boolean,
>   CONSTRAINT "Extra_info_pkey" PRIMARY KEY (id),
>   CONSTRAINT "Extra_info_measurement_id_fkey" FOREIGN KEY (measurement_id)
>   REFERENCES "Measurement" (id) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
>   OIDS=FALSE
> );
>
> CREATE INDEX extra_info_measurement_id_index
>   ON "Extra_info"
>   USING btree
>   (measurement_id);
>
> CREATE TABLE "Feature"
> (
>   id bigserial NOT NULL,
>   measurement_id bigserial NOT NULL,
>   name text,
>   description text,
>   info text,
>   CONSTRAINT "Feature_pkey" PRIMARY KEY (id),
>   CONSTRAINT "Feature_measurement_id_fkey" FOREIGN KEY (measurement_id)
>   REFERENCES "Measurement" (id) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
>   OIDS=FALSE
> );
>
> CREATE INDEX feature_measurement_id_and_name_index
>   ON "Feature"
>   USING btree
>   (measurement_id, name COLLATE pg_catalog."default");
>
> CREATE INDEX feature_measurement_id_index
>   ON "Feature"
>   USING hash
>   (measurement_id);
>
>
> CREATE TABLE "Point"
> (
>   id bigserial NOT NULL,
>   feature_id bigserial NOT NULL,
>   x double precision,
>   y double precision,
>   z double precision,
>   status_x smallint,
>   status_y smallint,
>   status_z smallint,
>   difference_x double precision,
>   difference_y double precision,
>   difference_z double precision,
>   CONSTRAINT "Point_pkey" PRIMARY KEY (id),
>   CONSTRAINT "Point_feature_id_fkey" FOREIGN KEY (feature_id)
>   REFERENCES "Feature" (id) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
>   OIDS=FALSE
> );
>
> CREATE INDEX point_feature_id_index
>   ON "Point"
>   USING btree
>   (feature_id);
>
> CREATE TABLE "Warning"
> (
>   id bigserial NOT NULL,
>   feature_id bigserial NOT NULL,
>   "number" smallint,
>   info text,
>   CONSTRAINT "Warning_pkey" PRIMARY KEY (id),
>   CONSTRAINT "Warning_feature_id_fkey" FOREIGN KEY (feature_id)
>   REFERENCES "Feature" (id) MATCH SIMPLE
>   ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
>   OIDS=FALSE
> );
>
> CREATE INDEX warning_feature_id_index
>   ON "Warning"
>   USING btree
>   (feature_id);
>
>
> ---Query---
> SELECT
> f.name,
> f.description,
> SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0 AND
warning.id IS NULL THEN 1 ELSE 0 END) AS green_count,
> SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0 AND
warning.id IS NOT NULL THEN 1 ELSE 0 END) AS green_warned_count,
> SUM(CASE WHEN NOT (p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0)
AND NOT (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND warning.id
IS NULL THEN 1 ELSE 0 END) AS yellow_count,
> SUM(CASE WHEN NOT (p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0)
AND NOT (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND warning.id
IS NOT NULL THEN 1 ELSE 0 END) AS yellow_warned_count,
> SUM(CASE WHEN (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND
warning.id IS NULL THEN 1 ELSE 0 END) AS red_count,
> SUM(CASE WHEN (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND
warning.id IS NOT NULL THEN 1 ELSE 0 END) AS red_warned_count,
> SUM(CASE WHEN 

Re: [PERFORM] Slow query with big tables

2016-08-26 Thread Craig James
On Fri, Aug 26, 2016 at 6:17 AM, Tommi K  wrote:

> Hello,
> thanks for the response. I did not get the response to my email even
> though I am subscribed to the pgsql-performance mail list. Let's hope that
> I get the next one :)
>

Please include the email you are replying to when you respond. It saves
everyone time if they don't have to dig up your old emails, and many of us
discard old emails anyway and have no idea what you wrote before.

Craig


> Increasing work_mem did not have great impact on the performance. But I
> will try to update the PostgreSQL version to see if it speeds up things.
>
> However is there way to keep query time constant as the database size
> grows. Should I use partitioning or partial indexes?
>
> Best Regards,
> Tommi Kaksonen
>



-- 
-
Craig A. James
Chief Technology Officer
eMolecules, Inc.
-


Re: [PERFORM] Slow query with big tables

2016-08-26 Thread Tommi K
Hello,
thanks for the response. I did not get the response to my email even though
I am subscribed to the pgsql-performance mailing list. Let's hope that I get
the next one :)

Increasing work_mem did not have a great impact on the performance. But I
will try to update the PostgreSQL version to see if it speeds things up.

However, is there a way to keep query time constant as the database size
grows? Should I use partitioning or partial indexes?

Best Regards,
Tommi Kaksonen


Re: [PERFORM] Slow query with big tables

2016-08-25 Thread Andreas Kretschmer
Tommi Kaksonen  wrote:

> ---Version---
> PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit

The current point release for 9.2 is 9.2.18; you are some years behind.

The plan seems okay to me, apart from the on-disk sort: increase
work_mem to avoid that.
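
A minimal sketch of that experiment (session-local; the value is
illustrative):

SET work_mem = '256MB';
-- re-run with EXPLAIN ANALYZE: the sort node should then report
-- "Sort Method: quicksort  Memory: ..." instead of "external merge  Disk: ..."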

If I were you I would switch to PG 9.5 - or wait for 9.6 and parallel
execution of aggregates.



Regards, Andreas Kretschmer
-- 
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[PERFORM] Slow query with big tables

2016-08-24 Thread Tommi Kaksonen
Hello,

I have the following tables and query. I would like to get some help to
find out why it is slow and how its performance could be improved.

Thanks,
Tommi K.


--Table definitions---
CREATE TABLE "Measurement"
(
  id bigserial NOT NULL,
  product_id bigserial NOT NULL,
  nominal_data_id bigserial NOT NULL,
  description text,
  serial text,
  measurement_time timestamp without time zone,
  status smallint,
  system_description text,
  CONSTRAINT "Measurement_pkey" PRIMARY KEY (id),
  CONSTRAINT "Measurement_nominal_data_id_fkey" FOREIGN KEY
(nominal_data_id)
  REFERENCES "Nominal_data" (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT "Measurement_product_id_fkey" FOREIGN KEY (product_id)
  REFERENCES "Product" (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX measurement_time_index
  ON "Measurement"
  USING btree
  (measurement_time);
ALTER TABLE "Measurement" CLUSTER ON measurement_time_index;

CREATE TABLE "Product"
(
  id bigserial NOT NULL,
  name text,
  description text,
  info text,
  system_name text,
  CONSTRAINT "Product_pkey" PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);


CREATE TABLE "Extra_info"
(
  id bigserial NOT NULL,
  measurement_id bigserial NOT NULL,
  name text,
  description text,
  info text,
  type text,
  value_string text,
  value_double double precision,
  value_integer bigint,
  value_bool boolean,
  CONSTRAINT "Extra_info_pkey" PRIMARY KEY (id),
  CONSTRAINT "Extra_info_measurement_id_fkey" FOREIGN KEY (measurement_id)
  REFERENCES "Measurement" (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX extra_info_measurement_id_index
  ON "Extra_info"
  USING btree
  (measurement_id);

CREATE TABLE "Feature"
(
  id bigserial NOT NULL,
  measurement_id bigserial NOT NULL,
  name text,
  description text,
  info text,
  CONSTRAINT "Feature_pkey" PRIMARY KEY (id),
  CONSTRAINT "Feature_measurement_id_fkey" FOREIGN KEY (measurement_id)
  REFERENCES "Measurement" (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX feature_measurement_id_and_name_index
  ON "Feature"
  USING btree
  (measurement_id, name COLLATE pg_catalog."default");

CREATE INDEX feature_measurement_id_index
  ON "Feature"
  USING hash
  (measurement_id);


CREATE TABLE "Point"
(
  id bigserial NOT NULL,
  feature_id bigserial NOT NULL,
  x double precision,
  y double precision,
  z double precision,
  status_x smallint,
  status_y smallint,
  status_z smallint,
  difference_x double precision,
  difference_y double precision,
  difference_z double precision,
  CONSTRAINT "Point_pkey" PRIMARY KEY (id),
  CONSTRAINT "Point_feature_id_fkey" FOREIGN KEY (feature_id)
  REFERENCES "Feature" (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX point_feature_id_index
  ON "Point"
  USING btree
  (feature_id);

CREATE TABLE "Warning"
(
  id bigserial NOT NULL,
  feature_id bigserial NOT NULL,
  "number" smallint,
  info text,
  CONSTRAINT "Warning_pkey" PRIMARY KEY (id),
  CONSTRAINT "Warning_feature_id_fkey" FOREIGN KEY (feature_id)
  REFERENCES "Feature" (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);

CREATE INDEX warning_feature_id_index
  ON "Warning"
  USING btree
  (feature_id);


---Query---
SELECT
f.name,
f.description,
SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0 AND
warning.id IS NULL THEN 1 ELSE 0 END) AS green_count,
SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0 AND
warning.id IS NOT NULL THEN 1 ELSE 0 END) AS green_warned_count,
SUM(CASE WHEN NOT (p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0)
AND NOT (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND warning.id
IS NULL THEN 1 ELSE 0 END) AS yellow_count,
SUM(CASE WHEN NOT (p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0)
AND NOT (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND warning.id
IS NOT NULL THEN 1 ELSE 0 END) AS yellow_warned_count,
SUM(CASE WHEN (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND
warning.id IS NULL THEN 1 ELSE 0 END) AS red_count,
SUM(CASE WHEN (p.status_x = 2 OR p.status_y = 2 OR p.status_z = 2) AND
warning.id IS NOT NULL THEN 1 ELSE 0 END) AS red_warned_count,
SUM(CASE WHEN (p.status_x = 1000 OR p.status_y = 1000 OR p.status_z = 1000)
AND warning.id IS NOT NULL THEN 1 ELSE 0 END) AS unable_to_measure_count
FROM "Point" p
JOIN "Feature" f ON f.id = p.feature_id
JOIN "Measurement" measurement ON measurement.id = f.measurement_id
JOIN "Product" product ON product.id = measurement.product_id
LEFT JOIN "Warning" warning ON f.id = warning.feature_id
WHERE (product.name ILIKE 'Part 1') AND
measurement.measurement_start_time >= '2015-06-18 17:00:00' AND
measurement.measurement_start_time <= '2015-06-18 18:00:00' AND
measurement.id NOT IN(SELECT measurement_id FROM 

Re: [PERFORM] Slow query help

2016-01-07 Thread Marc Mamin


>I ask for your help to solve a slow query which is taking more than 14 seconds
>to execute.
>Maybe I am asking too much both from you and especially from PostgreSQL, as it
>is really huge, involving 16 tables.
>
>Explain:
>http://explain.depesz.com/s/XII9
>
>Schema:
>http://adj.com.br/erp/data_schema/

Hello,

It seems that you don't pay much attention to column alignment.
e.g. http://adj.com.br/erp/data_schema/tables/ERP_PUBLIC_sys_person.html

This probably won't make any significant difference in your case,
but it is something to be aware of when dealing with large tables.
Here is a good starting link for this topic:
http://stackoverflow.com/questions/12604744/does-the-order-of-columns-in-a-postgres-table-impact-performance
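
As a minimal illustration of the padding effect (hypothetical tables): a
1-byte boolean followed by an 8-byte column forces 7 bytes of padding per
row, so grouping wide columns first packs rows tighter:

-- loosely packed: 7 padding bytes after each boolean
CREATE TABLE person_loose  (active boolean, id bigint, flag boolean, born timestamptz);

-- tightly packed: 8-byte columns first, 1-byte columns last
CREATE TABLE person_packed (id bigint, born timestamptz, active boolean, flag boolean);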

regards,

Marc Mamin


[PERFORM] Slow query help

2016-01-06 Thread Almir de Oliveira Duarte Junior

Hi,

I ask for your help to solve a slow query which is taking more than 14
seconds to execute.
Maybe I am asking too much both from you and especially from PostgreSQL,
as it is really huge, involving 16 tables.


Explain:
http://explain.depesz.com/s/XII9

Schema:
http://adj.com.br/erp/data_schema/

Version:
PostgreSQL 9.2.14 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 
4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit


OS: Centos 7.1
Linux centos01.insoliti.com.br 3.10.0-327.3.1.el7.x86_64 #1 SMP Wed Dec 
9 14:09:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

 * contains large objects: no
 * has a large proportion of NULLs in several columns: maybe
 * receives a large number of UPDATEs or DELETEs regularly: no
 * is growing rapidly: no
 * has many indexes on it: maybe (please see schema)
 * uses triggers that may be executing database functions, or is
   calling functions directly: in some cases


 * History: the system is still being developed.
 * Hardware: this is the development environment, a Dell T110-II
   server, with 8GB of RAM and CPU as follows

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 58
model name  : Intel(R) Pentium(R) CPU G2120 @ 3.10GHz
stepping: 9
microcode   : 0x1b
cpu MHz : 1663.101
cache size  : 3072 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt 
tsc_deadline_timer xsave lahf_lm arat epb pln pts dtherm tpr_shadow vnmi 
flexpriority ept vpid fsgsbase smep erms xsaveopt

bogomips: 6185.92
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 58
model name  : Intel(R) Pentium(R) CPU G2120 @ 3.10GHz
stepping: 9
microcode   : 0x1b
cpu MHz : 1647.722
cache size  : 3072 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
apicid  : 2
initial apicid  : 2
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor 
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt 
tsc_deadline_timer xsave lahf_lm arat epb pln pts dtherm tpr_shadow vnmi 
flexpriority ept vpid fsgsbase smep erms xsaveopt

bogomips: 6185.92
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Configuration:
              name               | current_setting |       source
---------------------------------+-----------------+--------------------
 application_name                | psql            | client
 authentication_timeout          | 1min            | configuration file
 autovacuum                      | on              | configuration file
 autovacuum_analyze_scale_factor | 0.05            | configuration file
 autovacuum_analyze_threshold    | 10              | configuration file
 autovacuum_freeze_max_age       | 2               | configuration file
 autovacuum_max_workers          | 6               | configuration file
 autovacuum_naptime              | 15s             | configuration file
 autovacuum_vacuum_cost_delay    | 10ms            | configuration file
 autovacuum_vacuum_cost_limit    | 1000            | configuration file
 autovacuum_vacuum_scale_factor  | 0.1             | configuration file
 autovacuum_vacuum_threshold     | 25              | configuration file
 bytea_output                    | hex             | configuration file
 checkpoint_completion_target    | 0.9             | configuration file
 checkpoint_segments             | 32              | configuration file
 checkpoint_timeout              | 10min           | configuration file
 client_encoding                 | UTF8            | client
 client_min_messages             | log             |

Re: [PERFORM] Slow query in trigger function

2015-11-03 Thread Guido Niewerth
These are the queries I used to try to get the execution planner to use the 
index scan instead of the sequential scan:

IF NOT EXISTS (SELECT 1 FROM custom_data WHERE key = old.key) => sequential scan
IF NOT EXISTS (SELECT 1 FROM custom_data WHERE key = old.key LIMIT 1) => 
sequential scan
IF NOT EXISTS (SELECT max( 1 ) FROM custom_data WHERE key = old.key) => 
sequential scan

After breaking up the code into two statements, the execution planner uses the 
index scan:

DECLARE
   result INTEGER;
BEGIN
   SELECT 1 FROM custom_data WHERE key = old.key INTO result;
   IF result IS NULL THEN
      ...
   END IF;

To me it looks like the execution planner does not choose the optimal strategy. 
Even small changes in the function body make the execution planner use the slow 
sequential scan.
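
(For what it's worth, one workaround often suggested for pre-9.2 servers is
to run the probe via dynamic SQL, so that the statement is planned with the
actual parameter value instead of a cached generic plan. A sketch only,
reusing the table names from this thread; the variable name is invented:

DECLARE
   found_one INTEGER;
BEGIN
   EXECUTE 'SELECT 1 FROM custom_data WHERE key = $1'
      INTO found_one
      USING old.key;
   IF found_one IS NULL THEN
      DELETE FROM lookup_custom_data_keys WHERE key = old.key;
   END IF;
END;

Whether this helps depends on why the planner picked the seq scan in the
first place.)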

Guido Niewerth

25 years inspired synergy
OCS Optical Control Systems GmbH
Wullener Feld 24
58454 Witten
Germany

Tel: +49 (0) 2302 95622-0
Fax: +49 (0) 2302 95622-33
Email: gniewe...@ocsgmbh.com
Web: http://www.ocsgmbh.com

HRB 8442 (Bochum) | VAT-ID: DE 124 084 990
Directors: Hans Gloeckler, Fatah Najaf, Merdan Sariboga




Re: [PERFORM] Slow query in trigger function

2015-11-02 Thread Tom Lane
Guido Niewerth  writes:
> As you can see there's a huge runtime difference between the select query 
> used in the trigger function and the one run from the SQL editor.

contrib/auto_explain might be useful in seeing what's going on, in
particular it would tell us whether or not a different plan is being
used for the query inside the function.
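
For instance, a minimal session-level setup (these are standard auto_explain
parameters):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log every statement's plan
SET auto_explain.log_analyze = on;            -- include actual rows and timings
SET auto_explain.log_nested_statements = on;  -- cover statements inside functions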

regards, tom lane




Re: [PERFORM] Slow query in trigger function

2015-11-02 Thread Guido Niewerth
I needed to set up the trigger function again, so here it is:

CREATE OR REPLACE FUNCTION public.fn_trigger_test ()
RETURNS trigger AS
$body$
DECLARE
start TIMESTAMP;
BEGIN
   start := timeofday();
   IF TG_OP = 'UPDATE' THEN
  IF NOT EXISTS( SELECT key FROM custom_data WHERE key = old.key LIMIT 1 ) 
THEN
DELETE FROM lookup_custom_data_keys WHERE key = old.key;
  END IF;
  IF NOT EXISTS( SELECT 1 FROM lookup_custom_data_keys WHERE key = new.key 
LIMIT 1 ) THEN
INSERT INTO lookup_custom_data_keys (key) VALUES (new.key);
  END IF;
   END IF;
   RAISE NOTICE 'Trigger % ran: %', TG_OP, age( timeofday() ::TIMESTAMP, start 
);
   RETURN NULL;
END;
$body$
LANGUAGE 'plpgsql'
VOLATILE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 100;

And this is the execution plan. It looks like it does a slow sequential scan 
where it could use an index scan:

2015-11-02 17:42:10 CET LOG:  duration: 5195.673 ms  plan:
Query Text: SELECT NOT EXISTS( SELECT 1 FROM custom_data WHERE 
key = old.key LIMIT 1 )
Result  (cost=0.09..0.10 rows=1 width=0) (actual 
time=5195.667..5195.667 rows=1 loops=1)
  Output: (NOT $0)
  Buffers: shared hit=34 read=351750
  InitPlan 1 (returns $0)
->  Limit  (cost=0.00..0.09 rows=1 width=0) (actual 
time=5195.662..5195.662 rows=0 loops=1)
  Output: (1)
  Buffers: shared hit=34 read=351750
  ->  Seq Scan on public.custom_data  
(cost=0.00..821325.76 rows=9390835 width=0) (actual time=5195.658..5195.658 
rows=0 loops=1)
Output: 1
Filter: (custom_data.key = $15)
Buffers: shared hit=34 read=351750
2015-11-02 17:42:10 CET ZUSAMMENHANG:  SQL-Anweisung »SELECT NOT EXISTS( SELECT 
1 FROM custom_data WHERE key = old.key LIMIT 1 )«
PL/pgSQL function "fn_trigger_test" line 7 at IF
2015-11-02 17:42:10 CET LOG:  duration: 0.014 ms  plan:
Query Text: DELETE FROM lookup_custom_data_keys WHERE key = 
old.key
Delete on public.lookup_custom_data_keys  (cost=0.00..38.25 
rows=1 width=6) (actual time=0.013..0.013 rows=0 loops=1)
  Buffers: shared hit=2
  ->  Seq Scan on public.lookup_custom_data_keys  
(cost=0.00..38.25 rows=1 width=6) (actual time=0.007..0.007 rows=1 loops=1)
Output: ctid
Filter: (lookup_custom_data_keys.key = $15)
Buffers: shared hit=1
2015-11-02 17:42:10 CET ZUSAMMENHANG:  SQL-Anweisung »DELETE FROM 
lookup_custom_data_keys WHERE key = old.key«
PL/pgSQL function "fn_trigger_test" line 8 at SQL-Anweisung
2015-11-02 17:42:10 CET LOG:  duration: 0.005 ms  plan:
Query Text: SELECT NOT EXISTS( SELECT 1 FROM 
lookup_custom_data_keys WHERE key = new.key LIMIT 1 )
Result  (cost=38.25..38.26 rows=1 width=0) (actual 
time=0.004..0.004 rows=1 loops=1)
  Output: (NOT $0)
  Buffers: shared hit=1
  InitPlan 1 (returns $0)
->  Limit  (cost=0.00..38.25 rows=1 width=0) (actual 
time=0.003..0.003 rows=0 loops=1)
  Output: (1)
  Buffers: shared hit=1
  ->  Seq Scan on public.lookup_custom_data_keys  
(cost=0.00..38.25 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
Output: 1
Filter: (lookup_custom_data_keys.key = $17)
Buffers: shared hit=1
2015-11-02 17:42:10 CET ZUSAMMENHANG:  SQL-Anweisung »SELECT NOT EXISTS( SELECT 
1 FROM lookup_custom_data_keys WHERE key = new.key LIMIT 1 )«
PL/pgSQL function "fn_trigger_test" line 10 at IF
2015-11-02 17:42:10 CET LOG:  duration: 0.116 ms  plan:
Query Text: INSERT INTO lookup_custom_data_keys (key) VALUES 
(new.key)
Insert on public.lookup_custom_data_keys  (cost=0.00..0.01 
rows=1 width=0) (actual time=0.115..0.115 rows=0 loops=1)
  Buffers: shared hit=1
  ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual 
time=0.001..0.001 rows=1 loops=1)
Output: $17
2015-11-02 17:42:10 CET ZUSAMMENHANG:  SQL-Anweisung »INSERT INTO 
lookup_custom_data_keys (key) VALUES (new.key)«
PL/pgSQL function "fn_trigger_test" line 11 at SQL-Anweisung
2015-11-02 17:42:10 CET LOG:  duration: 5200.475 ms  plan:
Query Text: UPDATE custom_data SET key= 'key-2' WHERE key = 
'key-1'
Update on public.custom_data  (cost=0.00..15.35 rows=1 
width=34) (actual time=0.369..0.369 rows=0 loops=1)
  Buffers: shared hit=29
  ->  Index Scan using idx_custom_data_key on 

Re: [PERFORM] Slow query in trigger function

2015-11-02 Thread Tom Lane
Guido Niewerth  writes:
> And this is the execution plan. It looks like it does a slow sequential scan 
> where it could use an index scan:

> 2015-11-02 17:42:10 CET LOG:  duration: 5195.673 ms  plan:
> Query Text: SELECT NOT EXISTS( SELECT 1 FROM custom_data 
> WHERE key = old.key LIMIT 1 )
> Result  (cost=0.09..0.10 rows=1 width=0) (actual 
> time=5195.667..5195.667 rows=1 loops=1)
>   Output: (NOT $0)
>   Buffers: shared hit=34 read=351750
>   InitPlan 1 (returns $0)
> ->  Limit  (cost=0.00..0.09 rows=1 width=0) (actual 
> time=5195.662..5195.662 rows=0 loops=1)
>   Output: (1)
>   Buffers: shared hit=34 read=351750
>   ->  Seq Scan on public.custom_data  
> (cost=0.00..821325.76 rows=9390835 width=0) (actual time=5195.658..5195.658 
> rows=0 loops=1)
> Output: 1
> Filter: (custom_data.key = $15)
> Buffers: shared hit=34 read=351750

It looks like you're getting bit by an inaccurate estimate of what will be
the quickest way to satisfy a LIMIT query.  In this particular situation,
I'd advise just dropping the LIMIT, as it contributes nothing useful.

(If memory serves, 9.5 will actually ignore constant-LIMIT clauses inside
EXISTS(), because people keep writing them even though they're useless.
Earlier releases do not have that code though.)
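
Concretely, the trigger's test without the constant LIMIT would read (a
sketch, reusing the body from the function shown elsewhere in this thread):

IF NOT EXISTS (SELECT 1 FROM custom_data WHERE key = old.key) THEN
   DELETE FROM lookup_custom_data_keys WHERE key = old.key;
END IF;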

regards, tom lane




[PERFORM] Slow query in trigger function

2015-11-02 Thread Guido Niewerth
Hello,

I've got a table custom_data which essentially contains a number of key/value 
pairs. This table holds a large number (about 40M) of records, and I need the 
distinct keys and values for various reasons. Selecting those distinct data 
takes a couple of seconds, so I decided to maintain a separate lookup table 
for both the key and the value data. The lookup tables are maintained by a 
trigger that reacts on inserts/updates/deletes on the original table. While 
checking the correctness of my trigger function I noticed that the SQL query 
in the trigger function is surprisingly slow, taking about 5-6 seconds. When 
I ran the SQL query outside the trigger function, it showed the expected 
performance and returned in a couple of milliseconds. Though the original 
table is very large, it holds only a small number of distinct key/value values:

SELECT DISTINCT key FROM custom_data;
>> 12 rows returned

SELECT DISTINCT value FROM custom_data;
>> 13 rows returned


Here is the relevant information (function body of the trigger function 
reduced to show the behaviour):

PostgreSQL Version:
PostgreSQL 9.1.13, compiled by Visual C++ build 1500, 64-bit

OS Version:
Windows 7 64bit

Scenario to reproduce the behaviour:
EMS Solution SQL Manager: SQL Editor used to run SQL commands from an editor

Server configuration:
name                        | current_setting     | source
----------------------------+---------------------+----------------------
DateStyle                   | ISO, DMY            | session
default_text_search_config  | pg_catalog.german   | configuration file
effective_cache_size        | 8GB                 | configuration file
lc_messages                 | German_Germany.1252 | configuration file
lc_monetary                 | German_Germany.1252 | configuration file
lc_numeric                  | German_Germany.1252 | configuration file
lc_time                     | German_Germany.1252 | configuration file
listen_addresses            | *                   | configuration file
log_destination             | stderr              | configuration file
log_line_prefix             | %t                  | configuration file
log_timezone                | CET                 | environment variable
logging_collector           | on                  | configuration file
max_connections             | 100                 | configuration file
max_stack_depth             | 2MB                 | environment variable
port                        | 5432                | configuration file
shared_buffers              | 4GB                 | configuration file
statement_timeout           | 0                   | session
TimeZone                    | CET                 | environment variable
work_mem                    | 64MB                | configuration file

custom_data table definition:
CREATE TABLE public.custom_data (
  custom_data_id SERIAL,
  file_id INTEGER DEFAULT 0 NOT NULL,
  user_id INTEGER DEFAULT 0 NOT NULL,
  "timestamp" TIMESTAMP(0) WITHOUT TIME ZONE DEFAULT '1970-01-01 
00:00:00'::timestamp without time zone NOT NULL,
  key TEXT DEFAULT ''::text NOT NULL,
  value TEXT DEFAULT ''::text NOT NULL,
  CONSTRAINT pkey_custom_data PRIMARY KEY(custom_data_id)
) WITHOUT OIDS;

CREATE INDEX idx_custom_data_key ON public.custom_data USING btree (key);

CREATE INDEX idx_custom_data_value ON public.custom_data  USING btree (value);

CREATE TRIGGER on_custom_data_changed AFTER INSERT OR UPDATE OR DELETE
ON public.custom_data FOR EACH ROW
EXECUTE PROCEDURE public.on_change_custom_data();

CREATE OR REPLACE FUNCTION public.on_change_custom_data ()
RETURNS trigger AS
$body$
BEGIN
   IF TG_OP = 'UPDATE' THEN
  RAISE NOTICE 'Check custom data key start  : %', timeofday();
  IF NOT EXISTS (SELECT 1 FROM custom_data WHERE key = OLD.key ) THEN
  END IF;
  RAISE NOTICE 'Check custom data key end: %', timeofday();
END IF;
RETURN NULL;
END;
$body$
LANGUAGE 'plpgsql'
VOLATILE CALLED ON NULL INPUT SECURITY INVOKER COST 100;

postgreSQL log:
HINWEIS:  Check custom data key start  : Fri Oct 30 11:56:41.785000 2015 CET << 
start of IF NOT EXIST (...)
HINWEIS:  

Re: [PERFORM] Slow Query

2015-08-14 Thread 林士博
OK.

If you benchmarked that correctly,
then it seems that adding a redundant search can make the query faster in
some special cases.

And without further info, I cannot see any reason for it.

2015-08-14 14:35 GMT+09:00 Robert Campbell robcampbel...@gmail.com:

 Hi,

 My mistake, I didn't apply the subquery properly the first time.

 It does return records, but not quite as fast as the original query, about
 200ms slower.

 Vacancy ID is a primary key.

 On Fri, Aug 14, 2015 at 3:10 PM, 林士博 l...@repica.co.jp wrote:

 Is the Vacancy.ID a primary key?
 Or is unique in Vacancy table?




 --
 Regards

 Robert Campbell
 +61412062971
 robcampbel...@gmail.com



Re: [PERFORM] Slow Query

2015-08-13 Thread 林士博
In the 'not exists' clause, you do not have to search table Vacancy as v
again.
I think it would be faster to use the outer Vacancy table as below.
In your case, that does the same work.

NOT EXISTS (
SELECT 1
FROM CategoryOption_TableRow ct126
WHERE Vacancy.Template
  AND ct126.CategoryOptionID IN (34024,35254,35255,35256)
  AND ct126.Category_TableID = 126
  AND ct126.RowID = Vacancy.ID
  )


Re: [PERFORM] Slow Query

2015-08-13 Thread robbyc
Hi,

Doing this returns 0 records

On Thu, Aug 13, 2015 at 7:22 PM, 林士博 [via PostgreSQL] 
ml-node+s1045698n5862008...@n5.nabble.com wrote:

 In the 'not exists' clause, you do not have to search table Vacancy as
 v again.
 I think it would be faster to use the outer Vacancy table as below.
 In your case, that does the same work.

 NOT EXISTS (
 SELECT 1
 FROM CategoryOption_TableRow ct126
 WHERE Vacancy.Template
   AND ct126.CategoryOptionID IN (34024,35254,35255,35256)
   AND ct126.Category_TableID = 126
   AND ct126.RowID = Vacancy.ID
   )





-- 
Regards

Robert Campbell
+61412062971
robcampbel...@gmail.com





Re: [PERFORM] Slow Query

2015-08-13 Thread 林士博
Is the Vacancy.ID a primary key?
Or is unique in Vacancy table?


Re: [PERFORM] Slow Query

2015-08-12 Thread robbyc
Hi Vik,

Thanks for your feedback, very helpful.

I modified your query slightly; this will return all vacancy templates and
all level 1 vacancies which aren't templates, and does so in about
800-900ms less, a great improvement on the original query.

SELECT Vacancy.ID,
   Vacancy.JobTitle,
   Vacancy.DateCreated,
   Vacancy.CustomAccess,
   Department.Name as Department,
   list(Occupation.Name) as Occupation,
   Vacancy.PositionNo,
   Vacancy.Template
FROM Vacancy
JOIN CategoryOption_TableRow as c_50 ON (
c_50.Category_TableID= 50
AND c_50.RowID = Vacancy.ID
AND c_50.CategoryOptionID=19205)
LEFT JOIN CategoryOption_TableRow as c_22 ON (
c_22.RowID = Vacancy.ID
AND c_22.Category_TableID = 22)
LEFT JOIN CategoryOption as Occupation ON (
Occupation.ID = c_22.CategoryOptionID)
LEFT JOIN TableRow_TableRow as t_33 ON (
t_33.Table1RowID = Vacancy.ID
AND t_33.Table_TableID = 33)
LEFT JOIN Department ON (
Department.ID = t_33.Table2RowID
AND Department.Active = 't'
AND Department.ClientID = 263)
WHERE Vacancy.ClientID = 263
  AND NOT EXISTS (
SELECT 1
FROM Vacancy as v
JOIN CategoryOption_TableRow ct126 on (
ct126.Category_TableID = 126
AND RowID = v.ID)
WHERE v.Template
  AND ct126.CategoryOptionID IN (34024,35254,35255,35256)
  AND v.ID = Vacancy.ID)
  AND (Vacancy.Template OR (Vacancy.Template = 'f' AND
Vacancy.Level = 1))
  GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department.Name,
 Vacancy.PositionNo, Vacancy.Template

On Wed, Aug 12, 2015 at 9:35 PM, Vik Fearing-3 [via PostgreSQL] 
ml-node+s1045698n5861873...@n5.nabble.com wrote:

 On 08/12/2015 04:34 AM, robbyc wrote:
  Hi,
 
  I am new to optimizing queries and i'm getting a slow running time
  (~1.5secs) with the following SQL:

 Before mucking about with work_mem and indexes, the first thing to do is
 rewrite this query correctly.  Here are just some of the things wrong
 with the query as written:

 * You're doing a DISTINCT on the same set of columns also in a GROUP BY.
   This is redundant and causes needless deduplication.

 * You're joining two GROUPed BY then DISTINCTed queries using the UNION
   operator which will do yet another pass for deduplication.

 * You've got the entire query repeated for just a simple difference in
   the global WHERE clause.  These can be merged.

 * You've kept LEFT JOINs in the subquery but you don't use any values
   from them.  These can be safely removed altogether.

 * You're using a NOT IN clause which is almost never what you want.  Use
   NOT EXISTS instead.

 What is this list() function?  How is it defined?  Can it be replaced
 with string_agg()?

 You're not doing yourself any favors at all with all this quoting and
 mixed case stuff.

 Here is a rewritten version, please let me know how it performs:

 SELECT Vacancy.ID,
Vacancy.JobTitle,
Vacancy.DateCreated,
Vacancy.CustomAccess,
Department.Name as Department,
list(Occupation.Name) as Occupation,
Vacancy.PositionNo,
Vacancy.Template
 FROM Vacancy
 JOIN CategoryOption_TableRow as c_50 ON (
 c_50.Category_TableID= 50
 AND c_50.RowID = Vacancy.ID
 AND c_50.CategoryOptionID=19205)
 LEFT JOIN CategoryOption_TableRow as c_22 ON (
 c_22.RowID = Vacancy.ID
 AND c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation ON (
 Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33 ON (
 t_33.Table1RowID = Vacancy.ID
 AND t_33.Table_TableID = 33)
 LEFT JOIN Department ON (
 Department.ID = t_33.Table2RowID
 AND Department.Active = 't'
 AND Department.ClientID = 263)
 WHERE Vacancy.ClientID = 263
   AND NOT EXISTS (
 SELECT 1
 FROM Vacancy as _Vacancy
 JOIN CategoryOption_TableRow ct126 on (
 ct126.Category_TableID = 126
 AND RowID = _Vacancy.ID)
 WHERE _Vacancy.Template
   AND ct126.CategoryOptionID IN (34024,35254,35255,35256)
   AND _Vacancy.ID = Vacancy.ID)
   AND (Vacancy.Template = 't' OR Vacancy.Level = 1)
 GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
  Vacancy.CustomAccess, Department.Name,
  Vacancy.PositionNo, Vacancy.Template


  SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
  Vacancy.DateCreated, Vacancy.CustomAccess
  , Department.Name as Department, list(Occupation.Name) as
  Occupation, Vacancy.PositionNo
  , Vacancy.Template from
 Vacancy
  LEFT JOIN CategoryOption_TableRow as c_22
   ON (c_22.RowID = Vacancy.ID
   and c_22.Category_TableID = 22)
  LEFT JOIN CategoryOption as Occupation
   ON (Occupation.ID = c_22.CategoryOptionID)
  LEFT JOIN TableRow_TableRow as t_33
ON (t_33.Table1RowID = Vacancy.ID
and t_33.Table_TableID = 

Re: [PERFORM] Slow Query

2015-08-12 Thread Venkata Balaji N
On Wed, Aug 12, 2015 at 3:29 PM, robbyc robcampbel...@gmail.com wrote:

 Hi Venkata,

 work_mem was set to 72MB, increased to 144MB, no change.


Increasing work_mem depends on various other factors, like table size
(the amount of data being sorted), available memory, etc.


 Added an index of type varchar_pattern_ops to Vacancy.JobTitle, this did
 not help either.


Sorry, I did not mean to say that an index must be added straight away. The
column must be eligible to have an index: an index will be beneficial if
created on a column with a high number of distinct values.

If neither of the above helps, then options to rewrite the query must be
explored.
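
One way to check that eligibility is to look at the gathered statistics
(names lower-cased here on the assumption that the tables were created with
unquoted identifiers):

SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'vacancy';

A positive n_distinct is an estimated number of distinct values; a negative
value is minus a fraction of the row count.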

Thanks,
Venkata Balaji N

Fujitsu Australia



 On Wed, Aug 12, 2015 at 2:09 PM, Venkata Balaji N [via PostgreSQL] wrote:

 On Wed, Aug 12, 2015 at 12:34 PM, robbyc wrote:

 Hi,

 I am new to optimizing queries and i'm getting a slow running time
 (~1.5secs) with the following SQL:

 SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
 Vacancy.DateCreated, Vacancy.CustomAccess
 , Department.Name as Department, list(Occupation.Name) as
 Occupation, Vacancy.PositionNo
 , Vacancy.Template from
Vacancy
 LEFT JOIN CategoryOption_TableRow as c_22
  ON (c_22.RowID = Vacancy.ID
  and c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation
  ON (Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33
   ON (t_33.Table1RowID = Vacancy.ID
   and t_33.Table_TableID = 33 )
 LEFT JOIN Department
  ON (Department.ID = t_33.Table2RowID and
 Department.Active = 't' and Department
 .ClientID = 263)
 JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID=
 50
 and c_50.RowID = Vacancy
 .ID and c_50.CategoryOptionID=19205)
 WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
 DISTINCT(Vacancy.ID)
 FROM Vacancy join CategoryOption_TableRow ct126 on
 (ct126.Category_TableID = 126
  and RowID = Vacancy.ID)
 left join Workflow on (Workflow.VacancyID =
 Vacancy.ID
 and Workflow.Level
 = 1)
 left join CategoryOption_TableRow c30 on (c30.RowID =
 Workflow.ID and c30.Category_TableID
  = 30 and c30.CategoryOptionID = 21923)
 WHERE Template AND ct126.CategoryOptionID
 IN(34024,35254,35255,35256)) and Vacancy
 .Template = 't'
 GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department
 .Name, Vacancy.PositionNo, Vacancy.Template
 UNION
 SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
 Vacancy.DateCreated, Vacancy.CustomAccess
 , Department.Name as Department, list(Occupation.Name) as
 Occupation, Vacancy.PositionNo
 , Vacancy.Template from
Vacancy
 LEFT JOIN CategoryOption_TableRow as c_22
  ON (c_22.RowID = Vacancy.ID
  and c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation
  ON (Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33
   ON (t_33.Table1RowID = Vacancy.ID
   and t_33.Table_TableID = 33 )
 LEFT JOIN Department
  ON (Department.ID = t_33.Table2RowID and
 Department.Active = 't' and Department
 .ClientID = 263)
 JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID=
 50
 and c_50.RowID = Vacancy
 .ID and c_50.CategoryOptionID=19205)
 WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
 DISTINCT(Vacancy.ID)
 FROM Vacancy join CategoryOption_TableRow ct126 on
 (ct126.Category_TableID = 126
  and RowID = Vacancy.ID)
 left join Workflow on (Workflow.VacancyID =
 Vacancy.ID
 and Workflow.Level
 = 1)
 left join CategoryOption_TableRow c30 on (c30.RowID =
 Workflow.ID and c30.Category_TableID
  = 30 and c30.CategoryOptionID = 21923)
 WHERE Template AND ct126.CategoryOptionID
 IN(34024,35254,35255,35256))  and Vacancy
 .Template <> 't' AND Vacancy.Level = 1
 GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department
 .Name, Vacancy.PositionNo, Vacancy.Template
  ORDER BY JobTitle

 Running explain analyze gives me the following information:
 http://explain.depesz.com/s/pdC


 For a total runtime: 2877.157 ms

 If i remove the left joins on Department and TableRow_TableRow this
 reduces
 the run time by about a third.
 Additionally removing CategoryOption and CategoryOption_TableRow joins
 further reduces by a about a third.

 Given that i need both these joins for the information retrieved by them,
 what would be the best way to re-factor this query so it runs faster?

 Looking at the output of explain analyze the hash aggregates and sort
 seem
 to be the primary issue.


 The query has got a distinct and group-by/order-by clauses which seems to
 be taking time. Without looking at much details of the query code and Table
 size etc, did you try increasing the work_mem and 

Re: [PERFORM] Slow Query

2015-08-12 Thread Vik Fearing
On 08/12/2015 04:34 AM, robbyc wrote:
 Hi,
 
 I am new to optimizing queries and I'm getting a slow running time
 (~1.5secs) with the following SQL:

Before mucking about with work_mem and indexes, the first thing to do is
rewrite this query correctly.  Here are just some of the things wrong
with the query as written:

* You're doing a DISTINCT on the same set of columns also in a GROUP BY.
  This is redundant and causes needless deduplication.

* You're joining two GROUPed BY then DISTINCTed queries using the UNION
  operator which will do yet another pass for deduplication.

* You've got the entire query repeated for just a simple difference in
  the global WHERE clause.  These can be merged.

* You've kept LEFT JOINs in the subquery but you don't use any values
  from them.  These can be safely removed altogether.

* You're using a NOT IN clause which is almost never what you want.  Use
  NOT EXISTS instead.
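
(A self-contained illustration of the NOT IN trap, beyond the performance
angle: a single NULL produced by the subquery silently empties the result.)

SELECT 1 WHERE 2 NOT IN (SELECT x FROM (VALUES (1), (NULL)) AS t(x));
-- zero rows: "2 <> NULL" is unknown, so NOT IN can never become true

SELECT 1 WHERE NOT EXISTS
  (SELECT 1 FROM (VALUES (1), (NULL)) AS t(x) WHERE t.x = 2);
-- one row, as intended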

What is this list() function?  How is it defined?  Can it be replaced
with string_agg()?
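
If it is just a string-concatenating aggregate, the built-in replacement
would look something like this (illustrative):

    string_agg(Occupation.Name, ', ') AS Occupation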

You're not doing yourself any favors at all with all this quoting and
mixed case stuff.

Here is a rewritten version, please let me know how it performs:

SELECT Vacancy.ID,
   Vacancy.JobTitle,
   Vacancy.DateCreated,
   Vacancy.CustomAccess,
   Department.Name as Department,
   list(Occupation.Name) as Occupation,
   Vacancy.PositionNo,
   Vacancy.Template
FROM Vacancy
JOIN CategoryOption_TableRow as c_50 ON (
c_50.Category_TableID= 50
AND c_50.RowID = Vacancy.ID
AND c_50.CategoryOptionID=19205)
LEFT JOIN CategoryOption_TableRow as c_22 ON (
c_22.RowID = Vacancy.ID
AND c_22.Category_TableID = 22)
LEFT JOIN CategoryOption as Occupation ON (
Occupation.ID = c_22.CategoryOptionID)
LEFT JOIN TableRow_TableRow as t_33 ON (
t_33.Table1RowID = Vacancy.ID
AND t_33.Table_TableID = 33)
LEFT JOIN Department ON (
Department.ID = t_33.Table2RowID
AND Department.Active = 't'
AND Department.ClientID = 263)
WHERE Vacancy.ClientID = 263
  AND NOT EXISTS (
SELECT 1
FROM Vacancy as _Vacancy
JOIN CategoryOption_TableRow ct126 on (
ct126.Category_TableID = 126
AND RowID = _Vacancy.ID)
WHERE _Vacancy.Template
  AND ct126.CategoryOptionID IN (34024,35254,35255,35256)
  AND _Vacancy.ID = Vacancy.ID)
  AND (Vacancy.Template = 't' OR Vacancy.Level = 1)
GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department.Name,
 Vacancy.PositionNo, Vacancy.Template


 SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
 Vacancy.DateCreated, Vacancy.CustomAccess
 , Department.Name as Department, list(Occupation.Name) as
 Occupation, Vacancy.PositionNo
 , Vacancy.Template from 
Vacancy
 LEFT JOIN CategoryOption_TableRow as c_22 
  ON (c_22.RowID = Vacancy.ID 
  and c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation
  ON (Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33 
   ON (t_33.Table1RowID = Vacancy.ID 
   and t_33.Table_TableID = 33 )
 LEFT JOIN Department
  ON (Department.ID = t_33.Table2RowID and
 Department.Active = 't' and Department
 .ClientID = 263)
 JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID= 50
 and c_50.RowID = Vacancy
 .ID and c_50.CategoryOptionID=19205)
 WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
 DISTINCT(Vacancy.ID) 
 FROM Vacancy join CategoryOption_TableRow ct126 on
 (ct126.Category_TableID = 126
  and RowID = Vacancy.ID) 
 left join Workflow on (Workflow.VacancyID = Vacancy.ID
 and Workflow.Level 
 = 1) 
 left join CategoryOption_TableRow c30 on (c30.RowID =
 Workflow.ID and c30.Category_TableID
  = 30 and c30.CategoryOptionID = 21923) 
 WHERE Template AND ct126.CategoryOptionID
 IN(34024,35254,35255,35256)) and Vacancy
 .Template = 't'
 GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department
 .Name, Vacancy.PositionNo, Vacancy.Template
 UNION
 SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
 Vacancy.DateCreated, Vacancy.CustomAccess
 , Department.Name as Department, list(Occupation.Name) as
 Occupation, Vacancy.PositionNo
 , Vacancy.Template from 
Vacancy
 LEFT JOIN CategoryOption_TableRow as c_22 
  ON (c_22.RowID = Vacancy.ID 
  and c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation
  ON (Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33 
   ON (t_33.Table1RowID = Vacancy.ID 
   and t_33.Table_TableID = 33 )
 LEFT JOIN Department
  ON (Department.ID = t_33.Table2RowID and
 Department.Active = 't' and Department
 .ClientID = 263)
 JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID= 50
 and c_50.RowID = Vacancy
 .ID and c_50.CategoryOptionID=19205)
 WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
 

Re: [PERFORM] Slow Query

2015-08-11 Thread Venkata Balaji N
On Wed, Aug 12, 2015 at 12:34 PM, robbyc robcampbel...@gmail.com wrote:

 Hi,

 I am new to optimizing queries and i'm getting a slow running time
 (~1.5secs) with the following SQL:

 SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
 Vacancy.DateCreated, Vacancy.CustomAccess
 , Department.Name as Department, list(Occupation.Name) as
 Occupation, Vacancy.PositionNo
 , Vacancy.Template from
Vacancy
 LEFT JOIN CategoryOption_TableRow as c_22
  ON (c_22.RowID = Vacancy.ID
  and c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation
  ON (Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33
   ON (t_33.Table1RowID = Vacancy.ID
   and t_33.Table_TableID = 33 )
 LEFT JOIN Department
  ON (Department.ID = t_33.Table2RowID and
 Department.Active = 't' and Department
 .ClientID = 263)
 JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID= 50
 and c_50.RowID = Vacancy
 .ID and c_50.CategoryOptionID=19205)
 WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
 DISTINCT(Vacancy.ID)
 FROM Vacancy join CategoryOption_TableRow ct126 on
 (ct126.Category_TableID = 126
  and RowID = Vacancy.ID)
 left join Workflow on (Workflow.VacancyID =
 Vacancy.ID
 and Workflow.Level
 = 1)
 left join CategoryOption_TableRow c30 on (c30.RowID =
 Workflow.ID and c30.Category_TableID
  = 30 and c30.CategoryOptionID = 21923)
 WHERE Template AND ct126.CategoryOptionID
 IN(34024,35254,35255,35256)) and Vacancy
 .Template = 't'
 GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department
 .Name, Vacancy.PositionNo, Vacancy.Template
 UNION
 SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
 Vacancy.DateCreated, Vacancy.CustomAccess
 , Department.Name as Department, list(Occupation.Name) as
 Occupation, Vacancy.PositionNo
 , Vacancy.Template from
Vacancy
 LEFT JOIN CategoryOption_TableRow as c_22
  ON (c_22.RowID = Vacancy.ID
  and c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation
  ON (Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33
   ON (t_33.Table1RowID = Vacancy.ID
   and t_33.Table_TableID = 33 )
 LEFT JOIN Department
  ON (Department.ID = t_33.Table2RowID and
 Department.Active = 't' and Department
 .ClientID = 263)
 JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID= 50
 and c_50.RowID = Vacancy
 .ID and c_50.CategoryOptionID=19205)
 WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
 DISTINCT(Vacancy.ID)
 FROM Vacancy join CategoryOption_TableRow ct126 on
 (ct126.Category_TableID = 126
  and RowID = Vacancy.ID)
 left join Workflow on (Workflow.VacancyID =
 Vacancy.ID
 and Workflow.Level
 = 1)
 left join CategoryOption_TableRow c30 on (c30.RowID =
 Workflow.ID and c30.Category_TableID
  = 30 and c30.CategoryOptionID = 21923)
 WHERE Template AND ct126.CategoryOptionID
 IN(34024,35254,35255,35256))  and Vacancy
 .Template <> 't' AND Vacancy.Level = 1
 GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department
 .Name, Vacancy.PositionNo, Vacancy.Template
  ORDER BY JobTitle

 Running explain analyze gives me the following information:
 http://explain.depesz.com/s/pdC


 For a total runtime: 2877.157 ms

 If i remove the left joins on Department and TableRow_TableRow this reduces
 the run time by about a third.
 Additionally removing CategoryOption and CategoryOption_TableRow joins
 further reduces by a about a third.

 Given that i need both these joins for the information retrieved by them,
 what would be the best way to re-factor this query so it runs faster?

 Looking at the output of explain analyze the hash aggregates and sort seem
 to be the primary issue.


The query has got distinct and group-by/order-by clauses, which seem to
be taking the time. Without looking at many details of the query code and
table size etc., did you try increasing work_mem and then executing the query
to see if that helps? This will reduce the on-disk IO for sorting. Also,
Vacancy.JobTitle seems to be a non-indexed column.
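
(If JobTitle is to be indexed at all, a plain default-opclass btree is the
variant that supports default-collation comparison and ordering; the index
name here is invented:

CREATE INDEX idx_vacancy_jobtitle ON Vacancy (JobTitle);

A varchar_pattern_ops index, as tried elsewhere in the thread, only helps
LIKE 'foo%'-style searches, not ordinary sorting.)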

Regards,
Venkata Balaji

Fujitsu Australia


[PERFORM] Slow Query

2015-08-11 Thread robbyc
Hi,

I am new to optimizing queries and I'm getting a slow running time
(~1.5secs) with the following SQL:

SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
Vacancy.DateCreated, Vacancy.CustomAccess
, Department.Name as Department, list(Occupation.Name) as
Occupation, Vacancy.PositionNo
, Vacancy.Template from 
   Vacancy
LEFT JOIN CategoryOption_TableRow as c_22 
 ON (c_22.RowID = Vacancy.ID 
 and c_22.Category_TableID = 22)
LEFT JOIN CategoryOption as Occupation
 ON (Occupation.ID = c_22.CategoryOptionID)
LEFT JOIN TableRow_TableRow as t_33 
  ON (t_33.Table1RowID = Vacancy.ID 
  and t_33.Table_TableID = 33 )
LEFT JOIN Department
 ON (Department.ID = t_33.Table2RowID and
Department.Active = 't' and Department
.ClientID = 263)
JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID= 50
and c_50.RowID = Vacancy
.ID and c_50.CategoryOptionID=19205)
WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
DISTINCT(Vacancy.ID) 
FROM Vacancy join CategoryOption_TableRow ct126 on
(ct126.Category_TableID = 126
 and RowID = Vacancy.ID) 
left join Workflow on (Workflow.VacancyID = Vacancy.ID
and Workflow.Level 
= 1) 
left join CategoryOption_TableRow c30 on (c30.RowID =
Workflow.ID and c30.Category_TableID
 = 30 and c30.CategoryOptionID = 21923) 
WHERE Template AND ct126.CategoryOptionID
IN(34024,35254,35255,35256)) and Vacancy
.Template = 't'
GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
Vacancy.CustomAccess, Department
.Name, Vacancy.PositionNo, Vacancy.Template
UNION
SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
Vacancy.DateCreated, Vacancy.CustomAccess
, Department.Name as Department, list(Occupation.Name) as
Occupation, Vacancy.PositionNo
, Vacancy.Template from 
   Vacancy
LEFT JOIN CategoryOption_TableRow as c_22 
 ON (c_22.RowID = Vacancy.ID 
 and c_22.Category_TableID = 22)
LEFT JOIN CategoryOption as Occupation
 ON (Occupation.ID = c_22.CategoryOptionID)
LEFT JOIN TableRow_TableRow as t_33 
  ON (t_33.Table1RowID = Vacancy.ID 
  and t_33.Table_TableID = 33 )
LEFT JOIN Department
 ON (Department.ID = t_33.Table2RowID and
Department.Active = 't' and Department
.ClientID = 263)
JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID= 50
and c_50.RowID = Vacancy
.ID and c_50.CategoryOptionID=19205)
WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
DISTINCT(Vacancy.ID) 
FROM Vacancy join CategoryOption_TableRow ct126 on
(ct126.Category_TableID = 126
 and RowID = Vacancy.ID) 
left join Workflow on (Workflow.VacancyID = Vacancy.ID
and Workflow.Level 
= 1) 
left join CategoryOption_TableRow c30 on (c30.RowID =
Workflow.ID and c30.Category_TableID
 = 30 and c30.CategoryOptionID = 21923) 
WHERE Template AND ct126.CategoryOptionID
IN(34024,35254,35255,35256))  and Vacancy
.Template <> 't' AND Vacancy.Level = 1 
GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
Vacancy.CustomAccess, Department
.Name, Vacancy.PositionNo, Vacancy.Template
 ORDER BY JobTitle

Running explain analyze gives me the following information: 
http://explain.depesz.com/s/pdC

For a total runtime: 2877.157 ms

If I remove the left joins on Department and TableRow_TableRow this reduces
the run time by about a third.
Additionally, removing the CategoryOption and CategoryOption_TableRow joins
further reduces it by about a third.

Given that i need both these joins for the information retrieved by them,
what would be the best way to re-factor this query so it runs faster?

Looking at the output of explain analyze, the hash aggregates and sort seem
to be the primary issue.

Thanks in advance




Re: [PERFORM] Slow Query

2015-08-11 Thread robbyc
Hi Venkata,

work_mem was set to 72MB, increased to 144MB, no change.

Added an index of type varchar_pattern_ops to Vacancy.JobTitle; this did
not help either.

On Wed, Aug 12, 2015 at 2:09 PM, Venkata Balaji N [via PostgreSQL] 
ml-node+s1045698n5861839...@n5.nabble.com wrote:

 On Wed, Aug 12, 2015 at 12:34 PM, robbyc wrote:

 Hi,

 I am new to optimizing queries and i'm getting a slow running time
 (~1.5secs) with the following SQL:

 SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
 Vacancy.DateCreated, Vacancy.CustomAccess
 , Department.Name as Department, list(Occupation.Name) as
 Occupation, Vacancy.PositionNo
 , Vacancy.Template from
Vacancy
 LEFT JOIN CategoryOption_TableRow as c_22
  ON (c_22.RowID = Vacancy.ID
  and c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation
  ON (Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33
   ON (t_33.Table1RowID = Vacancy.ID
   and t_33.Table_TableID = 33 )
 LEFT JOIN Department
  ON (Department.ID = t_33.Table2RowID and
 Department.Active = 't' and Department
 .ClientID = 263)
 JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID= 50
 and c_50.RowID = Vacancy
 .ID and c_50.CategoryOptionID=19205)
 WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
 DISTINCT(Vacancy.ID)
 FROM Vacancy join CategoryOption_TableRow ct126 on
 (ct126.Category_TableID = 126
  and RowID = Vacancy.ID)
 left join Workflow on (Workflow.VacancyID =
 Vacancy.ID
 and Workflow.Level
 = 1)
 left join CategoryOption_TableRow c30 on (c30.RowID =
 Workflow.ID and c30.Category_TableID
  = 30 and c30.CategoryOptionID = 21923)
 WHERE Template AND ct126.CategoryOptionID
 IN(34024,35254,35255,35256)) and Vacancy
 .Template = 't'
 GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department
 .Name, Vacancy.PositionNo, Vacancy.Template
 UNION
 SELECT distinct(Vacancy.ID), Vacancy.JobTitle,
 Vacancy.DateCreated, Vacancy.CustomAccess
 , Department.Name as Department, list(Occupation.Name) as
 Occupation, Vacancy.PositionNo
 , Vacancy.Template from
Vacancy
 LEFT JOIN CategoryOption_TableRow as c_22
  ON (c_22.RowID = Vacancy.ID
  and c_22.Category_TableID = 22)
 LEFT JOIN CategoryOption as Occupation
  ON (Occupation.ID = c_22.CategoryOptionID)
 LEFT JOIN TableRow_TableRow as t_33
   ON (t_33.Table1RowID = Vacancy.ID
   and t_33.Table_TableID = 33 )
 LEFT JOIN Department
  ON (Department.ID = t_33.Table2RowID and
 Department.Active = 't' and Department
 .ClientID = 263)
 JOIN CategoryOption_TableRow as c_50 ON (c_50.Category_TableID= 50
 and c_50.RowID = Vacancy
 .ID and c_50.CategoryOptionID=19205)
 WHERE Vacancy.ClientID = 263 AND Vacancy.ID NOT IN(SELECT
 DISTINCT(Vacancy.ID)
 FROM Vacancy join CategoryOption_TableRow ct126 on
 (ct126.Category_TableID = 126
  and RowID = Vacancy.ID)
 left join Workflow on (Workflow.VacancyID =
 Vacancy.ID
 and Workflow.Level
 = 1)
 left join CategoryOption_TableRow c30 on (c30.RowID =
 Workflow.ID and c30.Category_TableID
  = 30 and c30.CategoryOptionID = 21923)
 WHERE Template AND ct126.CategoryOptionID
 IN(34024,35254,35255,35256))  and Vacancy
  .Template <> 't' AND Vacancy.Level = 1
 GROUP BY Vacancy.ID, Vacancy.JobTitle, Vacancy.DateCreated,
 Vacancy.CustomAccess, Department
 .Name, Vacancy.PositionNo, Vacancy.Template
  ORDER BY JobTitle

 Running explain analyze gives me the following information:
 http://explain.depesz.com/s/pdC


 For a total runtime: 2877.157 ms

 If i remove the left joins on Department and TableRow_TableRow this
 reduces
 the run time by about a third.
 Additionally removing CategoryOption and CategoryOption_TableRow joins
 further reduces by a about a third.

 Given that i need both these joins for the information retrieved by them,
 what would be the best way to re-factor this query so it runs faster?

 Looking at the output of explain analyze the hash aggregates and sort seem
 to be the primary issue.


 The query has got a distinct and group-by/order-by clauses which seems to
 be taking time. Without looking at much details of the query code and Table
 size etc, did you try increasing the work_mem and then execute the query
 and see if that helps ? This will reduce the on-disk IO for sorting. Also,
 Vacancy.JobTitle seems to be a non-index column.

 Regards,
 Venkata Balaji

 Fujitsu Australia




Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-24 Thread Mark Kirkwood
On 24/06/15 09:05, Jim Nasby wrote:
 On 6/19/15 9:57 AM, Ian Pushee wrote:


 On 6/19/2015 10:47 AM, Andreas Kretschmer wrote:
 Explain Analyze outputs (links as requested):
 Default plan: http://explain.depesz.com/s/ib3k
 Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

 Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
 All pgsql settings are at their defaults.
 increase work_mem. per session via set work_mem  = 'xxxMB'; or in
 postgresql.conf, reload.



 Hi Andreas,

 The number of rows in the events table isn't constrained, so
 unfortunately it isn't feasible to set work_mem high enough to allow an
 in-memory sort. Forcing the planner to use the index works to produce a
 fast query, so I'm wondering if there is a more general way to getting
 the planner to take into account that work_mem isn't big enough to fit
 the query which will result in a MUCH more costly external merge.
 
 What Andreas is saying is the reason the sort is so expensive is because
 it spilled to disk. If you don't have enough memory to do the sort
 in-memory, then you probably don't have enough memory to buffer the
 table either, which means the index scan is going to be a LOT more
 expensive than a sort.
 
 That said, the better your IO system is the lower you need to set
 random_page_cost. With a good raid setup 2.0 is a good starting point,
 and I've run as low as 1.1. I've never run a system on all SSD, but I've
 heard others recommend setting it as low as 1.0 on an all SSD setup.
 
 It's also worth noting that there's some consensus that the optimizer is
 generally too eager to switch from an index scan to a seqscan.


Mind you, this eagerness could be caused by the OP having
effective_cache_size set to the default. This should be changed (set to
a few GB...)!
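
For example, on the 8GB machine described upthread, something like this in
postgresql.conf (values are illustrative only):

effective_cache_size = 4GB   # roughly what the OS page cache can hold
random_page_cost = 2.0       # below the 4.0 default when random IO is cheap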

Cheers

Mark




Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-23 Thread Jim Nasby

On 6/19/15 9:57 AM, Ian Pushee wrote:



On 6/19/2015 10:47 AM, Andreas Kretschmer wrote:

Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

increase work_mem. per session via set work_mem  = 'xxxMB'; or in
postgresql.conf, reload.




Hi Andreas,

The number of rows in the events table isn't constrained, so
unfortunately it isn't feasible to set work_mem high enough to allow an
in-memory sort. Forcing the planner to use the index works to produce a
fast query, so I'm wondering if there is a more general way to getting
the planner to take into account that work_mem isn't big enough to fit
the query which will result in a MUCH more costly external merge.


What Andreas is saying is the reason the sort is so expensive is because 
it spilled to disk. If you don't have enough memory to do the sort 
in-memory, then you probably don't have enough memory to buffer the 
table either, which means the index scan is going to be a LOT more 
expensive than a sort.


That said, the better your IO system is the lower you need to set 
random_page_cost. With a good raid setup 2.0 is a good starting point, 
and I've run as low as 1.1. I've never run a system on all SSD, but I've 
heard others recommend setting it as low as 1.0 on an all SSD setup.


It's also worth noting that there's some consensus that the optimizer is 
generally too eager to switch from an index scan to a seqscan.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




[PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Ian Pushee

Hi Folks,

This is my first time posting here, so hopefully I manage to convey all 
the information needed.
We have a simple query that just started giving us problems in 
production when the number of rows gets too large (>100k).
The issue seems to be that the planner wants to sort the rows using a 
sequential scan, rather than the index provided specifically for this 
query. This isn't a problem with low numbers of rows, but eventually the 
query outgrows work_mem and uses the disk, slowing down the query 
greatly. I know the common answer is to increase work_mem... but since 
this table's growth is unpredictable, that isn't a viable strategy.
I've tried increasing shared_buffers and effective_cache_size, but that 
doesn't appear to affect the plan chosen here. Setting 
random_page_cost=1.0 works, but I'm hoping for a more general solution 
that doesn't require setting that locally each time I run the query. I 
guess my real question is whether or not there is any way to get the 
planner to take into account the fact that it's going to need to do an 
'external merge', and that it is going to take a LONG time?
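
(For reference, "setting that locally" amounts to something like the
following, scoped to a single transaction:

BEGIN;
SET LOCAL random_page_cost = 1.0;   -- reverts automatically at COMMIT/ROLLBACK
-- ... run the query below ...
COMMIT;

and the spill is visible in EXPLAIN ANALYZE output as "Sort Method:
external merge  Disk: ...".)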


Table and Index Schemas:
CREATE TABLE events
(
  id serial NOT NULL,
  name character varying(64),
  eventspy_id integer NOT NULL,
  camera_id integer NOT NULL,
  start_time timestamp without time zone NOT NULL,
  millisecond smallint NOT NULL,
  uid smallint NOT NULL,
  update_time timestamp without time zone NOT NULL DEFAULT now(),
  length integer NOT NULL,
  objects text NOT NULL,
  priority smallint NOT NULL,
  type character varying(45) NOT NULL DEFAULT 'alarm'::character varying,
  status event_status NOT NULL DEFAULT 'new'::event_status,
  confidence smallint NOT NULL DEFAULT 100::smallint,
  CONSTRAINT events_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX events_confidnce
  ON events
  USING btree
  (confidence);

CREATE INDEX events_summary
  ON events
  USING btree
  (name COLLATE pg_catalog.default, eventspy_id, camera_id, type 
COLLATE pg_catalog.default, status);


Query:
SELECT name, type, eventspy_id, camera_id, status, COUNT(id), 
MAX(update_time), MIN(start_time), MAX(start_time) FROM events WHERE 
confidence=0 GROUP BY name, eventspy_id, camera_id, type, status;


Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

Thanks for any help you can provide,
-Ian Pushee





Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Ian Pushee



On 6/19/2015 10:46 AM, Igor Neyman wrote:


Probably the events_confidnce index is not very selective; that's why the 
optimizer prefers a seq scan.
I'd try to create an index on (name, eventspy_id, camera_id, type, status).

Also, the most recent 9.2 release is 9.2.13; you should upgrade.

Regards,
Igor Neyman


Hi Igor,

I already have an index for (name, eventspy_id, camera_id, type, 
status)... that is the index being used (apparently silently) when I set 
random_page_cost=1.0.


Thanks,
-Ian




Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Ian Pushee



On 6/19/2015 10:47 AM, Andreas Kretschmer wrote:

Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

increase work_mem. per session via set work_mem  = 'xxxMB'; or in
postgresql.conf, reload.




Hi Andreas,

The number of rows in the events table isn't constrained, so 
unfortunately it isn't feasible to set work_mem high enough to allow an 
in-memory sort. Forcing the planner to use the index works to produce a 
fast query, so I'm wondering if there is a more general way of getting 
the planner to take into account that work_mem isn't big enough to fit 
the sort, which will result in a MUCH more costly external merge.


Thanks,
-Ian




Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Friday, June 19, 2015 11:07 AM
To: Ian Pushee; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' 
sort type)



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:54 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' 
sort type)



On 6/19/2015 10:46 AM, Igor Neyman wrote:

 Probably events_confidnce index is not very selective, that's why optimizer 
 prefers seq scan.
 I'd try to create an index on (name, eventspy_id, camera_id, type, status).

 Also, the recent 9.2 is 9.2.13, you should upgrade.

 Regards,
 Igor Neyman

Hi Igor,

I already have an index for (name, eventspy_id, camera_id, type, status)... 
that is the index being used (apparently silently) when I set 
random_page_cost=1.0.

Thanks,
-Ian


--

Well, having 8GB RAM on the machine, you probably should not be using default 
config parameters.
Depending on what else this machine is being used for, and depending on the 
queries you are running, you should definitely modify the Postgres config.
If this machine is designated database server, I'd start with the following 
parameters modified from default values:

shared_buffers = 1024MB
temp_buffers = 8MB
work_mem = 64MB 
effective_cache_size = 1024MB
random_page_cost = 2.5
cpu_tuple_cost = 0.03
cpu_index_tuple_cost = 0.05

and see how it goes.

Regards,
Igor Neyman

---

Oops, should be at least:

effective_cache_size = 5120MB

on a dedicated server.

Regards,
Igor Neyman




Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

2015-06-19 Thread Igor Neyman


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:54 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' 
sort type)



On 6/19/2015 10:46 AM, Igor Neyman wrote:

 Probably events_confidnce index is not very selective, that's why optimizer 
 prefers seq scan.
 I'd try to create an index on (name, eventspy_id, camera_id, type, status).

 Also, the recent 9.2 is 9.2.13, you should upgrade.

 Regards,
 Igor Neyman

Hi Igor,

I already have an index for (name, eventspy_id, camera_id, type, status)... 
that is the index being used (apparently silently) when I set 
random_page_cost=1.0.

Thanks,
-Ian


--

Well, having 8GB RAM on the machine, you probably should not be using default 
config parameters.
Depending on what else this machine is being used for, and depending on the 
queries you are running, you should definitely modify the Postgres config.
If this machine is designated database server, I'd start with the following 
parameters modified from default values:

shared_buffers = 1024MB
temp_buffers = 8MB
work_mem = 64MB 
effective_cache_size = 1024MB
random_page_cost = 2.5
cpu_tuple_cost = 0.03
cpu_index_tuple_cost = 0.05

and see how it goes.

Regards,
Igor Neyman






Re: [PERFORM] Slow query: Postgres chooses nested loop over hash join, whereby hash join is much faster, wrong number of rows estimated

2015-06-15 Thread Merlin Moncure
On Thu, Jun 11, 2015 at 7:18 PM, Sasa Vilic sasavi...@gmail.com wrote:
 Hi,

 I have a query that takes ridiculously long to complete (over 500ms) but if
 I disable nested loop it does it really fast (24.5ms)

 Here are links for
 * first request (everything enabled): http://explain.depesz.com/s/Q1M
 * second request (nested loop disabled): http://explain.depesz.com/s/9ZY

 I have also noticed, that setting

 set join_collapse_limit = 1;

 produces similar results as when nested loops are disabled.

 Autovacuum is running, and I manually performed both analyze and
 vacuum analyze. No effect.

 I tried increasing statistics for columns (slot, path_id, key) to 5000 for
 table data. No effect.

 I tried increasing statistics for columns (id, parent, key) to 5000 for
 table path. No effect.

 I can see, that postgres is doing wrong estimation on request count, but I
 can't figure it out why.

 Table path is used to represent tree-like structure.

 == QUERY ==

 SELECT p1.value as request_type, p2.value as app_id, p3.value as app_ip,
 p3.id as id, data.*, server.name
 FROM data
 INNER JOIN path p3 ON data.path_id = p3.id
 INNER JOIN server on data.server_id = server.id
 INNER JOIN path p2 on p2.id = p3.parent
 INNER JOIN path p1 on p1.id = p2.parent
 WHERE data.slot between '2015-02-18 00:00:00' and '2015-02-19 00:00:00'
 AND p1.key = 'request_type' AND p2.key = 'app_id' AND p3.key = 'app_ip'
 ;

 == TABLES ==
  Table public.path
  Column | Type  | Modifiers | Storage  |
 Description
 +---+---+--+-
  id | integer   | not null default
 nextval('path_id_seq'::regclass) | plain|
  parent | integer |   |
 plain|
  key| character varying(25) | not null
 | extended |
  value  | character varying(50) | not null
 | extended |
 Indexes:
 path_pkey PRIMARY KEY, btree (id)
 path_unique UNIQUE CONSTRAINT, btree (parent, key, value)
 Foreign-key constraints:
 path.fg.parent-path(id) FOREIGN KEY (parent) REFERENCES path(id)
 Referenced by:
 TABLE data CONSTRAINT data_fkey_path FOREIGN KEY (path_id)
 REFERENCES path(id)
 TABLE path CONSTRAINT path.fg.parent-path(id) FOREIGN KEY (parent)
 REFERENCES path(id)
 Has OIDs: no

Table public.data
   Column   |  Type  | Modifiers | Storage  |
 Description
 ---++---+--+-
  slot  | timestamp(0) without time zone | not null  | plain|
  server_id | integer| not null  | plain|
  path_id   | integer| not null  | plain|
  key   | character varying(50)  | not null  | extended |
  value | real   | not null  | plain|
 Indexes:
 data_pkey PRIMARY KEY, btree (slot, server_id, path_id, key)
 Foreign-key constraints:
 data_fkey_path FOREIGN KEY (path_id) REFERENCES path(id)
 Has OIDs: no

 svilic= select count(*) from path;
  count
 ---
603

 svilic= select count(*) from path p1 inner join path p2 on p1.id =
 p2.parent inner join path p3 on p2.id = p3.parent where p1.parent is null;
  count
 ---
463

 svilic= select count(*) from server;
  count
 ---
 37

 svilic= select count(*) from data;
   count
 --
  23495552


 svilic= select version();
 version
 -
  PostgreSQL 9.1.17 on x86_64-unknown-linux-gnu, compiled by gcc
 (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

 == SERVER CONFIGURATION ==

 shared_buffers = 512MB
 work_mem = 8MB (I have tried changing it to 32, 128 and 512, no effect)
 maintenance_work_mem = 64MB
 checkpoint_segments = 100
 random_page_cost = 4.0
 effective_cache_size = 3072MB

 == HARDWARE CONFIGURATION ==

 cpu: Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz (4 cores)
 mem: 8GB
 system is using regular disks, (no raid and no ssd)

huh.  the query looks pretty clean  (except for possible overuse of
surrogate keys which tend to exacerbate planning issues in certain
cases).

Let's try cranking statistics on data.path_id, first to 1000 and then
to 10000 and see how it affects the plan.   The database is clearly
misestimating row counts on that join.

merlin
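
For reference, a sketch of what cranking the statistics target looks like
(ANALYZE is required before the new target affects the estimates):

ALTER TABLE data ALTER COLUMN path_id SET STATISTICS 1000;
ANALYZE data;
-- re-run EXPLAIN and check the row estimate on the join;
-- 10000 is the maximum statistics target.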




Re: [PERFORM] Slow query - lots of temporary files.

2015-06-12 Thread Johann Spies
On 10 June 2015 at 16:50, Tomas Vondra tomas.von...@2ndquadrant.com wrote:



 The problematic piece of the explain plan is this:

  ->  Merge Join  (cost=4384310.92..21202716.78 rows=6664163593
   width=390)
Output: a.ut, c.gt, b.go, b.gn, d.au
Merge Cond: ((c.ut)::text = (d.rart_id)::text)

 That is, the planner expects ~6.7 billion rows, each ~390B wide. That's
 ~2.5TB of data that needs to be stored to disk (so that the sort can
 process it).

 The way the schema is designed might be one of the issues - ISTM the 'ut'
 column is somehow universal, mixing values referencing different columns in
 multiple tables. Not only that's utterly misleading for the planner (and
 may easily cause issues with huge intermediate results), but it also makes
 formulating the queries very difficult. And of course, the casting between
 text and int is not very good either.

 Fix the schema to follow relational best practices - separate the values
 into multiple columns, and most of this will go away.


Thanks for your reply Tomas.

I do not understand what the problem with the 'ut' column is.  It is a
unique identifier in the first table (africa_uts) and is used in the other
tables to establish joins, and it has the same type definition in all the
tables.  Is the problem the similar name?  The 'ut' columns in the
different tables all refer to the same data.  I do not cast integers to
text in this case, and I don't know why the planner is doing it.  The
field 'rart_id' in isi.rauthor is just another name for 'ut' in the other
tables and has the same datatype.

I do not understand your remark: separate the values into multiple
columns. I cannot see which values can be separated into different columns
in the schema.  Do you mean in the query?  Why?


Johann
-- 
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)


[PERFORM] Slow query: Postgres chooses nested loop over hash join, whereby hash join is much faster, wrong number of rows estimated

2015-06-11 Thread Sasa Vilic

Hi,

I have a query that takes ridiculously long to complete (over 500ms) but 
if I disable nested loop it does it really fast (24.5ms)


Here are links for
* first request (everything enabled): http://explain.depesz.com/s/Q1M
* second request (nested loop disabled): http://explain.depesz.com/s/9ZY

I have also noticed, that setting

set join_collapse_limit = 1;

produces similar results as when nested loops are disabled.

Autovacuum is running, and I manually performed both analyze and
vacuum analyze. No effect.


I tried increasing statistics for columns (slot, path_id, key) to 5000 
for table data. No effect.


I tried increasing statistics for columns (id, parent, key) to 5000 for 
table path. No effect.


I can see, that postgres is doing wrong estimation on request count, but 
I can't figure it out why.


Table path is used to represent tree-like structure.

== QUERY ==

SELECT p1.value as request_type, p2.value as app_id, p3.value as app_ip, 
p3.id as id, data.*, server.name

FROM data
INNER JOIN path p3 ON data.path_id = p3.id
INNER JOIN server on data.server_id = server.id
INNER JOIN path p2 on p2.id = p3.parent
INNER JOIN path p1 on p1.id = p2.parent
WHERE data.slot between '2015-02-18 00:00:00' and '2015-02-19 00:00:00'
AND p1.key = 'request_type' AND p2.key = 'app_id' AND p3.key = 'app_ip'
;

== TABLES ==
 Table public.path
 Column | Type  | Modifiers | 
Storage  | Description

+---+---+--+-
 id | integer   | not null default 
nextval('path_id_seq'::regclass) | plain|
 parent | integer |   | 
plain|
 key| character varying(25) | not 
null  | extended |
 value  | character varying(50) | not 
null  | extended |

Indexes:
path_pkey PRIMARY KEY, btree (id)
path_unique UNIQUE CONSTRAINT, btree (parent, key, value)
Foreign-key constraints:
path.fg.parent-path(id) FOREIGN KEY (parent) REFERENCES path(id)
Referenced by:
TABLE data CONSTRAINT data_fkey_path FOREIGN KEY (path_id) 
REFERENCES path(id)
TABLE path CONSTRAINT path.fg.parent-path(id) FOREIGN KEY 
(parent) REFERENCES path(id)

Has OIDs: no

   Table public.data
  Column   |  Type  | Modifiers | Storage  | 
Description

---++---+--+-
 slot  | timestamp(0) without time zone | not null  | plain|
 server_id | integer| not null  | plain|
 path_id   | integer| not null  | plain|
 key   | character varying(50)  | not null  | extended |
 value | real   | not null  | plain|
Indexes:
data_pkey PRIMARY KEY, btree (slot, server_id, path_id, key)
Foreign-key constraints:
data_fkey_path FOREIGN KEY (path_id) REFERENCES path(id)
Has OIDs: no

svilic= select count(*) from path;
 count
---
   603

svilic= select count(*) from path p1 inner join path p2 on p1.id = 
p2.parent inner join path p3 on p2.id = p3.parent where p1.parent is null;

 count
---
   463

svilic= select count(*) from server;
 count
---
37

svilic= select count(*) from data;
  count
--
 23495552


svilic= select version();
version
-
 PostgreSQL 9.1.17 on x86_64-unknown-linux-gnu, compiled by gcc 
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit


== SERVER CONFIGURATION ==

shared_buffers = 512MB
work_mem = 8MB (I have tried changing it to 32, 128 and 512, no effect)
maintenance_work_mem = 64MB
checkpoint_segments = 100
random_page_cost = 4.0
effective_cache_size = 3072MB

== HARDWARE CONFIGURATION ==

cpu: Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz (4 cores)
mem: 8GB
system is using regular disks, (no raid and no ssd)







[PERFORM] Slow query - lots of temporary files.

2015-06-10 Thread Johann Spies
I have stopped this query after about 16 hours.  At the same time I ran a
'explain analyze' on the same query to find out why it took so long.  These
two processes generated temporary files of 173GB in
/var/lib/postgresql/9.4/main/base/pgsql_tmp.

COPY
  (SELECT A.ut,
  B.go AS funding_org,
  B.gn AS grant_no,
  C.gt AS thanks,
  D.au
   FROM isi.funding_text C,
isi.rauthor D,
isi.africa_uts A
   LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
   WHERE (C.ut IS NOT NULL
  OR B.ut IS NOT NULL)
 AND D.rart_id = C.ut
 AND C.ut = B.ut
   GROUP BY A.ut,
GO,
gn,
gt,
au
   ORDER BY funding_org) TO '/tmp/africafunding2.csv' WITH csv quote
'' DELIMITER ',';


A modified version of this query finished in 1min 27 sec:

COPY
  (SELECT 'UT'||A.ut,
B.go AS funding_org,
B.gn AS grant_no,
C.gt AS thanks
   FROM isi.africa_uts A
   LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
   LEFT JOIN isi.funding_text C ON (A.ut = C.ut)
   WHERE (C.ut IS NOT NULL
  OR B.ut IS NOT NULL)
   GROUP BY A.ut,
GO,
gn,
gt) TO '/tmp/africafunding.csv' WITH csv quote '' DELIMITER ',';

As I said, the process of 'explain analyze' of the problematic query
contributed to the 173GB
temporary files and did not finish in about 16 hours.

Running a plain EXPLAIN on the query produces this:

Sort  (cost=4781458203.46..4798118612.44 rows=6664163593 width=390)
  Output: a.ut, b.go, b.gn, c.gt, (array_to_string(array_agg(d.au),
';'::text)), b.go, b.gn, d.au
  Sort Key: b.go
  ->  GroupAggregate  (cost=2293037801.73..2509623118.51
rows=6664163593 width=390)
Output: a.ut, b.go, b.gn, c.gt,
array_to_string(array_agg(d.au), ';'::text), b.go, b.gn, d.au
Group Key: a.ut, b.go, b.gn, c.gt, d.au
->  Sort  (cost=2293037801.73..2309698210.72 rows=6664163593
width=390)
  Output: a.ut, c.gt, b.go, b.gn, d.au
  Sort Key: a.ut, b.go, b.gn, c.gt, d.au
  ->  Merge Join  (cost=4384310.92..21202716.78
rows=6664163593 width=390)
Output: a.ut, c.gt, b.go, b.gn, d.au
Merge Cond: ((c.ut)::text = (d.rart_id)::text)
->  Merge Join  (cost=635890.84..1675389.41
rows=6069238 width=412)
  Output: c.gt, c.ut, a.ut, b.go, b.gn, b.ut
  Merge Cond: ((c.ut)::text = (b.ut)::text)
  Join Filter: ((c.ut IS NOT NULL) OR (b.ut
IS NOT NULL))
  ->  Merge Join  (cost=635476.30..675071.77
rows=1150354 width=348)
Output: c.gt, c.ut, a.ut
Merge Cond: ((a.ut)::text = (c.ut)::text)
->  Index Only Scan using
africa_ut_idx on isi.africa_uts a  (cost=0.42..19130.19 rows=628918
width=16)
  Output: a.ut
->  Sort  (cost=632211.00..640735.23
rows=3409691 width=332)
  Output: c.gt, c.ut
  Sort Key: c.ut
  ->  Seq Scan on
isi.funding_text c  (cost=0.00..262238.91 rows=3409691 width=332)
Output: c.gt, c.ut
  ->  Index Scan using funding_org_ut_idx on
isi.funding_org b  (cost=0.56..912582.50 rows=9835492 width=64)
Output: b.id, b.ut, b.go, b.gn
->  Materialize  (cost=0.57..17914892.46
rows=159086560 width=26)
  Output: d.id, d.rart_id, d.au, d.ro, d.ln,
d.af, d.ras, d.ad, d.aa, d.em, d.ag, d.tsv
  ->  Index Scan using rauthor_rart_id_idx on
isi.rauthor d  (cost=0.57..17517176.06 rows=159086560 width=26)
Output: d.id, d.rart_id, d.au, d.ro,
d.ln, d.af, d.ras, d.ad, d.aa, d.em, d.ag, d.tsv

Any idea why adding the rauthor table to the query is so problematic?

My systerm:

768 GB RAM
shared_buffers:  32GB
work_mem:  4608MB

Regards
Johann

-- 
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)


Re: [PERFORM] Slow query - lots of temporary files.

2015-06-10 Thread Claudio Freire
On Wed, Jun 10, 2015 at 9:39 AM, Johann Spies johann.sp...@gmail.com wrote:
 COPY
   (SELECT A.ut,
   B.go AS funding_org,
   B.gn AS grant_no,
   C.gt AS thanks,
   D.au
FROM isi.funding_text C,
 isi.rauthor D,
 isi.africa_uts A
LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
WHERE (C.ut IS NOT NULL
   OR B.ut IS NOT NULL)
  AND D.rart_id = C.ut
  AND C.ut = B.ut
GROUP BY A.ut,
 GO,
 gn,
 gt,
 au
ORDER BY funding_org) TO '/tmp/africafunding2.csv' WITH csv quote ''
 DELIMITER ',';


 A modified version of this query finished in 1min 27 sec:

 COPY
   (SELECT 'UT'||A.ut,
 B.go AS funding_org,
 B.gn AS grant_no,
 C.gt AS thanks
FROM isi.africa_uts A
LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
LEFT JOIN isi.funding_text C ON (A.ut = C.ut)
WHERE (C.ut IS NOT NULL
   OR B.ut IS NOT NULL)
GROUP BY A.ut,
 GO,
 gn,
 gt) TO '/tmp/africafunding.csv' WITH csv quote '' DELIMITER
 ',';


 As I said, the process of 'explain analyze' of the problematic query
 contributed to the 173GB
 temporary files and did not finish in about 16 hours.

The joins are different on both versions, and the most likely culprit
is the join against D. It's probably wrong, and the first query is
building a cartesian product.

Without more information about the schema it's difficult to be sure though.
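
To make that concrete: in the slow query the LEFT JOIN binds only to
isi.africa_uts, while isi.funding_text and isi.rauthor enter as a comma list
whose join conditions live in the WHERE clause. A sketch of the same
conditions written as explicit joins (semantics inferred from the original
WHERE clause, which makes the outer join and NOT NULL tests effectively
inner, so verify the result set before relying on it):

SELECT a.ut, b.go AS funding_org, b.gn AS grant_no, c.gt AS thanks, d.au
FROM isi.africa_uts a
JOIN isi.funding_org b ON b.ut = a.ut
JOIN isi.funding_text c ON c.ut = b.ut
JOIN isi.rauthor d ON d.rart_id = c.ut
GROUP BY a.ut, b.go, b.gn, c.gt, d.au
ORDER BY funding_org;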




Re: [PERFORM] Slow query - lots of temporary files.

2015-06-10 Thread Johann Spies
On 10 June 2015 at 15:02, Claudio Freire klaussfre...@gmail.com wrote:


 The joins are different on both versions, and the most likely culprit
 is the join against D. It's probably wrong, and the first query is
 building a cartesian product.

 Without more information about the schema it's difficult to be sure though.


Thanks for your reply.  I will experiment further with different joins.

Here is the schema of the involved tables:

nkb=# \d isi.funding_text
Table isi.funding_text
 Column | Type  |
Modifiers
+---+---
 id | integer   | not null default
nextval('isi.funding_text_id_seq'::regclass)
 ut | character varying(15) |
 gt | citext|
Indexes:
funding_text_pkey PRIMARY KEY, btree (id)
funding_text_ut_idx btree (ut)
Foreign-key constraints:
funding_text_ut_fkey FOREIGN KEY (ut) REFERENCES isi.ritem(ut)

nkb=# \d isi.funding_org
Table isi.funding_org
 Column | Type  |
Modifiers
+---+--
 id | integer   | not null default
nextval('isi.funding_org_id_seq'::regclass)
 ut | character varying(15) |
 go | citext|
 gn | character varying |
Indexes:
funding_org_pkey PRIMARY KEY, btree (id)
funding_org_ut_idx btree (ut)
Foreign-key constraints:
funding_org_ut_fkey FOREIGN KEY (ut) REFERENCES isi.ritem(ut)


Table isi.africa_uts
 Column | Type  |
Modifiers
+---+-
 ut | character varying(15) |
 id | integer   | not null default
nextval('isi.africa_uts_id_seq'::regclass)
Indexes:
africa_uts_pkey PRIMARY KEY, btree (id)
africa_ut_idx btree (ut)
Foreign-key constraints:
africa_uts_ut_fkey FOREIGN KEY (ut) REFERENCES isi.ritem(ut)


 Table isi.rauthor
 Column  |  Type  |
Modifiers
-++--
 id  | integer| not null default
nextval('isi.rauthor_id_seq'::regclass)
 rart_id | character varying(15)  |
 au  | character varying(75)  |
 ro  | character varying(30)  |
 ln  | character varying(200) |
 af  | character varying(200) |
 ras | character varying(4)   |
 ad  | integer|
 aa  | text   |
 em  | character varying(250) |
 ag  | character varying(75)  |
 tsv | tsvector   |
Indexes:
rauthor_pkey PRIMARY KEY, btree (id) CLUSTER
rauthor_ad_idx btree (ad)
rauthor_au_idx btree (au)
rauthor_lower_idx btree (lower(au::text))
rauthor_lower_lower1_idx btree (lower(ln::text), lower(af::text))
rauthor_rart_id_idx btree (rart_id)
rauthor_tsv_idx gin (tsv)
Referenced by:
TABLE level1.person CONSTRAINT person_auth_id_fkey FOREIGN KEY
(auth_id) REFERENCES isi.rauthor(id) ON DELETE CASCADE
Triggers:
tsvectorupdate_for_rauthor BEFORE INSERT OR UPDATE ON isi.rauthor FOR
EACH ROW EXECUTE PROCEDURE isi.update_rauthor_tsv()

Regards
Johann
-- 
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)


Re: [PERFORM] Slow query - lots of temporary files.

2015-06-10 Thread Tomas Vondra



On 06/10/15 15:42, Johann Spies wrote:

On 10 June 2015 at 15:02, Claudio Freire klaussfre...@gmail.com wrote:


The joins are different on both versions, and the most likely culprit
is the join against D. It's probably wrong, and the first query is
building a cartesian product.

Without more information about the schema it's difficult to be sure
though.


 Thanks for your reply.  I will experiment further with different joins.


I don't know what you mean by experimenting with joins - that should 
be determined by the schema.


The problematic piece of the explain plan is this:

 ->  Merge Join  (cost=4384310.92..21202716.78 rows=6664163593
  width=390)
   Output: a.ut, c.gt, b.go, b.gn, d.au
   Merge Cond: ((c.ut)::text = (d.rart_id)::text)

That is, the planner expects ~6.7 billion rows, each ~390B wide. That's 
~2.5TB of data that needs to be stored to disk (so that the sort can 
process it).
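
(As a sanity check on those numbers: 6664163593 rows x 390 bytes is about
2.6e12 bytes, i.e. roughly 2.4 TiB, which is where the ~2.5TB figure comes
from.)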


The way the schema is designed might be one of the issues - ISTM the 
'ut' column is somehow universal, mixing values referencing different 
columns in multiple tables. Not only that's utterly misleading for the 
planner (and may easily cause issues with huge intermediate results), 
but it also makes formulating the queries very difficult. And of course, 
the casting between text and int is not very good either.


Fix the schema to follow relational best practices - separate the values 
into multiple columns, and most of this will go away.



regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services




Re: [PERFORM] slow query

2015-02-11 Thread Torsten Förtsch
On 11/02/15 07:41, Sathish Nelson wrote:
 I am connecting three tables in a query. One table has 73000 records
 
 and the other two tables have 138000 records.
 
 But it takes 12 sec to show 12402 rows.

you need more work_mem. The query plan shows multiple external sort nodes.

Otherwise, the query and the plan I received here are too badly
formatted for me to comprehend.

Torsten
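
A minimal sketch of testing that for a single session, without touching
postgresql.conf (the value is only an example; look for the Sort Method
changing from external merge to quicksort in the plan):

SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT 1;  -- substitute the actual slow query here
RESET work_mem;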




Re: [PERFORM] slow query

2015-02-11 Thread Tom Lane
Sathish Nelson satson.cs...@gmail.com writes:
 I am connecting three tables in a query. One table has 73000 records
 and the other two tables have 138000 records.
 But it takes 12 sec to show 12402 rows.

Increasing work_mem would make those sort steps faster ...

regards, tom lane




[PERFORM] slow query

2015-02-10 Thread Sathish Nelson
I am connecting three tables in a query. One table has 73000 records

and the other two tables have 138000 records.

But it takes 12 sec to show 12402 rows.

Tables Structure:

Items Table

CREATE TABLE C_SAM_Master.items
(
  itemno integer NOT NULL,
  itemname character varying(250) NOT NULL,
  itemcode character varying(250) NOT NULL,
  shortname character varying(20) NOT NULL,
  aliasname character varying(250) NOT NULL,
  aliasnamelanguage character varying(250) NOT NULL,
  masteritemno integer NOT NULL,
  groupno1 smallint NOT NULL,
  groupno2 smallint NOT NULL,
  groupno3 smallint NOT NULL,
  commodityno smallint NOT NULL,
  unitno smallint NOT NULL,
  weighttype character(1) NOT NULL,
  altunitno smallint NOT NULL,
  weight double precision NOT NULL,
  reqmrp character(1) NOT NULL,
  reqbatch character(1) NOT NULL,
  reqmfrdate character(1) NOT NULL,
  mfrdateformat character varying(20) NOT NULL,
  reqexpdate character(1) NOT NULL,
  expdateformat character varying(20) NOT NULL,
  expdays1 smallint NOT NULL,
  expdays2 character(1) NOT NULL,
  expinfodays smallint NOT NULL,
  stdsaleratemethod smallint NOT NULL,
  salesrateper smallint NOT NULL,
  stdprofit1 double precision NOT NULL,
  stdprofit2 character(1) NOT NULL,
  includestockrep character(1) NOT NULL,
  minstock double precision NOT NULL,
  minstockunit smallint NOT NULL,
  minsaleqtynos double precision NOT NULL,
  minsaleqtyunit smallint NOT NULL,
  minsaleqty double precision NOT NULL,
  description text NOT NULL,
  remarks character varying(250) NOT NULL,
  actpurchaseorder character(1) NOT NULL,
  actpurchase character(1) NOT NULL,
  actpurchasereturn character(1) NOT NULL,
  actsalesorder character(1) NOT NULL,
  actsales character(1) NOT NULL,
  actsalesreturn character(1) NOT NULL,
  actreceiptnote character(1) NOT NULL,
  actdeliverynote character(1) NOT NULL,
  actconsumption character(1) NOT NULL,
  actproduction character(1) NOT NULL,
  actestimate character(1) NOT NULL,
  notifypurchaseorder character varying(250) NOT NULL,
  notifypurchase character varying(250) NOT NULL,
  notifypurchasereturn character varying(250) NOT NULL,
  notifysalesorder character varying(250) NOT NULL,
  notifysales character varying(250) NOT NULL,
  notifysalesreturn character varying(250) NOT NULL,
  notifyreceiptnote character varying(250) NOT NULL,
  notifydeliverynote character varying(250) NOT NULL,
  notifyconsumption character varying(250) NOT NULL,
  notifyproduction character varying(250) NOT NULL,
  notifyestimate character varying(250) NOT NULL,
  act boolean NOT NULL,
  recordowner smallint NOT NULL,
  lastmodified smallint NOT NULL,
  crdate timestamp without time zone NOT NULL,
  stdmaxprofit double precision NOT NULL,
  commodityname character varying(100) NOT NULL,
  lst double precision NOT NULL,
  unittype character(1) NOT NULL,
  unit1 character varying(15) NOT NULL,
  unit2 character varying(15) NOT NULL,
  units integer NOT NULL,
  unitname character varying(50) NOT NULL,
  decimals smallint NOT NULL,
  groupname1 character varying(50) NOT NULL,
  groupname2 character varying(50) NOT NULL,
  groupname3 character varying(50) NOT NULL,
  repgroupname character varying(160) NOT NULL,
  masteritemname character varying(100) NOT NULL,
  altunit1 character varying(15) NOT NULL,
  altunit2 character varying(15) NOT NULL,
  altunits integer NOT NULL,
  altunitname character varying(50) NOT NULL,
  altunitdecimals smallint NOT NULL,
  CONSTRAINT items_itemno_pk PRIMARY KEY (itemno),
  CONSTRAINT items_altunitno_fk FOREIGN KEY (altunitno)
  REFERENCES C_SAM_Master.measureunits (unitno) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE RESTRICT,
  CONSTRAINT items_commodityno_fk FOREIGN KEY (commodityno)
  REFERENCES C_SAM_Master.commodity (commodityno) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE RESTRICT,
  CONSTRAINT items_groupno1_fk FOREIGN KEY (groupno1)
  REFERENCES C_SAM_Master.itemgroup1 (groupno1) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE RESTRICT,
  CONSTRAINT items_groupno2_fk FOREIGN KEY (groupno2)
  REFERENCES C_SAM_Master.itemgroup2 (groupno2) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE RESTRICT,
  CONSTRAINT items_groupno3_fk FOREIGN KEY (groupno3)
  REFERENCES C_SAM_Master.itemgroup3 (groupno3) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE RESTRICT,
  CONSTRAINT items_lastmodified_fk FOREIGN KEY (lastmodified)
  REFERENCES appsetup.user1 (userno) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE RESTRICT,
  CONSTRAINT items_masteritemno_fk FOREIGN KEY (masteritemno)
  REFERENCES C_SAM_Master.masteritems (masteritemno) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE RESTRICT,
  CONSTRAINT items_recordowner_fk FOREIGN KEY (recordowner)
  REFERENCES appsetup.user1 (userno) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE RESTRICT,
  CONSTRAINT items_unitno_fk FOREIGN KEY (unitno)
  REFERENCES C_SAM_Master.measureunits (unitno) MATCH SIMPLE
  ON UPDATE NO ACTION ON 

[PERFORM] Slow query

2014-09-23 Thread Ross Elliott
Maybe someone can explain this. The following SQL will reproduce our issue:
DROP TABLE IF EXISTS t1 CASCADE;
CREATE TABLE t1 (name text,
 state text);
CREATE INDEX t1_name ON t1(name);
CREATE INDEX t1_state ON t1(state);
CREATE INDEX t1_name_state ON t1(name,state);

-- Create some sample data
DO $$
DECLARE
states text[] := array['UNKNOWN', 'TODO', 'DONE', 'UNKNOWN'];
BEGIN
FOR v IN 1..200000 LOOP
  INSERT INTO t1 VALUES('user'||v, states[floor(random()*4)]);
  INSERT INTO t1 VALUES('user'||v, states[floor(random()*4)]);
  INSERT INTO t1 VALUES('user'||v, states[floor(random()*4)]);
  INSERT INTO t1 VALUES('user'||v, states[floor(random()*4)]);
END LOOP;
END $$;


CREATE OR REPLACE FUNCTION state_to_int(state character varying) RETURNS
integer
LANGUAGE plpgsql IMMUTABLE STRICT
AS $$BEGIN
IF state = 'UNKNOWN' THEN RETURN 0;
ELSIF state = 'TODO' THEN RETURN 1;
ELSIF state = 'DONE' THEN RETURN 2;
ELSIF state = 'NOT REQUIRED' THEN RETURN 3;
ELSE RAISE EXCEPTION 'state_to_int called with invalid state value';
END IF;
END;$$;

CREATE OR REPLACE FUNCTION int_to_state(state integer) RETURNS character
varying
LANGUAGE plpgsql IMMUTABLE STRICT
AS $$BEGIN
IF state = 0 THEN RETURN 'UNKNOWN';
ELSIF state = 1 THEN RETURN 'TODO';
ELSIF state = 2 THEN RETURN 'DONE';
ELSIF state = 3 THEN RETURN 'NOT REQUIRED';
ELSE RAISE EXCEPTION 'int_to_state called with invalid state value';
END IF;
END;$$;

-- Why is this a lot slower
explain (analyse, buffers) select name,
int_to_state(min(state_to_int(state))) as status from t1 group by t1.name;

-- Than this?
explain (analyze, buffers) select name, (array['UNKNOWN', 'TODO', 'DONE',
'NOT REQUIRED'])[min(
CASE state
WHEN 'UNKNOWN' THEN 0
WHEN 'TODO' THEN 1
WHEN 'DONE' THEN 2
WHEN 'NOT REQUIRED' THEN 3
END)] AS status from t1 group by t1.name;

-- This is also very much slower
explain (analyze, buffers) select name, (array['UNKNOWN', 'TODO', 'DONE',
'NOT REQUIRED'])[min(state_to_int(state))] AS status from t1 group by
t1.name;

This was done on:
PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit

We get results like this:
QUERY PLAN

---
 GroupAggregate  (cost=0.42..280042.62 rows=208120 width=15) (actual
time=0.076..2439.066 rows=200000 loops=1)
   Buffers: shared hit=53146
   ->  Index Scan using t1_name on t1  (cost=0.42..21931.42 rows=800000
width=15) (actual time=0.009..229.477 rows=800000 loops=1)
 Buffers: shared hit=53146
 Total runtime: 2460.860 ms
(5 rows)

QUERY PLAN

---
 GroupAggregate  (cost=0.42..36012.62 rows=208120 width=15) (actual
time=0.017..559.384 rows=200000 loops=1)
   Buffers: shared hit=53146
   ->  Index Scan using t1_name on t1  (cost=0.42..21931.42 rows=800000
width=15) (actual time=0.008..197.133 rows=800000 loops=1)
 Buffers: shared hit=53146
 Total runtime: 574.550 ms
(5 rows)

QUERY PLAN

---
 GroupAggregate  (cost=0.42..228012.62 rows=208120 width=15) (actual
time=0.042..2089.367 rows=200000 loops=1)
   Buffers: shared hit=53146
   ->  Index Scan using t1_name on t1  (cost=0.42..21931.42 rows=800000
width=15) (actual time=0.008..237.854 rows=800000 loops=1)
 Buffers: shared hit=53146
 Total runtime: 2111.004 ms
(5 rows)


We cannot change our table structure to reflect something more sensible.
What we would really like to know is why using functions is so much slower
than the unreadable method.

Regards

Ross


Re: [PERFORM] Slow query

2014-09-23 Thread David G Johnston
Ross Elliott-2 wrote
 Maybe someone can explain this. The following SQL will reproduce our
 issue:
 DROP TABLE IF EXISTS t1 CASCADE;
 CREATE TABLE t1 (name text,
  state text);
 CREATE INDEX t1_name ON t1(name);
 CREATE INDEX t1_state ON t1(state);
 CREATE INDEX t1_name_state ON t1(name,state);
 
 -- Create some sample data
 DO $$
 DECLARE
 states text[] := array['UNKNOWN', 'TODO', 'DONE', 'UNKNOWN'];
 BEGIN
 FOR v IN 1..200000 LOOP
   INSERT INTO t1 VALUES('user'||v, states[floor(random()*4)]);
   INSERT INTO t1 VALUES('user'||v, states[floor(random()*4)]);
   INSERT INTO t1 VALUES('user'||v, states[floor(random()*4)]);
   INSERT INTO t1 VALUES('user'||v, states[floor(random()*4)]);
 END LOOP;
 END $$;
 
 
 CREATE OR REPLACE FUNCTION state_to_int(state character varying) RETURNS
 integer
 LANGUAGE plpgsql IMMUTABLE STRICT
 AS $$BEGIN
 IF state = 'UNKNOWN' THEN RETURN 0;
 ELSIF state = 'TODO' THEN RETURN 1;
 ELSIF state = 'DONE' THEN RETURN 2;
 ELSIF state = 'NOT REQUIRED' THEN RETURN 3;
 ELSE RAISE EXCEPTION 'state_to_int called with invalid state value';
 END IF;
 END;$$;
 
 CREATE OR REPLACE FUNCTION int_to_state(state integer) RETURNS character
 varying
 LANGUAGE plpgsql IMMUTABLE STRICT
 AS $$BEGIN
 IF state = 0 THEN RETURN 'UNKNOWN';
 ELSIF state = 1 THEN RETURN 'TODO';
 ELSIF state = 2 THEN RETURN 'DONE';
 ELSIF state = 3 THEN RETURN 'NOT REQUIRED';
 ELSE RAISE EXCEPTION 'int_to_state called with invalid state value';
 END IF;
 END;$$;
 
 -- Why is this a lot slower
 explain (analyse, buffers) select name,
 int_to_state(min(state_to_int(state))) as status from t1 group by t1.name;
 
 -- Than this?
 explain (analyze, buffers) select name, (array['UNKNOWN', 'TODO', 'DONE',
 'NOT REQUIRED'])[min(
 CASE state
 WHEN 'UNKNOWN' THEN 0
 WHEN 'TODO' THEN 1
 WHEN 'DONE' THEN 2
 WHEN 'NOT REQUIRED' THEN 3
 END)] AS status from t1 group by t1.name;
 
 -- This is also very much slower
 explain (analyze, buffers) select name, (array['UNKNOWN', 'TODO', 'DONE',
 'NOT REQUIRED'])[min(state_to_int(state))] AS status from t1 group by
 t1.name;
 
 This was done on:
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
 4.8.2-19ubuntu1) 4.8.2, 64-bit
 
 We get results like this:
 QUERY PLAN
 
 ---
  GroupAggregate  (cost=0.42..280042.62 rows=208120 width=15) (actual
 time=0.076..2439.066 rows=200000 loops=1)
    Buffers: shared hit=53146
    ->  Index Scan using t1_name on t1  (cost=0.42..21931.42 rows=800000
 width=15) (actual time=0.009..229.477 rows=800000 loops=1)
  Buffers: shared hit=53146
  Total runtime: 2460.860 ms
 (5 rows)
 
 QUERY PLAN
 
 ---
  GroupAggregate  (cost=0.42..36012.62 rows=208120 width=15) (actual
 time=0.017..559.384 rows=200000 loops=1)
    Buffers: shared hit=53146
    ->  Index Scan using t1_name on t1  (cost=0.42..21931.42 rows=800000
 width=15) (actual time=0.008..197.133 rows=800000 loops=1)
  Buffers: shared hit=53146
  Total runtime: 574.550 ms
 (5 rows)
 
 QUERY PLAN
 
 ---
  GroupAggregate  (cost=0.42..228012.62 rows=208120 width=15) (actual
 time=0.042..2089.367 rows=200000 loops=1)
    Buffers: shared hit=53146
    ->  Index Scan using t1_name on t1  (cost=0.42..21931.42 rows=800000
 width=15) (actual time=0.008..237.854 rows=800000 loops=1)
  Buffers: shared hit=53146
  Total runtime: 2111.004 ms
 (5 rows)
 
 
 We cannot change our table structure to reflect something more sensible.
 What we would really like to know is why using functions is so much slower
 than the unreadable method.
 
 Regards
 
 Ross

Pl/pgsql functions are black boxes and expensive to execute; you should
define these functions as SQL functions and see if that helps.

David J.
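
For illustration, a sketch of that rewrite for one of the two functions (not
from the thread). STRICT is dropped because it can prevent inlining, and a
plain SQL CASE has no equivalent of the RAISE branch, so an unexpected value
yields NULL here instead of an error:

CREATE OR REPLACE FUNCTION state_to_int(state character varying) RETURNS integer
LANGUAGE sql IMMUTABLE
AS $$
  SELECT CASE state
           WHEN 'UNKNOWN' THEN 0
           WHEN 'TODO' THEN 1
           WHEN 'DONE' THEN 2
           WHEN 'NOT REQUIRED' THEN 3
         END;
$$;

Simple SQL functions like this can be inlined into the calling query by the
planner, which removes the per-row call overhead that makes the pl/pgsql
version slow.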










Re: [PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-22 Thread Tom Lane
johno jan.suc...@gmail.com writes:
 On Tue, Jul 22, 2014 at 4:53 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 johno jan.suc...@gmail.com writes:
 The obvious query is
 SELECT * FROM register_uz_accounting_entities
 WHERE effective_on > '2014-07-11' OR (effective_on = '2014-07-11' AND
 id > 1459)
 ORDER BY effective_on, id
 LIMIT 100

 A more readily optimizable query is
 SELECT * FROM register_uz_accounting_entities
 WHERE (effective_on, id) > ('2014-07-11'::date, 1459)
 ORDER BY effective_on, id
 LIMIT 100

 Yes, but that query has completely different semantics - I can't change
 that.

No, it doesn't.  Read it again ... or read up on row comparisons,
if you're unfamiliar with that notation.  The above queries are
exactly equivalent per spec.

regards, tom lane




Re: [PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-22 Thread johno

 No, it doesn't.  Read it again ... or read up on row comparisons,
 if you're unfamiliar with that notation.  The above queries are
 exactly equivalent per spec.


Wow, this is great. Thanks.


[PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-21 Thread johno
Hi there,

I am trying to optimize a simple query that returns the first 100 rows that
have been updated since a given timestamp (ordered by timestamp and id
desc).  If there are several rows with the same timestamp I need a
second condition stating that I want to return rows having the given
timestamp and id > the given id.

The obvious query is

SELECT * FROM register_uz_accounting_entities
 WHERE effective_on > '2014-07-11' OR (effective_on = '2014-07-11' AND
 id > 1459)
 ORDER BY effective_on, id
 LIMIT 100


With a composite index on (effective_on, id)

Query plan

Limit  (cost=4613.70..4613.95 rows=100 width=1250) (actual
 time=0.122..0.130 rows=22 loops=1)
   Buffers: shared hit=28
   ->  Sort  (cost=4613.70..4617.33 rows=1453 width=1250) (actual
 time=0.120..0.122 rows=22 loops=1)
 Sort Key: effective_on, id
 Sort Method: quicksort  Memory: 30kB
 Buffers: shared hit=28
 ->  Bitmap Heap Scan on register_uz_accounting_entities
  (cost=35.42..4558.17 rows=1453 width=1250) (actual time=0.036..0.083
 rows=22 loops=1)
   Recheck Cond: ((effective_on > '2014-07-11'::date) OR
 ((effective_on = '2014-07-11'::date) AND (id > 1459)))
   Buffers: shared hit=28
   ->  BitmapOr  (cost=35.42..35.42 rows=1453 width=0) (actual
 time=0.026..0.026 rows=0 loops=1)
 Buffers: shared hit=6
 ->  Bitmap Index Scan on idx2  (cost=0.00..6.49
 rows=275 width=0) (actual time=0.017..0.017 rows=15 loops=1)
   Index Cond: (effective_on > '2014-07-11'::date)
   Buffers: shared hit=3
 ->  Bitmap Index Scan on idx2  (cost=0.00..28.21
 rows=1178 width=0) (actual time=0.007..0.007 rows=7 loops=1)
   Index Cond: ((effective_on =
 '2014-07-11'::date) AND (id > 1459))
   Buffers: shared hit=3
 Total runtime: 0.204 ms



Everything works as expected. However, if I change the constraint to a
timestamp/date further in the past (resulting in far more matching rows), the
query slows down drastically.


 SELECT * FROM register_uz_accounting_entities
 WHERE effective_on > '2014-06-11' OR (effective_on = '2014-06-11' AND id >
 1459)
 ORDER BY effective_on, id
 LIMIT 100

 Limit  (cost=0.42..649.46 rows=100 width=1250) (actual
 time=516.125..516.242 rows=100 loops=1)
   Buffers: shared hit=576201
   ->  Index Scan using idx2 on register_uz_accounting_entities
  (cost=0.42..106006.95 rows=16333 width=1250) (actual time=516.122..516.229
 rows=100 loops=1)
 Filter: ((effective_on > '2014-06-11'::date) OR ((effective_on =
 '2014-06-11'::date) AND (id > 1459)))
 Rows Removed by Filter: 797708
 Buffers: shared hit=576201
 Total runtime: 516.304 ms



I've tried to optimize this query by pushing down the limit and order by's
into explicit subselects.

SELECT * FROM (
SELECT * FROM register_uz_accounting_entities
WHERE effective_on > '2014-06-11'
ORDER BY effective_on, id LIMIT 100
) t1
 UNION
   SELECT * FROM (
 SELECT * FROM register_uz_accounting_entities
 WHERE effective_on = '2014-06-11' AND id > 1459
 ORDER BY effective_on, id LIMIT 100
 ) t2
 ORDER BY effective_on, id
 LIMIT 100

 -- query plan
 Limit  (cost=684.29..684.54 rows=100 width=1250) (actual
 time=2.648..2.708 rows=100 loops=1)
   Buffers: shared hit=203
   ->  Sort  (cost=684.29..684.79 rows=200 width=1250) (actual
 time=2.646..2.672 rows=100 loops=1)
 Sort Key: register_uz_accounting_entities.effective_on,
 register_uz_accounting_entities.id
 Sort Method: quicksort  Memory: 79kB
 Buffers: shared hit=203
 ->  HashAggregate  (cost=674.65..676.65 rows=200 width=1250)
 (actual time=1.738..1.971 rows=200 loops=1)
   Buffers: shared hit=203
   ->  Append  (cost=0.42..661.15 rows=200 width=1250) (actual
 time=0.054..0.601 rows=200 loops=1)
 Buffers: shared hit=203
 ->  Limit  (cost=0.42..338.62 rows=100 width=1250)
 (actual time=0.053..0.293 rows=100 loops=1)
   Buffers: shared hit=101
   ->  Index Scan using idx2 on
 register_uz_accounting_entities  (cost=0.42..22669.36 rows=6703 width=1250)
 (actual time=0.052..0.260 rows=100 loops=1)
 Index Cond: (effective_on >
 '2014-06-11'::date)
 Buffers: shared hit=101
 ->  Limit  (cost=0.42..318.53 rows=100 width=1250)
 (actual time=0.037..0.228 rows=100 loops=1)
   Buffers: shared hit=102
   ->  Index Scan using idx2 on
 register_uz_accounting_entities register_uz_accounting_entities_1
  (cost=0.42..30888.88 rows=9710 width=1250) (actual time=0.036..0.187
 rows=100 loops=1)
 Index Cond: ((effective_on =
 '2014-06-11'::date) AND (id > 1459))
 Buffers: shared hit=102

Re: [PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-21 Thread Tom Lane
johno jan.suc...@gmail.com writes:
 I am trying to optimize a simple query that returns the first 100 rows that
 have been updated since a given timestamp (ordered by timestamp and id
 desc).  If there are several rows with the same timestamp I need a
 second condition stating that I want to return rows having the given
 timestamp and id > the given id.

 The obvious query is

 SELECT * FROM register_uz_accounting_entities
 WHERE effective_on > '2014-07-11' OR (effective_on = '2014-07-11' AND
 id > 1459)
 ORDER BY effective_on, id
 LIMIT 100

A more readily optimizable query is

SELECT * FROM register_uz_accounting_entities
WHERE (effective_on, id) > ('2014-07-11'::date, 1459)
ORDER BY effective_on, id
LIMIT 100

This formulation allows the planner to match both the WHERE and ORDER BY
clauses directly to the two-column index.
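
For readers unfamiliar with the notation: per the SQL spec, (a, b) > (x, y)
is defined as a > x OR (a = x AND b > y), so the two forms return exactly the
same rows; the row-wise form simply lets the whole condition become a single
Index Cond on the (effective_on, id) index.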

 I've tried to optimize this query by pushing down the limit and order by's
 into explicit subselects.

As noted earlier, that's unlikely to be an improvement, because on its
face it specifies more computation.  Postgres is not terribly bright
about UNIONs, either.

regards, tom lane




Re: [PERFORM] Slow query with indexed ORDER BY and LIMIT when using OR'd conditions

2014-07-21 Thread johno
On Tue, Jul 22, 2014 at 4:53 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 johno jan.suc...@gmail.com writes:
  I am trying to optimize a simple query that returns the first 100 rows that
  have been updated since a given timestamp (ordered by timestamp and id
  desc).  If there are several rows with the same timestamp I need a
  second condition stating that I want to return rows having the given
  timestamp and id > the given id.

  The obvious query is

  SELECT * FROM register_uz_accounting_entities
  WHERE effective_on > '2014-07-11' OR (effective_on = '2014-07-11' AND
  id > 1459)
  ORDER BY effective_on, id
  LIMIT 100

 A more readily optimizable query is

 SELECT * FROM register_uz_accounting_entities
 WHERE (effective_on, id) > ('2014-07-11'::date, 1459)
 ORDER BY effective_on, id
 LIMIT 100


Yes, but that query has completely different semantics - I can't change
that.



 This formulation allows the planner to match both the WHERE and ORDER BY
 clauses directly to the two-column index.


Are both fields really used? I was under the impression that only the first
column of the index can be used when there is a range query.



  I've tried to optimize this query by pushing down the limit and order
 by's
  into explicit subselects.

 As noted earlier, that's unlikely to be an improvement, because on its
 face it specifies more computation.  Postgres is not terribly bright
 about UNIONs, either.



Despite the cost calculation in explain, the actual query times are very
different. I get consistent sub-50ms responses from the optimized one
(union with pushed-down limits) and 500+ms for the plain one (when it is not
using a bitmap index scan).

Is this optimization considered by the query planner, or do I have to
force it?

Thanks again for your time and effort, I appreciate it.




 regards, tom lane



Re: [PERFORM] Slow query

2014-03-06 Thread David Johnston
Bikram Kesari Naik wrote
 Hi,
 
 I have a view which joins multiple tables to give me a result. It takes
 more than a minute to give me the result at the psql prompt when I select all
 the data from that view.
 The single CPU which is used to run this query is utilized 100%. Even if I
 fire a count(*) it takes 10 sec. I wanted to know if there is anything we
 can do to speed up this query to below 1 sec.

In all likelihood you need to index your foreign keys, and possibly other
fields, but as you haven't provided table and index definitions it is hard
to say for sure.

Depending on how many rows are hidden, I'm not sure an unqualified query on
this view can run in 1/60th the time even with indexes present - the
sequential scans are efficient if the proportion of the tables being
returned is high.

David J.









Re: [PERFORM] Slow query

2014-03-06 Thread Bikram Kesari Naik
Hi David,

We have indexes on all the columns which are used in the where clause and these 
tables are linked by foreign key constraint.


Thanks,
Bikram

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of David Johnston
Sent: Friday, March 07, 2014 11:53 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query

Bikram Kesari Naik wrote
 Hi,

 I have a view which joins multiple tables to give me a result. It
 takes more than a minute to give me the result at the psql prompt when I
 select all the data from that view.
 The single CPU which is used to run this query is utilized 100%. Even
 if I fire a count(*) it takes 10 sec. I wanted to know if there is
 anything we can do to speed up this query to below 1 sec.

In all likelihood you need to index your foreign keys, and possibly other 
fields, but as you haven't provided table and index definitions it is hard to 
say for sure.

Depending on how many rows are hidden, I'm not sure an unqualified query on this
view can run in 1/60th the time even with indexes present - the sequential
scans are efficient if the proportion of the tables being returned is high.

David J.














Re: [PERFORM] Slow query

2014-03-06 Thread David Johnston
Bikram Kesari Naik wrote
 Hi David,
 
 We have indexes on all the columns which are used in the where clause and
 these tables are linked by foreign key constraint.
 
 
 Thanks,
 Bikram
 
 -Original Message-
 From: 

 pgsql-performance-owner@

  [mailto:

 pgsql-performance-owner@

 ] On Behalf Of David Johnston
 Sent: Friday, March 07, 2014 11:53 AM
 To: 

 pgsql-performance@

 Subject: Re: [PERFORM] Slow query
 
 Bikram Kesari Naik wrote
 Hi,

 I have a view which joins multiple tables to give me a result. It
 takes more than a minute to give me the result at the psql prompt when I
 select all the data from that view.
 The single CPU which is used to run this query is utilized 100%. Even
 if I fire a count(*) it takes 10 sec. I wanted to know if there is
 anything we can do to speed up this query to below 1 sec.
 
 In all likelihood you need to index your foreign keys, and possibly other
 fields, but as you haven't provided table and index definitions it is hard
 to say for sure.
 
 Depending on how many rows are hidden, I'm not sure an unqualified query on
 this view can run in 1/60th the time even with indexes present - the
 sequential scans are efficient if the proportion of the tables being
 returned is high.
 
 David J.

Read these.

https://wiki.postgresql.org/wiki/Using_EXPLAIN
https://wiki.postgresql.org/wiki/SlowQueryQuestions

And note: while the FK thing is likely not relevant in this situation,
defining a constraint does not cause an index to be created.  Depending on
your usage patterns, defining those indexes can be helpful.
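
Since this catches many people out, a minimal sketch (hypothetical table and
column names):

-- the FK constraint alone creates no index on the referencing side
CREATE INDEX orders_customer_id_idx ON orders (customer_id);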

One last thought: not only are your row counts high but it seems like your
row sizes may also be large due to them containing binary content.  You
likely need to take a different approach to solving whatever unspecified
problem this query is intended to solve if you need sub-second performance.

That all said, the main area of improvement here is system memory, so, as
noted in the links above, play with those settings and see what
happens.

David J.









[PERFORM] Slow query on join with Date >=

2014-01-29 Thread Jim Treinen
I have a performance problem using a dimensional model where the date is
specified in a DATE dimension, specifically when using WHERE DATE >= 'Some
Date'.

This query runs very fast when using an equality expression, e.g. WHERE
DATE = '2014-01-01', and I'm wondering if there is a way to make it run
fast when using the greater-than expression.

The dimension table is about 5k rows, and the Fact table is ~60M.

Thanks in advance for any advice.

JT.



The query :

select sid, count(*) from fact_data fact left outer join dim_date dim
on dim.date_id = fact.date_id where dim.date >= '2014-1-25' group by sid
order by count desc limit 10;

FACT Table Definition:

   Table public.fact_data
Column |Type | Modifiers
---+-+---
 date_id | integer |
 date| timestamp without time zone |
 agent_id  | integer |
 instance_id   | integer |
 sid   | integer |
 Indexes:
fact_agent_id btree (agent_id)
fact_date_id btree (date_id) CLUSTER
fact_alarms_sid btree (sid)


 Table public.dim_date
   Column   |  Type   | Modifiers

+-+
 date_id| integer | not null default
nextval('dim_date_date_id_seq'::regclass)
 date   | date|
 year   | integer |
 month  | integer |
 month_name | text|
 day| integer |
 day_of_year| integer |
 weekday_name   | text|
 calendar_week  | integer |
 quarter| text|
 year_quarter   | text|
 year_month | text|
 year_calendar_week | text|
 weekend| text|
 week_start_date| date|
 week_end_date  | date|
 month_start_date   | date|
 month_end_date | date|
Indexes:
dim_date_date btree (date)
dim_date_date_id btree (date_id)

EXPLAIN Output:

explain (analyze, buffers)  select dim.date_id, fact.sid, count(1) from
fact_data fact left outer join dim_date dim on dim.date_id = fact.date_id
where dim.date_id >= 5139 group by 1,2  order by 3 desc limit 10;

   QUERY PLAN

-
 Limit  (cost=9772000.55..9772000.58 rows=10 width=8) (actual
time=91064.421..91064.440 rows=10 loops=1)
   Buffers: shared hit=4042 read=1542501
   ->  Sort  (cost=9772000.55..9787454.06 rows=6181404 width=8) (actual
time=91064.408..91064.414 rows=10 loops=1)
 Sort Key: (count(1))
 Sort Method: top-N heapsort  Memory: 25kB
 Buffers: shared hit=4042 read=1542501
 ->  GroupAggregate  (cost=9150031.23..9638422.63 rows=6181404
width=8) (actual time=90892.625..91063.905 rows=617 loops=1)
   Buffers: shared hit=4042 read=1542501
   ->  Sort  (cost=9150031.23..9256675.57 rows=42657736
width=8) (actual time=90877.129..90964.995 rows=124965 loops=1)
 Sort Key: dim.date_id, fact.sid
 Sort Method: quicksort  Memory: 8930kB
 Buffers: shared hit=4042 read=1542501
 ->  Hash Join  (cost=682.34..3160739.50 rows=42657736
width=8) (actual time=45087.394..90761.624 rows=124965 loops=1)
   Hash Cond: (fact.date_id = dim.date_id)
   Buffers: shared hit=4042 read=1542501
   ->  Seq Scan on fact_data fact
 (cost=0.00..2139866.40 rows=59361340 width=8) (actual
time=0.090..47001.500 rows=59360952 loops=1)
 Buffers: shared hit=3752 read=1542501
   ->  Hash  (cost=518.29..518.29 rows=13124
width=4) (actual time=21.083..21.083 rows=13125 loops=1)
 Buckets: 2048  Batches: 1  Memory Usage:
462kB
 Buffers: shared hit=290
 ->  Seq Scan on dim_date dim
 (cost=0.00..518.29 rows=13124 width=4) (actual time=0.494..10.918
rows=13125 loops=1)
   Filter: (date_id >= 5139)
   Rows Removed by Filter: 5138
   Buffers: shared hit=290
 Total runtime: 91064.496 ms
(25 rows)
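
One commonly suggested rewrite for this pattern (a sketch, not from the
thread; it assumes date_id ascends with date, as is typical for a date
dimension) is to resolve the date predicate to a date_id bound first, so the
fact table's date_id index can drive the scan:

select f.sid, count(*) as cnt
from fact_data f
where f.date_id >= (select min(date_id) from dim_date where date >= '2014-01-25')
group by f.sid
order by cnt desc
limit 10;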


Re: [PERFORM] Slow query (wrong index used maybe)

2014-01-28 Thread Stelian Iancu


On Mon, Jan 27, 2014, at 11:43, Gavin Flower wrote:
 On 28/01/14 08:10, bobJobS wrote:
  My developers have had the same issue.
 
  Postgres 9.2.3 on Linux 5.6.
 
 The latest Linux kernel is 3.13 (https://www.kernel.org), so I assume 
 5.6 is a distribution version.
 
 So which distribution of Linux are you using?
 
 

I cannot reply for Bob, but we're on Debian 7. 

 Cheers,
 Gavin
 
 




Re: [PERFORM] Slow query (wrong index used maybe)

2014-01-28 Thread bobJobS
RHEL 5.10 kernel 2.6.18






