Re: [GENERAL] Optimize Query

2016-02-14 Thread Alban Hertroys

> On 14 Feb 2016, at 20:40, drum.lu...@gmail.com wrote:
> 
> Hi Alban! Sorry.. that was my mistake

Okay, first advice on that query: Trim it down to something that people can 
wrap their minds around.

You have a silly amount of code repetition in there, much of which doesn't even 
seem to serve a purpose. This is some kind of generated query, I gather?
For example, you don't use most of the fields from your first subquery.

Another example is the nested subquery in your left join, which can be reduced 
to a single subquery that sums just the fields you actually need (plus the 
customer_id, obviously).
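
For illustration, the whole nested construct could shrink to something like 
this (a sketch built from the names in your posted query; the join column and 
the where-clause you actually need are my guesses):

LEFT JOIN (
    SELECT job.customerid,
           SUM(COALESCE(bill_item.unit_cost, billable.unit_cost, 0)
               * bill_item.quantity) AS cost,
           SUM(bill_item.quantity) AS quantity
      FROM ja_jobs AS job
      JOIN ja_notes AS note ON note.jobid = job.id
      JOIN dm.bill_items AS bill_item
           ON bill_item.bill_item_id = note.bill_item_id
      LEFT JOIN dm.billables AS billable
           ON billable.billable_id = note.billable_id
     GROUP BY job.customerid
) AS a ON a.customerid = i.customerid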

The same goes for most of the joins inside that left join, definitely the left 
joins - but that depends a bit on your table definitions and contents.
For example, the fields you're summing come from account (but you can use 
customer instead, since you only use the account_id, which equals client_id 
anyway) and bill_item. Some fields in your where-clause come from job, some 
others I can't tell where they're from.

Trim, trim, trim, until you're left with a more readable query that gives you 
the same results and then put it through explain analyze again. It wouldn't 
surprise me if that query is already significantly faster.

If you're still having problems at that point, post that query and the analysis 
again.

> Explain analyze link:
> http://explain.depesz.com/s/5WJy

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: [GENERAL] Optimize Query

2016-02-14 Thread drum.lu...@gmail.com
Hi Alban! Sorry.. that was my mistake


Original Query:

SELECT concat(company, ' ', customer_name_first, ' ', customer_name_last) AS customer,
       sum(revenue) AS revenue,
       sum(i.quantity) AS quantity,
       sum(i.cost) AS cost
FROM
  (SELECT account.id,
          job.customerid,
          job.title,
          job.gps_lat,
          job.gps_long,
          status.label AS status,
          status.status_type_id,
          job.status_label_id,
          client."position",
          bill_item.quantity,
          client.businesstype,
          account.id AS clientid,
          client.name_first AS customer_name_first,
          client.name_last AS customer_name_last,
          job.id AS jobid,
          note.mobiuserid,
          bill_item.for_invoicing AS invoice,
          COALESCE(bill_item.unit_price, billable.unit_price, 0) AS unit_price,
          note.n_quote_status,
          COALESCE(bill_item.unit_cost, billable.unit_cost, 0) AS unit_cost,
          job.time_job,
          "user".name_first,
          "user".name_last,
          role.id AS roleid,
          role.name AS role_name,
          billable.billable_id AS taskid,
          COALESCE(labs.tag, billable.code) AS task_name,
          note.time_start,
          client.company,
          job.refnum,
          (COALESCE(bill_item.unit_cost, billable.unit_cost, 0) * bill_item.quantity) AS cost,
          (COALESCE(bill_item.unit_price, billable.unit_price, 0) * bill_item.quantity) AS revenue,
          bill_item.for_invoicing AS invoiceable,
          COALESCE(extract('epoch' FROM bill.ts_creation AT TIME ZONE 'UTC'),
                   bill_item.invoice_id, NULL) IS NOT NULL AS invoiced
     FROM ja_clients AS account
     JOIN ja_customers AS client ON client.clientid = account.id
     JOIN ja_jobs AS job ON client.id = job.customerid
     JOIN ja_notes AS note ON note.jobid = job.id
     JOIN dm.bill_items AS bill_item ON bill_item.bill_item_id = note.bill_item_id
     LEFT JOIN dm.bills AS bill ON bill.bill_id = bill_item.bill_id
     LEFT JOIN dm.invoices AS invoice ON invoice.invoice_id = bill.invoice_id
                                      OR invoice.invoice_id = bill_item.invoice_id
     LEFT JOIN dm.billables AS billable ON billable.billable_id = note.billable_id
     LEFT JOIN dm.labors AS labs ON labs.billable_id = billable.billable_id
     JOIN ja_mobiusers AS "user" ON "user".id = note.mobiuserid
     JOIN ja_status AS status ON status.id = job.status_label_id
     JOIN ja_role AS role ON role.id = "user".user_type
    WHERE note.note_type::text = ANY (ARRAY[('time'::CHARACTER VARYING)::text,
                                            ('part'::CHARACTER VARYING)::text])
      AND NOT job.templated
      AND NOT job.deleted
      AND job.clientid = 6239
      AND time_job >= 1438351200
      AND time_job <= 144340
      AND bill_item.for_invoicing = TRUE) AS i
LEFT JOIN
  (SELECT customerid,
          SUM(cost) AS cost,
          SUM(quantity) AS quantity
     FROM
       (SELECT account.id,
               job.customerid,
               job.title,
               job.gps_lat,
               job.gps_long,
               status.label AS status,
               status.status_type_id,
               job.status_label_id,
               client."position",
               bill_item.quantity,
               client.businesstype,
               account.id AS clientid,
               client.name_first AS customer_name_first,
               client.name_last AS customer_name_last,
               job.id AS jobid,
               note.mobiuserid,
               bill_item.for_invoicing AS invoice,
               COALESCE(bill_item.unit_price, billable.unit_price, 0) AS unit_price,
               note.n_quote_status,
               COALESCE(bill_item.unit_cost, billable.unit_cost, 0) AS unit_cost,
               job.time_job,
               "user".name_first,
               "user".name_last,
               role.id AS roleid,
               role.name AS role_name,
               billable.billable_id AS taskid,
               COALESCE(labs.tag, billable.code) AS task_name,
               note.time_start,
               client.company,
               job.refnum,
               (COALESCE(bill_item.unit_cost, billable.unit_cost, 0) * bill_item.quantity) AS cost,
               (COALESCE(bill_item.unit_price, billable.unit_price, 0) * bill_item.quantity) AS revenue,
               bill_item.for_invoicing AS invoiceable,
               COALESCE(extract('epoch' FROM bill.ts_creation AT TIME ZONE 'UTC'),
                        bill_item.invoice_id, NULL) IS NOT NULL AS invoiced
          FROM ja_clients AS account
          JOIN ja_customers AS client ON client.clientid = account.id
          JOIN ja_jobs AS job ON client.id = job.customerid
          JOIN ja_notes AS note ON note.jobid = job.id
          JOIN dm.bill_items AS bill_item ON bill_item.bill_item_id = note.bill_item_id
          LEFT JOIN dm.bills AS bill ON bill.bill_id = bill_item.bill_id
          LEFT JOIN 

Re: [GENERAL] Optimize Query

2016-02-14 Thread Alban Hertroys

> On 13 Feb 2016, at 11:21, drum.lu...@gmail.com wrote:
> 
> Anyone can help with that please?
> 
> Cheers

What would help is:
1. to post an actual query that you need optimised and
2. an explain analyze of that query.

What you posted in your original message was some kind of query-template with 
enough placeholders and views that there is no way to predict how that's going 
to perform without at least knowing what goes into the placeholders and how the 
views are built up.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: [GENERAL] Optimize Query

2016-02-13 Thread drum.lu...@gmail.com
Anyone can help with that please?

Cheers

On Thursday, 11 February 2016, drum.lu...@gmail.com 
wrote:

> oh ok!
>
> thanks
>
>
>
> Lucas Possamai
>
> - kinghost.co.nz
> - DigitalOcean
>
> On 11 February 2016 at 15:41, Melvin Davidson wrote:
>
>>
>>
>>
>>> Thank you Melvin.
>>> Sorry but I was unable to see the *n_quote_status = 0*
>>>
>>> Did you use it?
>>>
>>>
>> No, I just revised your "new" query to be more efficient.
>>
>>
>> --
>> *Melvin Davidson*
>> I reserve the right to fantasize.  Whether or not you
>> wish to share my fantasy is entirely up to you.
>>
>
>

-- 


Lucas Possamai

- kinghost.co.nz

- DigitalOcean 


Re: [GENERAL] Optimize Query

2016-02-10 Thread John R Pierce

On 2/10/2016 6:38 PM, drum.lu...@gmail.com wrote:

Sorry but I was unable to see the *n_quote_status = 0*


I'm unable to see this variable anywhere in your two original queries, 
the SQL one and the other ?? abbreviated thing, nor did you give any 
table definitions, so I'm not even sure what you mean by n_quote_status.





--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Optimize Query

2016-02-10 Thread drum.lu...@gmail.com
>
>
> FYI, it is always helpful (and polite) to state the PostgreSQL VERSION and
> O/S for archive documentation purposes!
>

That's right. My mistake... I'm using PostgreSQL 9.2.


>
> Note that various postgresql.conf options, system memory & hardware also
> play a factor here, in addition to current table statistics.
>
> That being said, try the following:
>
> WITH jobs AS
> (
>  SELECT id,
>         customerid,
>         status_label_id
>    FROM ja_jobs
>   WHERE NOT templated
>     AND NOT deleted
>     AND clientid = 6239
>     AND time_job >= 1438351200
>     AND time_job <= 144340
> )
> SELECT concat(client.company, ' ', client.name_first, ' ', client.name_last) AS customer,
>        sum(COALESCE(bill_item.unit_price, billable.unit_price, 0) * bill_item.quantity) AS revenue,
>        sum(bill_item.quantity) AS quantity,
>        sum(COALESCE(bill_item.unit_cost, billable.unit_cost, 0) * bill_item.quantity) AS cost
>   FROM ja_clients AS account
>   JOIN ja_customers AS client ON client.clientid = account.id
>   JOIN jobs AS job ON job.customerid = client.id
>   JOIN ja_notes AS note ON note.jobid = job.id
>   JOIN dm.bill_items AS bill_item ON bill_item.bill_item_id = note.bill_item_id
>   LEFT JOIN dm.bills AS bill ON bill.bill_id = bill_item.bill_id
>   LEFT JOIN dm.invoices AS invoice ON invoice.invoice_id = bill.invoice_id
>                                    OR invoice.invoice_id = bill_item.invoice_id
>   LEFT JOIN dm.billables AS billable ON billable.billable_id = note.billable_id
>   LEFT JOIN dm.labors AS labs ON labs.billable_id = billable.billable_id
>   JOIN ja_mobiusers AS "user" ON "user".id = note.mobiuserid
>   JOIN ja_status AS status ON status.id = job.status_label_id
>   JOIN ja_role AS role ON role.id = "user".user_type
>  WHERE note.note_type::text = ANY (ARRAY[('time'::CHARACTER VARYING)::text,
>                                          ('part'::CHARACTER VARYING)::text])
>    AND bill_item.for_invoicing = TRUE
>  GROUP BY customer
>  ORDER BY revenue DESC;
>

Thank you Melvin.
Sorry but I was unable to see the *n_quote_status = 0*

Did you use it?


Re: [GENERAL] Optimize Query

2016-02-10 Thread Melvin Davidson
On Wed, Feb 10, 2016 at 8:25 PM, drum.lu...@gmail.com 
wrote:

> Hi all,
>
> I've got a slow query and I'm trying to make it faster.
>
> *New Query:*
>
>> SELECT concat(client.company, ' ', client.name_first, ' ', client.name_last) AS customer,
>>        sum(COALESCE(bill_item.unit_price, billable.unit_price, 0) * bill_item.quantity) AS revenue,
>>        sum(bill_item.quantity) AS quantity,
>>        sum(COALESCE(bill_item.unit_cost, billable.unit_cost, 0) * bill_item.quantity) AS cost
>>   FROM ja_clients AS account
>>   JOIN ja_customers AS client ON client.clientid = account.id
>>   JOIN ja_jobs AS job ON client.id = job.customerid
>>   JOIN ja_notes AS note ON note.jobid = job.id
>>   JOIN dm.bill_items AS bill_item ON bill_item.bill_item_id = note.bill_item_id
>>   LEFT JOIN dm.bills AS bill ON bill.bill_id = bill_item.bill_id
>>   LEFT JOIN dm.invoices AS invoice ON invoice.invoice_id = bill.invoice_id
>>                                    OR invoice.invoice_id = bill_item.invoice_id
>>   LEFT JOIN dm.billables AS billable ON billable.billable_id = note.billable_id
>>   LEFT JOIN dm.labors AS labs ON labs.billable_id = billable.billable_id
>>   JOIN ja_mobiusers AS "user" ON "user".id = note.mobiuserid
>>   JOIN ja_status AS status ON status.id = job.status_label_id
>>   JOIN ja_role AS role ON role.id = "user".user_type
>>  WHERE note.note_type::text = ANY (ARRAY[('time'::CHARACTER VARYING)::text,
>>                                          ('part'::CHARACTER VARYING)::text])
>>    AND NOT job.templated
>>    AND NOT job.deleted
>>    AND job.clientid = 6239
>>    AND job.time_job >= 1438351200
>>    AND job.time_job <= 144340
>>    AND bill_item.for_invoicing = TRUE
>> GROUP BY customer
>> ORDER BY revenue DESC;
>
>
> *The original query has:*
>
> SELECT $cols $ec , sum(revenue) as revenue, $cost_cols
>> FROM (".note_detail_report_view(). ") AS i
>> LEFT JOIN (
>> SELECT $join_col , SUM(cost) AS cost, SUM(quantity) AS quantity
>> FROM (".note_detail_report_view(). ") AS note_detail_report_view
>> $whereClause AND *n_quote_status = 0*
>> GROUP BY $join_col
>> ) AS a
>> ON $joiner
>> $whereClause AND invoice = true $limit_inv
>> GROUP BY $group_by $ec, a.cost , a.quantity
>> ORDER BY $order_by
>
>
> I just need the a-case. i and a look very similar, except that a has an
> additional filter: *n_quote_status = 0*
>
> How can I re-write that using the A case?
>
> Thanks
>

FYI, it is always helpful (and polite) to state the PostgreSQL VERSION and
O/S for archive documentation purposes!

Note that various postgresql.conf options, system memory & hardware also
play a factor here, in addition to current table statistics.

That being said, try the following:

WITH jobs AS
(
 SELECT id,
        customerid,
        status_label_id
   FROM ja_jobs
  WHERE NOT templated
    AND NOT deleted
    AND clientid = 6239
    AND time_job >= 1438351200
    AND time_job <= 144340
)
SELECT concat(client.company, ' ', client.name_first, ' ', client.name_last) AS customer,
       sum(COALESCE(bill_item.unit_price, billable.unit_price, 0) * bill_item.quantity) AS revenue,
       sum(bill_item.quantity) AS quantity,
       sum(COALESCE(bill_item.unit_cost, billable.unit_cost, 0) * bill_item.quantity) AS cost
  FROM ja_clients AS account
  JOIN ja_customers AS client ON client.clientid = account.id
  JOIN jobs AS job ON job.customerid = client.id
  JOIN ja_notes AS note ON note.jobid = job.id
  JOIN dm.bill_items AS bill_item ON bill_item.bill_item_id = note.bill_item_id
  LEFT JOIN dm.bills AS bill ON bill.bill_id = bill_item.bill_id
  LEFT JOIN dm.invoices AS invoice ON invoice.invoice_id = bill.invoice_id
                                   OR invoice.invoice_id = bill_item.invoice_id
  LEFT JOIN dm.billables AS billable ON billable.billable_id = note.billable_id
  LEFT JOIN dm.labors AS labs ON labs.billable_id = billable.billable_id
  JOIN ja_mobiusers AS "user" ON "user".id = note.mobiuserid
  JOIN ja_status AS status ON status.id = job.status_label_id
  JOIN ja_role AS role ON role.id = "user".user_type
 WHERE note.note_type::text = ANY (ARRAY[('time'::CHARACTER VARYING)::text,
                                         ('part'::CHARACTER VARYING)::text])
   AND bill_item.for_invoicing = TRUE
 GROUP BY customer
 ORDER BY revenue DESC;

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Optimize query for listing un-read messages

2014-05-03 Thread Alban Hertroys
On 01 May 2014, at 13:06, Andreas Joseph Krogh andr...@visena.com wrote:

 I have the following query to list all un-read messages for person with id=1:
  
 SELECT
 m.id  AS message_id,
 prop.person_id,
 coalesce(prop.is_read, FALSE) AS is_read,
 m.subject
 FROM message m
 LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
 prop.person_id = 1
 WHERE 1 = 1
   AND NOT EXISTS(SELECT
  *
  FROM message_property pr
  WHERE pr.message_id = m.id AND pr.person_id = 
 prop.person_id AND prop.is_read = TRUE)
 ;

Since most messages will have prop.is_read = TRUE, that part of the query 
suffers from low selectivity. Querying for the opposite is probably much 
faster, which you may even be able to speed up more with a partial index on 
is_read = FALSE.
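
For example (a sketch; the index name and key columns are guesses at your 
schema):

CREATE INDEX message_property_unread_idx
    ON message_property (message_id, person_id)
 WHERE is_read = FALSE;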

 Does anyone have suggestions on how to optimize the query or schema? It's 
 important that any message not having an entry in message_property for a user 
 is considered un-read.

Do you really need to query message_property twice? I would think this would 
give the same results:

SELECT
m.id  AS message_id,
prop.person_id,
coalesce(prop.is_read, FALSE) AS is_read,
m.subject
FROM message m
LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1 AND prop.is_read = FALSE
;


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: [GENERAL] Optimize query for listing un-read messages

2014-05-03 Thread Andreas Joseph Krogh
On Saturday 03 May 2014 at 11:51:08, Alban Hertroys haram...@gmail.com wrote:

 On 01 May 2014, at 13:06, Andreas Joseph Krogh andr...@visena.com wrote:

  I have the following query to list all un-read messages for person with 
id=1:
  
  SELECT
      m.id                          AS message_id,
      prop.person_id,
      coalesce(prop.is_read, FALSE) AS is_read,
      m.subject
  FROM message m
      LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1
  WHERE 1 = 1
        AND NOT EXISTS(SELECT
                           *
                       FROM message_property pr
                       WHERE pr.message_id = m.id AND pr.person_id = 
prop.person_id AND prop.is_read = TRUE)
      ;

 Since most messages will have prop.is_read = TRUE, that part of the query 
suffers from low selectivity. Querying for the opposite is probably much 
faster, which you may even be able to speed up more with a partial index on 
is_read = FALSE.

  Does anyone have suggestions on how to optimize the query or schema? It's 
important that any message not having an entry in message_property for a user 
is considered un-read.

 Do you really need to query message_property twice? I would think this would 
give the same results:

 SELECT
     m.id                          AS message_id,
     prop.person_id,
     coalesce(prop.is_read, FALSE) AS is_read,
     m.subject
 FROM message m
     LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1 AND prop.is_read = FALSE
 ;

That query doesn't produce the same result.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com

Re: [GENERAL] Optimize query for listing un-read messages

2014-05-03 Thread Alban Hertroys

On 03 May 2014, at 12:45, Andreas Joseph Krogh andr...@visena.com wrote:

 Do you really need to query message_property twice? I would think this would 
 give the same results:
 
 SELECT
 m.id  AS message_id,
 prop.person_id,
 coalesce(prop.is_read, FALSE) AS is_read,
 m.subject
 FROM message m
 LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
 prop.person_id = 1 AND prop.is_read = FALSE
 ;

Ah yes, of course that would match a bit too much. This however does give the 
same results:

SELECT
   m.id  AS message_id,
   prop.person_id,
   coalesce(prop.is_read, FALSE) AS is_read,
   m.subject
FROM message m
   LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1
WHERE prop.is_read IS NULL OR prop.is_read = FALSE
;

That shaves off half the time of the query here, namely one indexscan.

The remaining time appears to be spent finding the rows in "message" that do 
not have a corresponding "message_property" for the given (message_id, 
person_id) tuple. It's basically trying to find no needle in a haystack: you 
won't know that there is no needle until you've searched the entire haystack.

It does seem to help a bit to create separate indexes on 
message_property.message_id and message_property.person_id; that reduces the 
sizes of the indexes that the database needs to match and merge in order 
to find the missing message_id's.
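
Something along these lines (the index names are made up):

CREATE INDEX message_property_message_id_idx ON message_property (message_id);
CREATE INDEX message_property_person_id_idx  ON message_property (person_id);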


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: [GENERAL] Optimize query for listing un-read messages

2014-05-03 Thread Andreas Joseph Krogh
On Saturday 03 May 2014 at 23:21:21, Alban Hertroys haram...@gmail.com wrote:
 On 03 May 2014, at 12:45, Andreas Joseph Krogh andr...@visena.com wrote:

  Do you really need to query message_property twice? I would think this 
would give the same results:
 
  SELECT
      m.id                          AS message_id,
      prop.person_id,
      coalesce(prop.is_read, FALSE) AS is_read,
      m.subject
  FROM message m
      LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1 AND prop.is_read = FALSE
  ;

 Ah yes, of course that would match a bit too much. This however does give the 
same results:

 SELECT
    m.id                          AS message_id,
    prop.person_id,
    coalesce(prop.is_read, FALSE) AS is_read,
    m.subject
 FROM message m
    LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1
 WHERE prop.is_read IS NULL OR prop.is_read = FALSE
 ;

 That shaves off half the time of the query here, namely one indexscan.

 The remaining time appears to be spent finding the rows in "message" that do 
not have a corresponding "message_property" for the given (message_id, 
person_id) tuple. It's basically trying to find no needle in a haystack: you 
won't know that there is no needle until you've searched the entire haystack.

 It does seem to help a bit to create separate indexes on 
message_property.message_id and message_property.person_id; that reduces the 
sizes of the indexes that the database needs to match and merge in order 
to find the missing message_id's.

I think the consensus here is to create a caching-table; there's no way around 
it as PG is unable to index the difference between two sets.
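
A minimal sketch of what such a caching-table could look like (all names here 
are invented for illustration):

CREATE TABLE message_unread (
    person_id  bigint NOT NULL,
    message_id bigint NOT NULL,
    PRIMARY KEY (person_id, message_id)
);

-- Maintained by the application (or by triggers): insert a row per
-- recipient when a message is created, delete it when the message is
-- read. Listing un-read messages then becomes a plain index lookup:
SELECT m.id AS message_id, m.subject
  FROM message_unread u
  JOIN message m ON m.id = u.message_id
 WHERE u.person_id = 1;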

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com
www.visena.com

Re: [GENERAL] optimize query?

2013-01-30 Thread hamann . w

Bob Futrelle wrote:

If looking for the variants with a single suffixed character is all you'll
ever need to do:
Do a single pass on the large table, after creating a field, 'trunc' that
contains a truncated version of the item, e.g., adding XY423 to trunc for
each entry of the form XY423A, or XY423B, etc.  This would be a one-time
operation.

Hi Bob,

unfortunately it is not, because the suffix pattern is not always the same.
On the other hand, I tried to split the query into

select substring(code from '^[A-Z]+[0-9]+') as code into temp table t from ...;
select ... from tab2, t where tab2.code = t.code;

which definitely was a big improvement. I believe that most of the time,
passing in a pattern like that would work.

So thank you for that tip

Regards
Wolfgang




Re: [GENERAL] Optimize query: time of single * IN(many) time of many *

2004-01-09 Thread Paul Janssen
Tom Lane writes:
Paul Janssen writes:
Can anyone help me out with the following situation:
   (a) a single query with 550 id's in the IN-clause resulting into 800+ 
seconds;
   (b) 550 queries with a single id in the IN-clause resulting into 
overall time of 60 seconds;
The table consists of 950.000 records, and the resultset consists of 
205.000 records.

Why is there such an extreme difference in time?
Most likely the planner is opting not to use an indexscan in the first
case.  Could we see EXPLAIN ANALYZE results for both cases?  Also, try
SET enable_seqscan TO OFF and then repeat EXPLAIN ANALYZE for case (a).
			regards, tom lane
Thanks all for your suggestions!

As suggested, I ran EXPLAIN ANALYZE for the initial situation and for the 
situation that you suggested (seqscan OFF). The result of this change is 
already a decrease of total runtime by 78%. Very nice!
With respect to the total runtime of the many * IN(single) there was a 
difference of 54x; that's now down to 11x. But still 10 times slower... Hope 
you can use the details below to help me close the gap. Thx!

__ SITUATION 0 : INITIAL__
EXPLAIN ANALYZE single * IN(many)
Unique  (cost=2120494.74..2139985.87 rows=129941 width=24) (actual time=818313.20..820605.09 rows=335311 loops=1)
  ->  Sort  (cost=2120494.74..2123743.26 rows=1299409 width=24) (actual time=818313.19..819327.09 rows=335311 loops=1)
        Sort Key: a_id, questions_id, answer, order_id, uid
        ->  Append  (cost=0.00..1916403.49 rows=1299409 width=24) (actual time=421673.91..795948.40 rows=335311 loops=1)
              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1344006.15 rows=911298 width=24) (actual time=421673.91..556486.08 rows=207809 loops=1)
                    ->  Seq Scan on tbl_scores  (cost=0.00..1344006.15 rows=911298 width=24) (actual time=421673.88..554797.94 rows=207809 loops=1)
                          Filter: ((a_id = 1) OR (a_id = 2) OR ... OR (a_id = 550))
              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..572397.34 rows=388111 width=24) (actual time=157088.99..238498.40 rows=127502 loops=1)
                    ->  Seq Scan on tbl_scores_extra  (cost=0.00..572397.34 rows=388111 width=24) (actual time=157088.96..237474.23 rows=127502 loops=1)
                          Filter: ((a_id = 1) OR (a_id = 2) OR ... OR (a_id = 550))
Total runtime: 821062.84 msec

EXPLAIN ANALYZE many * IN(single)
Unique  (cost=24501.02..24605.30 rows=695 width=24) (actual time=51.20..57.93 rows=1349 loops=1)
  ->  Sort  (cost=24501.02..24518.40 rows=6952 width=24) (actual time=51.20..52.95 rows=1349 loops=1)
        Sort Key: a_id, questions_id, answer, order_id, uid
        ->  Append  (cost=0.00..24057.38 rows=6952 width=24) (actual time=0.62..38.04 rows=1349 loops=1)
              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17442.04 rows=4875 width=24) (actual time=0.61..22.39 rows=931 loops=1)
                    ->  Index Scan using tbl_scores_idx_a on tbl_scores  (cost=0.00..17442.04 rows=4875 width=24) (actual time=0.60..16.46 rows=931 loops=1)
                          Index Cond: (a_id = 1233)
              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..6615.34 rows=2076 width=24) (actual time=0.67..12.10 rows=418 loops=1)
                    ->  Index Scan using tbl_scores_extra_idx_a on tbl_scores  (cost=0.00..6615.34 rows=2076 width=24) (actual time=0.65..9.42 rows=418 loops=1)
                          Index Cond: (a_id = 1233)
Total runtime: 60.59 msec

The subsequent 550 executions of the query lead to...
 * total runtime varies from 0.93..163.62 msec;
 * total runtime sums up to 15107 msec.
__SITUATION 1 : SET enable_seqscan TO OFF__
EXPLAIN ANALYZE single * IN(many)
Unique  (cost=18425133.86..18444624.99 rows=129941 width=24) (actual time=174020.84..176301.67 rows=335311 loops=1)
  ->  Sort  (cost=18425133.86..18428382.38 rows=1299409 width=24) (actual time=174020.82..175090.09 rows=335311 loops=1)
        Sort Key: a_id, questions_id, answer, order_id, uid
        ->  Append  (cost=0.00..18221042.61 rows=1299409 width=24) (actual time=191.87..159763.68 rows=335311 loops=1)
              ->  Subquery Scan "*SELECT* 1"  (cost=0.00..13088911.15 rows=911298 width=24) (actual time=191.86..97162.20 rows=207809 loops=1)
                    ->  Index Scan using tbl_scores_idx_a, .(548x).. , tbl_scores_idx_a on tbl_scores  (cost=0.00..13088911.15 rows=911298 width=24) (actual time=191.84..95416.34 rows=207809 loops=1)
                          Index Cond: ((a_id = 1) OR (a_id = 2) OR ... OR (a_id = 550))
              ->  Subquery Scan "*SELECT* 2"  (cost=0.00..5132131.46 rows=388111 width=24) (actual time=59.60..61179.24 rows=127502 loops=1)
                    ->  Index Scan using tbl_scores_extra_idx_a, .(548x).. , tbl_scores_extra_idx_a on tbl_scores_extra  (cost=0.00..5132131.46 rows=388111 width=24) (actual time=59.58..59871.50 rows=127502 loops=1)
                          Index Cond: ((a_id = 1) OR (a_id = 2) OR ... OR (a_id = 550))
Total runtime: 176785.19 msec


Re: [GENERAL] Optimize query: time of single * IN(many) time

2004-01-08 Thread Alvaro Herrera
On Thu, Jan 08, 2004 at 06:28:14AM -0500, Dave Smith wrote:
 Firstly you should always provide an explain from your query before
 posting to this list.

You mean while posting, because he can't possibly provide the explain
before having the means to do so, can he? :-)

 I think the problem is that prior to 7.4 PG did not use
 indexes for IN queries. This has been fixed in 7.4.

I think what was done is to optimize queries like IN (SELECT ...) but
there has not been improvement for IN (1,2,3, ... 550) like he
appears to be doing.

Maybe something to try is putting the IDs in a (temp?) table and using
the first form.
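
For example (a sketch; the table and column names are borrowed from the 
EXPLAIN output posted elsewhere in this thread):

CREATE TEMP TABLE wanted_ids (a_id integer PRIMARY KEY);
INSERT INTO wanted_ids VALUES (1);
INSERT INTO wanted_ids VALUES (2);
-- ... one insert per id (or COPY them in)
SELECT * FROM tbl_scores WHERE a_id IN (SELECT a_id FROM wanted_ids);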

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Each person is who they are, and goes down the stairs however they like (JMSerrat)



Re: [GENERAL] Optimize query: time of single * IN(many) time of many * IN(single)

2004-01-08 Thread Tom Lane
Paul Janssen [EMAIL PROTECTED] writes:
 Can anyone help me out with the following situation:
(a) a single query with 550 id's in the IN-clause resulting into 800+ 
 seconds;
(b) 550 queries with a single id in the IN-clause resulting into overall 
 time of 60 seconds;
 The table consists of 950.000 records, and the resultset consists of 205.000 
 records.

 Why is there such an extreme difference in time?

Most likely the planner is opting not to use an indexscan in the first
case.  Could we see EXPLAIN ANALYZE results for both cases?  Also, try
SET enable_seqscan TO OFF and then repeat EXPLAIN ANALYZE for case (a).
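
For example (a sketch; the table name is borrowed from the plans posted 
elsewhere in the thread, and the IN-list is abbreviated):

SET enable_seqscan TO OFF;
EXPLAIN ANALYZE
SELECT * FROM tbl_scores WHERE a_id IN (1, 2, 550);  -- full 550-id list here
SET enable_seqscan TO ON;  -- restore the default afterwards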

regards, tom lane
