date:20070722

Re: [SQL] The nested view from hell - Restricting a subquerry

2007-07-22 Thread Nis Jørgensen

Bryce Nesbitt skrev:
> I've got a legacy app with a hefty performance problem.  The basic
> problem is stupid design. It takes 10-15 seconds of CPU time to look up
> an invoice.
> Basically it's trying to mash up extra columns on an otherwise simple
> query, and those extra columns are subtotals.  Simplified (this looks
> best in a fixed width font):
> 
> SELECT  max(order_view.order_id),max(order_view.invoice_id)
> ,sum(order_view.mileage)
> FROM(SELECT order_id,invoice_id, 0 as miles FROM eg_order
>  UNION   
>  SELECT order_id,0 ,  miles FROM eg_order_line)
>  order_view GROUP BY order_view.order_id;
> 
> A select by order_id is fast.  The problem is the application uses
> "select * from view where invoice_id=x", and the second part of the
> UNION returns all possible rows in the database.  These get filtered out
> later, but at considerable performance hit.

Just for the record, I believe your simplified example should look like
this (changed "max(order_id)" to "order_id" in outer select , changed
"miles" to "mileage"):

SELECT  order_id,
max(order_view.invoice_id),
sum(order_view.mileage)
FROM(SELECT order_id,invoice_id, 0 as mileage FROM eg_order
 UNION
 SELECT order_id, 0, mileage FROM eg_order_line)
 order_view GROUP BY order_view.order_id;


It is pretty clear that the problem comes from joining on the result of
an aggregate. PG apparently is not smart enough to recognize that the
result of a max must be one of the values of the column (meaning that it
can use an index)

It is not clear whether there is a FK relation between eg_order and
eg_order_line and what the PK of eg_order is. If there is a FK, you can
do something along the lines of

SELECT  order_id,
invoice_id
COALESCE(sum(mileage),0) as mileage
FROMeg_order LEFT JOIN eg_order_line USING order_id
GROUP BY order_id, invoice_id


If there can be more than one invoice_id per order_id, you might need to add

HAVING invoice_id = (SELECT max(invoice_id) FROM eg_order eo2 WHERE
eg_order.order_id = eo2.order_id)

or similar.

Hope this helps,

Nis


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [SQL] The nested view from hell - Restricting a subquerry

2007-07-22 Thread Tom Lane

=?UTF-8?B?TmlzIErDuHJnZW5zZW4=?= <[EMAIL PROTECTED]> writes:
> Bryce Nesbitt skrev:
>> I've got a legacy app with a hefty performance problem.

> It is not clear whether there is a FK relation between eg_order and
> eg_order_line and what the PK of eg_order is.

Or in English: can there really be more than one invoice_id per order_id
or vice versa?  AFAICS your only hope of making searches on invoice_id
be fast is if you can GROUP BY both order_id and invoice_id, and get rid
of both max() calls.

> PG apparently is not smart enough to recognize that the
> result of a max must be one of the values of the column (meaning that it
> can use an index)

That's because it can't.  As written, the query demands sums over groups
that *include* a specific invoice_id --- but each sum has to include
contributions from rows that could have another invoice_id.  So the
condition on invoice_id cannot be pushed down to the individual scans.
If, in fact, the correct answer could be had by fetching only rows with
the specified invoice_id, then you need to fix the view to make that
clear.

BTW, wouldn't UNION ALL be better than UNION here?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [SQL] The nested view from hell - Restricting a subquerry

2007-07-22 Thread Bryce Nesbitt

Great analysis Gregory & Tom... UNION ALL will make a difference.

   ---

Here invoices consist of orders, orders consist of order lines.  Thus,
each order_id corresponds to just one invoice_id.

One possibility is to add an invoice_id to the order_line.  That way the
optimizer need not push anything... the rows will get filtered out early.

Gregory Stark wrote:
> Two things are going wrong. 
>
> First, does an order_id belong to precisely one invoice_id? In which case
> instead of grouping just y order_id you need to group by invoice_id,order_id
> and remove the MAX() from around invoice_id. The optimizer can't push the
> invoice_id=? clause down inside the group by because normally to calculate
> max() it needs th entire set of records. It doesn't know there will be only
> one value.
>
> Secondly it seems to me each branch of the union generates distinct values.
> That is there can't be any duplicates or overlap. In which case you can change
> the UNION to a UNION ALL.
>
> There might be more problems but at first glance it looks like the optimizer
> would be able to push the invoice_id=? clause into the subqueries once those
> two changes are made which would throw away the subtotals and reduce to a
> simple index lookup on invoice_id.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

[SQL] how to download linux 7.3 image

2007-07-22 Thread Jean-David Beyer

Mohd Ghalib Akhtar wrote:
> how to download linux 7.3 image file(means os) ?
> 
It is not clear to me what you are trying to do. If you wish to download the
.iso files for Red Hat Linux 7.3, it has nothing to do with this mailing list.
Furthermore, RHL 7.3 has been discontinued for several years and is no
longer supported. It is obsolete.

If you want a Red Hat looking product, you should consider running the
latest version of Fedora.

If you really want an Enterprise version of linux, the current one from Red
Hat is called Red Hat Enterprise Linux 5.

OTOH, if you want a version of postgreSQL that will run on RHL 7.3, that
might be a problem since the current versions of postgreSQL probably all
demand a much newer kernel (RHL 7.3 used a 2.2 kernel, IIRC) and associated
libraries.

-- 
  .~.  Jean-David Beyer  Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jerseyhttp://counter.li.org
 ^^-^^ 17:55:01 up 14 days, 12 min, 2 users, load average: 4.22, 4.17, 4.17

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [SQL] The nested view from hell - Restricting a subquerry

2007-07-22 Thread Nis Jørgensen

Tom Lane skrev:

>> PG apparently is not smart enough to recognize that the
>> result of a max must be one of the values of the column (meaning that it
>> can use an index)
> 
> That's because it can't.  As written, the query demands sums over groups
> that *include* a specific invoice_id --- but each sum has to include
> contributions from rows that could have another invoice_id.  So the
> condition on invoice_id cannot be pushed down to the individual scans.
> If, in fact, the correct answer could be had by fetching only rows with
> the specified invoice_id, then you need to fix the view to make that
> clear.

Well, the query can be satisfied by looking only at the rows with an
order_id matching the invoice_id given. The condition that this is the
largest invoice in the group then needs to be checked afterwards.

I certainly did not expect the query planner to be able to deduce this,
though.

Nis


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [SQL] The nested view from hell - Restricting a subquerry

2007-07-22 Thread Gregory Stark

Nis Jørgensen <[EMAIL PROTECTED]> writes:

> Well, the query can be satisfied by looking only at the rows with an
> order_id matching the invoice_id given. The condition that this is the
> largest invoice in the group then needs to be checked afterwards.
>
> I certainly did not expect the query planner to be able to deduce this,
> though.

No, that's not true. If you had two records in eg_order with the same order_id
but different invoice_ids then the query would need both records to satisfy
the query.

The query planner can't deduce that this can't happen because it simply does
not have that information.

The more I look at this view the more I think it's just seriously broken.
Why is it grouping by order_id at all if, I suspect, there will only be one
record per order_id in eg_orders??

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [SQL] The nested view from hell - Restricting a subquerry

2007-07-22 Thread Nis Jørgensen

Gregory Stark skrev:
> Nis Jørgensen <[EMAIL PROTECTED]> writes:
> 
>> Well, the query can be satisfied by looking only at the rows with an
>> order_id matching the invoice_id given. The condition that this is the
>> largest invoice in the group then needs to be checked afterwards.
>>
>> I certainly did not expect the query planner to be able to deduce this,
>> though.
> 
> No, that's not true. If you had two records in eg_order with the same order_id
> but different invoice_ids then the query would need both records to satisfy
> the query.

I assume you mean "... then both records are necessary in order to
calculate the results of the query". This does not contradict what I wrote.

If you mean "... then both records need to satisfy " then
I don't understand which criteria you are talking about. The query in
question was:

SELECT  order_id,
max(order_view.invoice_id),
sum(order_view.mileage)
FROM(SELECT order_id,invoice_id, 0 as mileage FROM eg_order
 UNION
 SELECT order_id, 0, mileage FROM eg_order_line)
 order_view GROUP BY order_view.order_id;

This is then restricted on max(invoice_id)

As far as I can tell, these steps produce the correct results (without
the later information about primary keys provided by Bryce)

INPUT: my_invoice_id

1. Look up all order_ids for which (order_id,my_invoice_id) appear in
eg_orders

2. Find all rows (in both branches of the UNION) with these id_s

3. Group the rows, and calculate max(invoice_id)

4. Filter the result rows on max(invoice_id) = my_invoice_id.

> The query planner can't deduce that this can't happen because it simply does
> not have that information.

Well, I realize that the existing query planner can't. Since I could
arrive at the plan above by deduction so could a hypothetical different
query planner. Whether it is faster is of course unknown - my guess is
that it would be in this case.

> The more I look at this view the more I think it's just seriously broken.
> Why is it grouping by order_id at all if, I suspect, there will only be one
> record per order_id in eg_orders??

Bryce has confirmed this. The above is only of academic interest.

Yours,
Nis Jorgensen

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [SQL] The nested view from hell - Restricting a subquerry

Re: [SQL] The nested view from hell - Restricting a subquerry

Re: [SQL] The nested view from hell - Restricting a subquerry

[SQL] how to download linux 7.3 image

Re: [SQL] The nested view from hell - Restricting a subquerry

Re: [SQL] The nested view from hell - Restricting a subquerry

Re: [SQL] The nested view from hell - Restricting a subquerry

7 matches

Site Navigation

Mail list logo

Footer information