Re: [SQL] The nested view from hell - Restricting a subquerry
Bryce Nesbitt skrev: > I've got a legacy app with a hefty performance problem. The basic > problem is stupid design. It takes 10-15 seconds of CPU time to look up > an invoice. > Basically it's trying to mash up extra columns on an otherwise simple > query, and those extra columns are subtotals. Simplified (this looks > best in a fixed width font): > > SELECT max(order_view.order_id),max(order_view.invoice_id) > ,sum(order_view.mileage) > FROM(SELECT order_id,invoice_id, 0 as miles FROM eg_order > UNION > SELECT order_id,0 , miles FROM eg_order_line) > order_view GROUP BY order_view.order_id; > > A select by order_id is fast. The problem is the application uses > "select * from view where invoice_id=x", and the second part of the > UNION returns all possible rows in the database. These get filtered out > later, but at considerable performance hit. Just for the record, I believe your simplified example should look like this (changed "max(order_id)" to "order_id" in outer select , changed "miles" to "mileage"): SELECT order_id, max(order_view.invoice_id), sum(order_view.mileage) FROM(SELECT order_id,invoice_id, 0 as mileage FROM eg_order UNION SELECT order_id, 0, mileage FROM eg_order_line) order_view GROUP BY order_view.order_id; It is pretty clear that the problem comes from joining on the result of an aggregate. PG apparently is not smart enough to recognize that the result of a max must be one of the values of the column (meaning that it can use an index) It is not clear whether there is a FK relation between eg_order and eg_order_line and what the PK of eg_order is. If there is a FK, you can do something along the lines of SELECT order_id, invoice_id COALESCE(sum(mileage),0) as mileage FROMeg_order LEFT JOIN eg_order_line USING order_id GROUP BY order_id, invoice_id If there can be more than one invoice_id per order_id, you might need to add HAVING invoice_id = (SELECT max(invoice_id) FROM eg_order eo2 WHERE eg_order.order_id = eo2.order_id) or similar. Hope this helps, Nis ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] The nested view from hell - Restricting a subquerry
=?UTF-8?B?TmlzIErDuHJnZW5zZW4=?= <[EMAIL PROTECTED]> writes: > Bryce Nesbitt skrev: >> I've got a legacy app with a hefty performance problem. > It is not clear whether there is a FK relation between eg_order and > eg_order_line and what the PK of eg_order is. Or in English: can there really be more than one invoice_id per order_id or vice versa? AFAICS your only hope of making searches on invoice_id be fast is if you can GROUP BY both order_id and invoice_id, and get rid of both max() calls. > PG apparently is not smart enough to recognize that the > result of a max must be one of the values of the column (meaning that it > can use an index) That's because it can't. As written, the query demands sums over groups that *include* a specific invoice_id --- but each sum has to include contributions from rows that could have another invoice_id. So the condition on invoice_id cannot be pushed down to the individual scans. If, in fact, the correct answer could be had by fetching only rows with the specified invoice_id, then you need to fix the view to make that clear. BTW, wouldn't UNION ALL be better than UNION here? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] The nested view from hell - Restricting a subquerry
Great analysis Gregory & Tom... UNION ALL will make a difference. --- Here invoices consist of orders, orders consist of order lines. Thus, each order_id corresponds to just one invoice_id. One possibility is to add an invoice_id to the order_line. That way the optimizer need not push anything... the rows will get filtered out early. Gregory Stark wrote: > Two things are going wrong. > > First, does an order_id belong to precisely one invoice_id? In which case > instead of grouping just y order_id you need to group by invoice_id,order_id > and remove the MAX() from around invoice_id. The optimizer can't push the > invoice_id=? clause down inside the group by because normally to calculate > max() it needs th entire set of records. It doesn't know there will be only > one value. > > Secondly it seems to me each branch of the union generates distinct values. > That is there can't be any duplicates or overlap. In which case you can change > the UNION to a UNION ALL. > > There might be more problems but at first glance it looks like the optimizer > would be able to push the invoice_id=? clause into the subqueries once those > two changes are made which would throw away the subtotals and reduce to a > simple index lookup on invoice_id. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[SQL] how to download linux 7.3 image
Mohd Ghalib Akhtar wrote: > how to download linux 7.3 image file(means os) ? > It is not clear to me what you are trying to do. If you wish to download the .iso files for Red Hat Linux 7.3, it has nothing to do with this mailing list. Furthermore, RHL 7.3 has been discontinued for several years and is no longer supported. It is obsolete. If you want a Red Hat looking product, you should consider running the latest version of Fedora. If you really want an Enterprise version of linux, the current one from Red Hat is called Red Hat Enterprise Linux 5. OTOH, if you want a version of postgreSQL that will run on RHL 7.3, that might be a problem since the current versions of postgreSQL probably all demand a much newer kernel (RHL 7.3 used a 2.2 kernel, IIRC) and associated libraries. -- .~. Jean-David Beyer Registered Linux User 85642. /V\ PGP-Key: 9A2FC99A Registered Machine 241939. /( )\ Shrewsbury, New Jerseyhttp://counter.li.org ^^-^^ 17:55:01 up 14 days, 12 min, 2 users, load average: 4.22, 4.17, 4.17 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [SQL] The nested view from hell - Restricting a subquerry
Tom Lane skrev: >> PG apparently is not smart enough to recognize that the >> result of a max must be one of the values of the column (meaning that it >> can use an index) > > That's because it can't. As written, the query demands sums over groups > that *include* a specific invoice_id --- but each sum has to include > contributions from rows that could have another invoice_id. So the > condition on invoice_id cannot be pushed down to the individual scans. > If, in fact, the correct answer could be had by fetching only rows with > the specified invoice_id, then you need to fix the view to make that > clear. Well, the query can be satisfied by looking only at the rows with an order_id matching the invoice_id given. The condition that this is the largest invoice in the group then needs to be checked afterwards. I certainly did not expect the query planner to be able to deduce this, though. Nis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [SQL] The nested view from hell - Restricting a subquerry
Nis Jørgensen <[EMAIL PROTECTED]> writes: > Well, the query can be satisfied by looking only at the rows with an > order_id matching the invoice_id given. The condition that this is the > largest invoice in the group then needs to be checked afterwards. > > I certainly did not expect the query planner to be able to deduce this, > though. No, that's not true. If you had two records in eg_order with the same order_id but different invoice_ids then the query would need both records to satisfy the query. The query planner can't deduce that this can't happen because it simply does not have that information. The more I look at this view the more I think it's just seriously broken. Why is it grouping by order_id at all if, I suspect, there will only be one record per order_id in eg_orders?? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [SQL] The nested view from hell - Restricting a subquerry
Gregory Stark skrev: > Nis Jørgensen <[EMAIL PROTECTED]> writes: > >> Well, the query can be satisfied by looking only at the rows with an >> order_id matching the invoice_id given. The condition that this is the >> largest invoice in the group then needs to be checked afterwards. >> >> I certainly did not expect the query planner to be able to deduce this, >> though. > > No, that's not true. If you had two records in eg_order with the same order_id > but different invoice_ids then the query would need both records to satisfy > the query. I assume you mean "... then both records are necessary in order to calculate the results of the query". This does not contradict what I wrote. If you mean "... then both records need to satisfy " then I don't understand which criteria you are talking about. The query in question was: SELECT order_id, max(order_view.invoice_id), sum(order_view.mileage) FROM(SELECT order_id,invoice_id, 0 as mileage FROM eg_order UNION SELECT order_id, 0, mileage FROM eg_order_line) order_view GROUP BY order_view.order_id; This is then restricted on max(invoice_id) As far as I can tell, these steps produce the correct results (without the later information about primary keys provided by Bryce) INPUT: my_invoice_id 1. Look up all order_ids for which (order_id,my_invoice_id) appear in eg_orders 2. Find all rows (in both branches of the UNION) with these id_s 3. Group the rows, and calculate max(invoice_id) 4. Filter the result rows on max(invoice_id) = my_invoice_id. > The query planner can't deduce that this can't happen because it simply does > not have that information. Well, I realize that the existing query planner can't. Since I could arrive at the plan above by deduction so could a hypothetical different query planner. Whether it is faster is of course unknown - my guess is that it would be in this case. > The more I look at this view the more I think it's just seriously broken. > Why is it grouping by order_id at all if, I suspect, there will only be one > record per order_id in eg_orders?? Bryce has confirmed this. The above is only of academic interest. Yours, Nis Jorgensen ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly