Re: [SQL] Returning the total number of rows as a separate column when using limit
Andreas Joseph Krogh [EMAIL PROTECTED] writes: AFAICS the information about the *total* number of rows is in the result somehow. When I execute a limit 1 query with EXPLAIN ANALYZE, I se the total number of columns in rows=200819, so the information is there. That's only an estimate. Since the query doesn't get executed to completion thanks to the LIMIT, Postgres really has no idea whether the estimate is accurate. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] Returning the total number of rows as a separate column when using limit
On Monday 05 November 2007 15:18:22 Tom Lane wrote: Andreas Joseph Krogh [EMAIL PROTECTED] writes: AFAICS the information about the *total* number of rows is in the result somehow. When I execute a limit 1 query with EXPLAIN ANALYZE, I se the total number of columns in rows=200819, so the information is there. That's only an estimate. Since the query doesn't get executed to completion thanks to the LIMIT, Postgres really has no idea whether the estimate is accurate. Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG still doesn't have to know the total numbers even if it has to sort the result? Is there a way to perform the LIMIT-query only once and still be able to extract the total number of rows from some magic function, like Oracle's over() analytic function? I will accept a simple no if that is the case, but if it is possible (like it is in Oracle) I would appreciate to know how to do it. -- Andreas Joseph Krogh [EMAIL PROTECTED] Senior Software Developer / Manager +-+ OfficeNet AS| The most difficult thing in the world is to | Karenslyst Allé 11 | know how to do a thing and to watch | PO. Box 529 Skøyen | somebody else doing it wrong, without | 0214 Oslo | comment.| NORWAY | | Tlf:+47 24 15 38 90 | | Fax:+47 24 15 38 91 | | Mobile: +47 909 56 963 | | +-+ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] Returning the total number of rows as a separate column when using limit
Andreas Joseph Krogh [EMAIL PROTECTED] writes: On Monday 05 November 2007 15:18:22 Tom Lane wrote: That's only an estimate. Since the query doesn't get executed to completion thanks to the LIMIT, Postgres really has no idea whether the estimate is accurate. Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG still doesn't have to know the total numbers even if it has to sort the result? If there were a sort then the sort node would know how many rows it had sorted, but if you've got a small limit that's certainly not the plan type you'd prefer. The bottom line is that there is no free lunch. If you want an exact row count you have to execute the whole query, and it's gonna cost you. If you're willing to settle for an approximation, the usual thing is to EXPLAIN the query and dredge the row estimate out of that. create function estimate_rows(qry text) returns float8 as $$ declare r text; begin for r in execute 'explain ' || qry loop if substring(r from 'rows=[0-9]') is not null then return substring(r from 'rows=([0-9]+)'); end if; end loop; return null; end$$ language plpgsql strict; regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [SQL] Returning the total number of rows as a separate column when using limit
Andreas Joseph Krogh [EMAIL PROTECTED] writes: On Monday 05 November 2007 15:18:22 Tom Lane wrote: That's only an estimate. Since the query doesn't get executed to completion thanks to the LIMIT, Postgres really has no idea whether the estimate is accurate. Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG still doesn't have to know the total numbers even if it has to sort the result? At a guess you're displaying pages of information and want to display something like displaying 1-10 of 150 ? Postgres is kind of lacking a solution for this problem. The last time I needed to do this I bit the bullet and ran the query twice, once with a select count(*) from (...) around it and once with select * from (...) order by x offset n limit m around it. The second time runs relatively quickly since all the raw data is in cache. The right way to do this would probably be to have a temporary table which you populate in one step, perform the count(*) on in a second query, then select the page of data with the ordering in a third query. Then you can keep the data around for some limited amount of time in case the user accesses a second page. But this requires some infrastructure to keep track of the cached data and what query it corresponded to and determine when to replace it with new data or drop it. However Postgres temporary tables are fairly expensive and if you're creating them for every web access you're going to have to vacuum the system catalogs quite frequently. They're not really well suited for this task. Alternatively you could create a cursor and play with that. But I don't think that's a great solution either. (yet? I think cursors are getting more useful in Postgres, perhaps it will be eventually.) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [SQL] Returning the total number of rows as a separate column when using limit
On Monday 05 November 2007 16:27:03 Gregory Stark wrote: Andreas Joseph Krogh [EMAIL PROTECTED] writes: On Monday 05 November 2007 15:18:22 Tom Lane wrote: That's only an estimate. Since the query doesn't get executed to completion thanks to the LIMIT, Postgres really has no idea whether the estimate is accurate. Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG still doesn't have to know the total numbers even if it has to sort the result? At a guess you're displaying pages of information and want to display something like displaying 1-10 of 150 ? Exactly:-) Postgres is kind of lacking a solution for this problem. The last time I needed to do this I bit the bullet and ran the query twice, once with a select count(*) from (...) around it and once with select * from (...) order by x offset n limit m around it. The second time runs relatively quickly since all the raw data is in cache. That's what I'm doing now. I run the query with limit+1 as limit and if it results in more than limit, I know there is more data and I run count(*) to count them all. But count(*) cannot use indices in PG so it's limited in speed anyway AFAICS. The right way to do this would probably be to have a temporary table which you populate in one step, perform the count(*) on in a second query, then select the page of data with the ordering in a third query. Then you can keep the data around for some limited amount of time in case the user accesses a second page. But this requires some infrastructure to keep track of the cached data and what query it corresponded to and determine when to replace it with new data or drop it. However Postgres temporary tables are fairly expensive and if you're creating them for every web access you're going to have to vacuum the system catalogs quite frequently. They're not really well suited for this task. Alternatively you could create a cursor and play with that. But I don't think that's a great solution either. (yet? I think cursors are getting more useful in Postgres, perhaps it will be eventually.) I really hoped there was an Oracle over() equivalent way in PG. I understand that Oracle's LIMIT-hack with 3 subselects and rownum between 1 AND 20 is rather expensive compared to PG's implementation of LIMIT. Oralce keeps snapshot-info in the index, so counting only involves the index AFAIK. -- Andreas Joseph Krogh [EMAIL PROTECTED] Senior Software Developer / Manager +-+ OfficeNet AS| The most difficult thing in the world is to | Karenslyst Allé 11 | know how to do a thing and to watch | PO. Box 529 Skøyen | somebody else doing it wrong, without | 0214 Oslo | comment.| NORWAY | | Tlf:+47 24 15 38 90 | | Fax:+47 24 15 38 91 | | Mobile: +47 909 56 963 | | +-+ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [SQL] Returning the total number of rows as a separate column when using limit
Andreas Joseph Krogh [EMAIL PROTECTED] writes: That's what I'm doing now. I run the query with limit+1 as limit and if it results in more than limit, I know there is more data and I run count(*) to count them all. But count(*) cannot use indices in PG so it's limited in speed anyway AFAICS. Well count(*) can use indexes the same as the query can. I really hoped there was an Oracle over() equivalent way in PG. I understand that Oracle's LIMIT-hack with 3 subselects and rownum between 1 AND 20 is rather expensive compared to PG's implementation of LIMIT. Oralce keeps snapshot-info in the index, so counting only involves the index AFAIK. Well that's only going to be true if the index satisfies the whole query which is not going to be true for the simplest cases. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster