Re: [SQL] Returning the total number of rows as a separate column when using limit

2007-11-05 Thread Tom Lane
Andreas Joseph Krogh [EMAIL PROTECTED] writes:
 AFAICS the information about the *total* number of rows is in the result 
 somehow. When I execute a limit 1 query with EXPLAIN ANALYZE, I se the 
 total number of columns in rows=200819, so the information is there.

That's only an estimate.  Since the query doesn't get executed to
completion thanks to the LIMIT, Postgres really has no idea whether
the estimate is accurate.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Returning the total number of rows as a separate column when using limit

2007-11-05 Thread Andreas Joseph Krogh
On Monday 05 November 2007 15:18:22 Tom Lane wrote:
 Andreas Joseph Krogh [EMAIL PROTECTED] writes:
  AFAICS the information about the *total* number of rows is in the
  result somehow. When I execute a limit 1 query with EXPLAIN ANALYZE,
  I se the total number of columns in rows=200819, so the information is
  there.

 That's only an estimate.  Since the query doesn't get executed to
 completion thanks to the LIMIT, Postgres really has no idea whether
 the estimate is accurate.

Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG 
still doesn't have to know the total numbers even if it has to sort the 
result?

Is there a way to perform the LIMIT-query only once and still be able to 
extract the total number of rows from some magic function, like 
Oracle's over() analytic function? I will accept a simple no if that is 
the case, but if it is possible (like it is in Oracle) I would appreciate to 
know how to do it.

-- 
Andreas Joseph Krogh [EMAIL PROTECTED]
Senior Software Developer / Manager
+-+
OfficeNet AS| The most difficult thing in the world is to |
Karenslyst Allé 11  | know how to do a thing and to watch |
PO. Box 529 Skøyen  | somebody else doing it wrong, without   |
0214 Oslo   | comment.|
NORWAY  | |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Returning the total number of rows as a separate column when using limit

2007-11-05 Thread Tom Lane
Andreas Joseph Krogh [EMAIL PROTECTED] writes:
 On Monday 05 November 2007 15:18:22 Tom Lane wrote:
 That's only an estimate.  Since the query doesn't get executed to
 completion thanks to the LIMIT, Postgres really has no idea whether
 the estimate is accurate.

 Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG 
 still doesn't have to know the total numbers even if it has to sort the 
 result?

If there were a sort then the sort node would know how many rows it had
sorted, but if you've got a small limit that's certainly not the plan
type you'd prefer.

The bottom line is that there is no free lunch.  If you want an exact
row count you have to execute the whole query, and it's gonna cost you.
If you're willing to settle for an approximation, the usual thing is
to EXPLAIN the query and dredge the row estimate out of that.

create function estimate_rows(qry text) returns float8 as $$
declare r text;
begin
  for r in execute 'explain ' || qry loop
if substring(r from 'rows=[0-9]') is not null then
  return substring(r from 'rows=([0-9]+)');
end if;
  end loop;
  return null;
end$$ language plpgsql strict;

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [SQL] Returning the total number of rows as a separate column when using limit

2007-11-05 Thread Gregory Stark

 Andreas Joseph Krogh [EMAIL PROTECTED] writes:
 On Monday 05 November 2007 15:18:22 Tom Lane wrote:
 That's only an estimate.  Since the query doesn't get executed to
 completion thanks to the LIMIT, Postgres really has no idea whether
 the estimate is accurate.

 Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG 
 still doesn't have to know the total numbers even if it has to sort the 
 result?

At a guess you're displaying pages of information and want to display
something like displaying 1-10 of 150 ?

Postgres is kind of lacking a solution for this problem. The last time I
needed to do this I bit the bullet and ran the query twice, once with a
select count(*) from (...) around it and once with select * from (...)
order by x offset n limit m around it. The second time runs relatively
quickly since all the raw data is in cache.

The right way to do this would probably be to have a temporary table which
you populate in one step, perform the count(*) on in a second query, then
select the page of data with the ordering in a third query. Then you can keep
the data around for some limited amount of time in case the user accesses a
second page. But this requires some infrastructure to keep track of the cached
data and what query it corresponded to and determine when to replace it with
new data or drop it.

However Postgres temporary tables are fairly expensive and if you're creating
them for every web access you're going to have to vacuum the system catalogs
quite frequently. They're not really well suited for this task.

Alternatively you could create a cursor and play with that. But I don't think
that's a great solution either. (yet? I think cursors are getting more useful
in Postgres, perhaps it will be eventually.)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [SQL] Returning the total number of rows as a separate column when using limit

2007-11-05 Thread Andreas Joseph Krogh
On Monday 05 November 2007 16:27:03 Gregory Stark wrote:
  Andreas Joseph Krogh [EMAIL PROTECTED] writes:
  On Monday 05 November 2007 15:18:22 Tom Lane wrote:
  That's only an estimate.  Since the query doesn't get executed to
  completion thanks to the LIMIT, Postgres really has no idea whether
  the estimate is accurate.
 
  Ok. The query is ORDER-ed, but you're saying that it doesn't matter and
  PG still doesn't have to know the total numbers even if it has to sort
  the result?

 At a guess you're displaying pages of information and want to display
 something like displaying 1-10 of 150 ?

Exactly:-)

 Postgres is kind of lacking a solution for this problem. The last time I
 needed to do this I bit the bullet and ran the query twice, once with a
 select count(*) from (...) around it and once with select * from (...)
 order by x offset n limit m around it. The second time runs relatively
 quickly since all the raw data is in cache.

That's what I'm doing now. I run the query with limit+1 as limit and if it 
results in more than limit, I know there is more data and I run count(*) to 
count them all. But count(*) cannot use indices in PG so it's limited in 
speed anyway AFAICS.

 The right way to do this would probably be to have a temporary table
 which you populate in one step, perform the count(*) on in a second query,
 then select the page of data with the ordering in a third query. Then you
 can keep the data around for some limited amount of time in case the user
 accesses a second page. But this requires some infrastructure to keep track
 of the cached data and what query it corresponded to and determine when to
 replace it with new data or drop it.

 However Postgres temporary tables are fairly expensive and if you're
 creating them for every web access you're going to have to vacuum the
 system catalogs quite frequently. They're not really well suited for this
 task.

 Alternatively you could create a cursor and play with that. But I don't
 think that's a great solution either. (yet? I think cursors are getting
 more useful in Postgres, perhaps it will be eventually.)

I really hoped there was an Oracle over() equivalent way in PG. I understand 
that Oracle's LIMIT-hack with 3 subselects and rownum between 1 AND 20 is 
rather expensive compared to PG's implementation of LIMIT. Oralce keeps 
snapshot-info in the index, so counting only involves the index AFAIK.

-- 
Andreas Joseph Krogh [EMAIL PROTECTED]
Senior Software Developer / Manager
+-+
OfficeNet AS| The most difficult thing in the world is to |
Karenslyst Allé 11  | know how to do a thing and to watch |
PO. Box 529 Skøyen  | somebody else doing it wrong, without   |
0214 Oslo   | comment.|
NORWAY  | |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] Returning the total number of rows as a separate column when using limit

2007-11-05 Thread Gregory Stark

Andreas Joseph Krogh [EMAIL PROTECTED] writes:

 That's what I'm doing now. I run the query with limit+1 as limit and if it 
 results in more than limit, I know there is more data and I run count(*) to 
 count them all. But count(*) cannot use indices in PG so it's limited in 
 speed anyway AFAICS.

Well count(*) can use indexes the same as the query can.

 I really hoped there was an Oracle over() equivalent way in PG. I 
 understand 
 that Oracle's LIMIT-hack with 3 subselects and rownum between 1 AND 20 is 
 rather expensive compared to PG's implementation of LIMIT. Oralce keeps 
 snapshot-info in the index, so counting only involves the index AFAIK.

Well that's only going to be true if the index satisfies the whole query which
is not going to be true for the simplest cases.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster