On Tuesday 12 October 2010 14:35:01 you wrote:
> > suggest that 99% of instances of the "select count(*)" idiom are
> > probably bad use of the SQL language.
> 
> Well, suppose you paginate results. If the user sees that the search query
> returns 500 pages, there are two options:
> 
> - you're Google, and your sorting algorithms are so good that the answer
> the user wants is on the first page
> - or the user will refine his search by entering more keywords to get a
> manageable result set
> 
> So, in both cases, the count(*) was useless anyway. And the slowest ones
> are the most useless, since the user will immediately discard the result
> and refine his query.
> 
> If your full text search is slow, try Xapian or Lucene.

I guess I have to comment here again and point out that while I am having this 
issue with text searches, I avoid using count(*) in such cases; I just use next 
and previous links. The real problem (for me) is when someone searches a date 
or time range. My application keeps track of huge amounts of realtime 
transactional data, so an administrator might want a report on what some data 
point did yesterday between 3 and 4 PM. Under normal conditions the number of 
matching records can be anywhere from 0 to over 5,000. This is really killing 
me, especially when the reporting people want a list of how many transactions 
each point in a given zipcode had this morning between 8 and 9 AM: it takes 
about 5 minutes to run on a server that has enough RAM to hold the entire 
table!

Pseudo query:
Show how many transactions per node in zipcode 92252 between 8:00 and 9:00 
today:

point_number | number_of_transactions
       65889 |                     31
       34814 |                   4865
       28349 |                      0
        3358 |                    364
...

24 total rows, > 5 minutes.
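
In SQL that pseudo query comes out as something like this (table and column 
names are made up for illustration, not my actual schema):

-- per-node transaction counts for one zipcode and one hour
-- 'points' and 'transactions' are illustrative names only
SELECT p.point_number,
       count(t.point_number) AS number_of_transactions
FROM points p
LEFT JOIN transactions t
       ON  t.point_number = p.point_number
       AND t.xact_time >= '2010-10-12 08:00'
       AND t.xact_time <  '2010-10-12 09:00'
WHERE p.zipcode = '92252'
GROUP BY p.point_number;
-- count(t.point_number) rather than count(*) so that nodes with no
-- transactions in the window still come back as 0 instead of 1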

Then they want every node to be a link to a list of the actual data within the 
specified timeframe.
This is where I have to do the same query twice: first to find out how many 
rows there are for the page links, then again to get a page of results.
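
Concretely, that means running something like the following back to back 
(again with made-up names; LIMIT/OFFSET is just a stand-in for however the 
paging actually gets done):

-- pass 1: total row count, just to build the page links
SELECT count(*)
FROM transactions
WHERE point_number = 65889
  AND xact_time >= '2010-10-12 08:00'
  AND xact_time <  '2010-10-12 09:00';

-- pass 2: one page of the actual rows, same WHERE clause again
SELECT *
FROM transactions
WHERE point_number = 65889
  AND xact_time >= '2010-10-12 08:00'
  AND xact_time <  '2010-10-12 09:00'
ORDER BY xact_time
LIMIT 50 OFFSET 0;

(On 8.4 and later the count could be folded into the second query with 
count(*) OVER (), but that still has to scan the same rows.)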
Sure, I could keep tables around that have counts by the hour, minute, day or 
whatever to cache results and speed things up, but then the problem is that 
when the data is put into the server there are so many statistics tables to 
update that the front end becomes a huge problem. It also makes for a huge 
mess of tables to think about when I need to make a report.
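
By statistics tables I mean something along these lines (purely illustrative 
names), where every load into the main table also has to bump the matching 
rollup row:

-- illustrative hourly rollup table
CREATE TABLE transactions_hourly (
    point_number integer     NOT NULL,
    hour_start   timestamptz NOT NULL,
    n_xacts      bigint      NOT NULL DEFAULT 0,
    PRIMARY KEY (point_number, hour_start)
);

-- what the loader (or a trigger) would have to do for every row
-- inserted into the main transactions table
UPDATE transactions_hourly
   SET n_xacts = n_xacts + 1
 WHERE point_number = 65889
   AND hour_start   = date_trunc('hour', now());
-- plus the insert-the-row-if-missing logic, repeated for every extra
-- granularity (minute, day, ...), which is exactly the mess I mean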

-Neil-


