(been trying to answer this question for a long time; a recent post on the topic has convinced me to admit defeat and ask the list...)
I have a large database and would like to output 'results' page by page. 'Results' here means fairly general database computations: groups, joins, logical subqueries (implemented with temporary tables), as well as raw row-by-row data. The database has appropriate indices to make all of these operations efficient, so its mammoth size (tens of millions of rows at the least) does not by itself slow anything down.

Of course, the simplest strategy is just to put a LIMIT on every query and provide 'previous' and 'next' buttons on every page. The first page obviously needs no 'previous' button, but deciding whether there should be a 'next' is a problem. My simple hack is to ask for one more row than I plan to display; if I get it, I discard it but render a 'next' button. Ugly...

The concerns have already begun: as our database grows, the total number of rows returned by these queries will grow into the millions, while people will generally only care about the first few pages of results. Does MySQL guarantee that it will always try to use indices to implement ORDER BY ... LIMIT clauses? I never want to ask my database to sort several million records on every query; if MySQL isn't smart enough to optimize this, I need to start putting together an automated merge-table architecture...

Also, a simple LIMIT clause isn't a general solution in the presence of sub-queries (using temporary tables) and GROUP BY clauses, because there is not necessarily any correlation between the number of rows in the intermediate results and in the output. The specific queries I use have been structured to compute intermediate results in an order that eliminates the unnecessary work, but I won't always have the luxury of such easily-optimized queries.
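For concreteness, here is a minimal sketch of the "ask for one extra row" hack, using Python's sqlite3 module so it is self-contained; the table, columns, and `fetch_page` helper are invented for illustration, and MySQL behavior may of course differ:

```python
import sqlite3

# Toy table standing in for the real data (names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO results (val) VALUES (?)",
                 [(f"row {i}",) for i in range(25)])

def fetch_page(conn, page, page_size=10):
    """Fetch one page plus a sentinel row; the extra row is discarded
    but its presence tells us whether a 'next' button is needed."""
    offset = page * page_size
    rows = conn.execute(
        "SELECT id, val FROM results ORDER BY id LIMIT ? OFFSET ?",
        (page_size + 1, offset)).fetchall()
    has_next = len(rows) > page_size
    return rows[:page_size], has_next

rows, has_next = fetch_page(conn, 0)  # full page of 10, has_next is True
rows, has_next = fetch_page(conn, 2)  # final 5 rows, has_next is False
```

The point of the sentinel is that it costs only one extra row per page rather than a second query, at the price of slightly uglier application code.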
Further, an interface that provides only 'previous' and 'next' buttons is clearly mediocre at best; what I'd really like is to emit page-number buttons (probably with exponential backoff: 1, 2, 3, 4, 5 ... 10, 20, 30, 40, 50 ... 100, 200, 300, 400, 500 ...), and this requires knowing the total number of rows the query returns.

For trivial raw-data queries this is simple: run the query twice, once with COUNT(*) and once with the actual field list. It does worry me a bit that this may force MySQL to actually scan millions and millions of rows for every page served; can this COUNT be computed purely from index ranges? And I would dearly love an equivalent of num_affected_rows that told you how many rows matched a query in spite of your LIMIT clause, so that the database didn't have to run the query twice...

For more complex queries, things get more complicated. Simply adding a single GROUP BY to a query means that COUNT(*) loses its 'how-many-rows-in-the-result' meaning, so I've got to store the result in a temporary table before running a count on it; and even if MySQL were smart enough to optimize plain COUNT and ORDER BY ... LIMIT, unless temporary tables are *really* smart (some kind of spiffy lazy evaluation) this would completely defeat those optimizations. Is there an efficient way to count the number of rows in the total result of a grouped query? When sub-selects and joins enter the picture, things get even worse, as large and complex queries must be run against the entire database to compute these giant temporary tables, where previously I could hand-optimize the execution sequences to operate on only some multiple (or small exponent) of the number of records per page.
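To illustrate the two mechanics above, here is a sketch, again using Python's sqlite3 module as a stand-in (the `hits` table, its columns, and the `page_buttons` helper are all invented): counting the rows of a grouped result by wrapping it in a sub-select, and generating the exponential-backoff page list.

```python
import sqlite3

# Toy table: 100 raw rows spread across 7 distinct users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (user TEXT, n INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?)",
                 [(f"user{i % 7}", i) for i in range(100)])

# COUNT(*) over the *grouped* result, not the raw rows: the outer
# query counts groups, so the answer is 7, not 100. This is the
# "store the grouped result, then count it" idea as a derived table.
(total_rows,) = conn.execute(
    "SELECT COUNT(*) FROM (SELECT user FROM hits GROUP BY user) AS g"
).fetchone()

def page_buttons(total_pages):
    """Exponential-backoff page numbers: 1..5, 10..50, 100..500, ...
    up to total_pages."""
    buttons, step = [], 1
    while step <= total_pages:
        # Five buttons per decade: step, 2*step, ..., 5*step.
        buttons.extend(range(step, min(step * 6, total_pages + 1), step))
        step *= 10
    return buttons

buttons = page_buttons(57)  # -> [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
```

Whether MySQL can satisfy the derived-table COUNT from indices alone, or must materialize the whole temporary table first, is exactly the open question posed above; the sketch only shows the query shape.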
Again, I can simply work to hand-optimize independent queries to return the counts directly, but that is even less general, still not provably efficient, and provides a substantial opportunity for bugs arising from inconsistency between the number of results expected and the number that can actually be found. Sigh.

I'd love for someone to provide a magic bullet I can add to all my SQL to just make it work, but at least in the more complex cases I don't think efficient solutions exist. What I'm really hoping is that a few people with intimate knowledge of the MySQL optimizer can answer the specific questions I've posed here, and perhaps offer a few tutorials on exactly how the optimizer and execution engine interact with ORDER BY, LIMIT, and GROUP BY clauses as well as temporary tables, so that I can at least design my queries with my eyes open. Maybe we can even work to add such information to the large body of documentation on WHERE-clause optimization.

Many many thanks in advance for your help,

Rob

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php