I'd better ask the obvious question... *which* one? Is there one in particular that you're after, or would you be equally happy with any of them? The latter would be a bit odd.

Each row in table1 has several corresponding rows in table2. Any of them that match the query would be good.


So the bulk of the selection criteria are on table2, not table1?

It shouldn't matter either way (and I cannot tell which will be the bulk ahead of time).


Wouldn't you get the same result from your query by using an INNER JOIN and dropping the temp.xy IS NOT NULL clause?

I am not sure; that is why I am asking y'all. My only problem with that is: does INNER JOIN create a 200k-row table with 180 columns and do the selection from that? Or does it SELECT from one table, JOIN the results to the other table, then SELECT again (which is what I want)?
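
One way to check what MySQL actually does here is EXPLAIN. A minimal sketch, where elevation and landcover are made-up columns standing in for the real conditions:

    -- Hypothetical: 'elevation' and 'landcover' stand in for the real conditions.
    EXPLAIN
    SELECT AVG(t1.column1)
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t2.xy = t1.xy
    WHERE t1.elevation > 300
      AND t2.landcover = 12;

The 'rows' column of the EXPLAIN output estimates how many rows MySQL examines from each table, which indicates whether it restricts each table before joining or works through the full 200k-row set. Note that joining straight to table2 repeats each table1 row roughly ten times (once per matching xy), which is why the DISTINCT temp table is used before taking averages over table1.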


I have played around with it a bit, and if I am guesstimating correctly, these two SQL statements will normally take less than a second to do the selection from the two tables and return the statistics. That compares very well with the 0.92 sec required to pull up the stats on all of table1 (the query in question will pull up some subset of table1). Another advantage is that, for other queries in this set, I will already have half of the work cached for me in the table temp (depending on the query, of course).
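
To illustrate the caching point: once temp exists (it lasts for the rest of the connection), a later query over a different set of statistics only has to redo the table1 side. A sketch, where slope and aspect are hypothetical column names:

    -- Reusing the temp table from step (1) for a second set of statistics;
    -- 'slope' and 'aspect' are hypothetical columns.
    SELECT AVG(table1.slope), AVG(table1.aspect)
    FROM table1
    LEFT JOIN temp ON temp.xy = table1.xy
    WHERE temp.xy IS NOT NULL
      AND table1.slope > 0;   -- hypothetical table1 condition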

In case anyone is wondering about these massive tables: I am building a dynamic web site that will let scientists view statistics on various environments/landscapes and how they relate to each other. Currently the largest table is basically the entire state of Wisconsin split up into 6km x 6km chunks at 30m resolution. This yields 7k chunks to keep track of, with on average 10 different environments per chunk. I am trying to design it well enough, though, that we could in the future use all of the Landsat imagery for the entire U.S. (also 6km x 6km chunks at 30m resolution). There are about 90 different possible statistics per landscape.
Thx,
Seth



On Sunday, March 2, 2003, at 06:31 PM, Bruce Feist wrote:


Seth Price wrote:

I am dealing with two tables. One, which I'll call table1, has about 90 columns and 20k rows; each number in column "xy" is unique. The other has about 90 columns and about 200k rows, with around 10 rows for each value of "xy". So if I SELECT a given number in table1.xy, one row is returned, and if I SELECT the same number in table2.xy, about 10 rows will be returned. There is always at least one entry in table2.xy for each entry in table1.xy.
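
A minimal sketch of that structure, for reference; the column names and types other than xy are hypothetical, and each table really has about 90 columns:

    -- table1: one row per xy value (about 20k rows)
    CREATE TABLE table1 (
      xy    INT NOT NULL PRIMARY KEY,
      stat1 DOUBLE,
      stat2 DOUBLE
      -- ... roughly 90 statistic columns in total
    );

    -- table2: around 10 rows per xy value (about 200k rows)
    CREATE TABLE table2 (
      xy    INT NOT NULL,
      stat1 DOUBLE,
      stat2 DOUBLE,
      -- ... roughly 90 columns in total
      INDEX (xy)   -- assumed here so the joins on xy are not full scans
    );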

I am looking for the best way to join them so that each row in the result is a row of table1 lined up with one (and only one) row from table2.

I'd better ask the obvious question... *which* one? Is there one in particular that you're after, or would you be equally happy with any of them? The latter would be a bit odd.

I am aware that I could simply GROUP BY xy, but I am also trying to use this with a few AVG(<some other column>) clauses to get the average of all rows returned, and GROUP BY xy only returns the average of each grouping. I am also guessing that GROUP BY in this instance would be rather inefficient, because it would first have to build a large table (>200k rows) and then SELECT from it.
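
To make the distinction concrete (stat1 is a hypothetical column name): the GROUP BY form returns one average per xy value, while what the page needs is a single overall average across the chunks that match the conditions.

    -- One average per chunk: up to ~20k result rows.
    SELECT xy, AVG(stat1)
    FROM table2
    GROUP BY xy;

    -- What is wanted instead: a single row with one overall average over the
    -- matching table1 chunks (the selection conditions still have to be
    -- applied on top of this).
    SELECT AVG(stat1)
    FROM table1;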

This is complex enough so that I'll need a more detailed example of your query to explain what you're trying to accomplish. But, on the surface, it sounds as though you're already using GROUP BY in the same SELECT, in which case you won't be taking on a significant amount of additional overhead -- you're already scanning all the table2 rows because of your GROUP BY. Or, are you talking about multiple SELECT statements?


Bruce Feist




On Sunday, March 2, 2003, at 06:32 PM, Bruce Feist wrote:

Seth Price wrote:

Since writing this message, I have discovered another possible way of doing this with two successive SQL statements. It would look something like this:

1) CREATE TEMPORARY TABLE temp SELECT DISTINCT xy FROM table2 WHERE <all of table2 conditions>; (going by the data below, a maximum of 20k rows because it is DISTINCT)

So the bulk of the selection criteria are on table2, not table1?


2) SELECT AVG(column1), <all other group calculations> FROM table1 LEFT JOIN temp ON temp.xy=table1.xy WHERE <all of table1 conditions> AND temp.xy IS NOT NULL; (a maximum of 20k rows, fewer after both WHERE conditions, though)
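
Filled in with made-up conditions for illustration (landcover as a table2 condition, elevation as a table1 condition, and column2 as a second statistic are all hypothetical), the two steps might look like:

    -- Step 1: distinct chunk ids from table2 that pass the table2 conditions.
    CREATE TEMPORARY TABLE temp
    SELECT DISTINCT xy
    FROM table2
    WHERE landcover = 12;               -- hypothetical table2 condition

    -- Step 2: aggregate over the matching table1 rows, each counted once.
    SELECT AVG(column1), AVG(column2)
    FROM table1
    LEFT JOIN temp ON temp.xy = table1.xy
    WHERE elevation > 300               -- hypothetical table1 condition
      AND temp.xy IS NOT NULL;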

Wouldn't you get the same result from your query by using an INNER JOIN and dropping the temp.xy IS NOT NULL clause?
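
For clarity, that rewrite would look roughly like this (same hypothetical columns as above); the INNER JOIN keeps only the table1 rows with a matching xy in temp, so the IS NOT NULL filter becomes redundant:

    -- Equivalent to step (2) above: INNER JOIN already drops table1 rows
    -- that have no match in temp.
    SELECT AVG(column1), AVG(column2)
    FROM table1
    INNER JOIN temp ON temp.xy = table1.xy
    WHERE elevation > 300;              -- hypothetical table1 condition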


Bruce Feist





