sam wun <[EMAIL PROTECTED]> wrote on 01/20/2005 11:45:40 AM:

> [EMAIL PROTECTED] wrote:
> >
> > SELECT DISTINCT i.basename
> > FROM inventory i
> > INNER JOIN transaction t
> >   ON i.prodcode = t.prodcode
> >   AND t.date >= '2004-01-01'
> >   AND t.date <= '2004-01-31'
> > INNER JOIN transaction tt
> >   ON i.prodcode = tt.prodcode
> >   AND tt.date >= '2005-01-01'
> >   AND tt.date <= '2005-01-31'
> > INNER JOIN customer c
> >   ON c.custcode = t.custcode
> >   AND c.custcode = tt.custcode
> > WHERE i.basename IS NOT NULL
> >   AND i.prodname IS NOT NULL
> > ORDER BY i.basename
> >
> > This should give you a list of inventory basenames for all current
> > customers (their names are still in the customer table) that "had
> > transactions" (ordered products?) during both JAN 2004 and JAN 2005.
> > This list will show only the products that were ordered during BOTH
> > time periods BY THE SAME CUSTOMER at least once (at least one repeat
> > sale, year to year, in JAN). Is this what you were after or was there
> > a different question you were trying to answer?
>
> Hi, while this is much faster than a subquery, I found there is a
> "clumsy" way that is faster than this series of inner joins.
> I discovered that if I create two different temporary tables with
> "create view as select ..." for the 2 different periods, the join
> between these temp tables is also much faster than this series of
> inner joins. With 160000 records in the transaction table, the series
> of inner joins takes a very long time (more than 15 minutes) to give
> out a result.
>
> Any comment?
> Thanks
> Sam
>
> > Shawn Green
> > Database Administrator
> > Unimin Corporation - Spruce Pine

Once you start getting into larger sets of data (millions of rows), taking snapshots (as you did with your CREATE TABLE ... SELECT statements) can provide some significant performance boosts.
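Applied to your tables, the snapshot version might look something like this (just a sketch: the snapshot table names are invented, and I am assuming the same tables and columns as in your original query):

-- snapshot each period once, instead of re-scanning the full
-- transaction table inside the big join
CREATE TABLE t_jan2004
SELECT * FROM transaction
WHERE date >= '2004-01-01' AND date <= '2004-01-31';

CREATE TABLE t_jan2005
SELECT * FROM transaction
WHERE date >= '2005-01-01' AND date <= '2005-01-31';

-- the join now only touches the two small snapshots
SELECT DISTINCT i.basename
FROM inventory i
INNER JOIN t_jan2004 t ON i.prodcode = t.prodcode
INNER JOIN t_jan2005 tt ON i.prodcode = tt.prodcode
INNER JOIN customer c
  ON c.custcode = t.custcode
  AND c.custcode = tt.custcode
WHERE i.basename IS NOT NULL
  AND i.prodname IS NOT NULL
ORDER BY i.basename;

(You could also declare the snapshots with CREATE TEMPORARY TABLE so they clean themselves up when your connection closes.)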
Here is a numerical analysis of the two situations:

Assume we have a table with 1000000 (1 million = 1e6) rows in it that needs to be joined to itself for a query (much as in the query above). Computing the JOIN of two tables that each contain 1e6 rows could result in a virtual table containing up to 1e12 rows. The actual number of rows in a JOIN will be smaller than the pure Cartesian product because it will be limited to only those rows that match your ON conditions. However, the number 1e12 represents how many permutations of data the engine must decide between in order to calculate the results of the JOIN.

Let's imagine that instead of needing to join the entire table to itself, we only need to match 10000 (1e4) rows of the table's data against another set of 10000 rows (two nice round figures that could stand in for the actual number of "transaction" records during January of each year in our original query data). A JOIN between those two sets of rows would result in a maximum Cartesian product of only 1e8 rows.

If it takes a fixed length of time (not a variable length of time) for the query engine to decide whether any one row of a JOIN's Cartesian product belongs to the final JOIN results (based on evaluating the ON conditions against each row combination), then the subset JOIN will reduce the time it takes to compute the virtual table by a factor of 10000 (1e12/1e8 = 1e4). That means the query engine makes 999,900,000,000 fewer comparisons (1e12 - 1e8) to build the virtual table from the JOINed sub-tables than it would to build the same table from the whole table joined to itself.

Your results seem to support this analysis. The time you spent creating the subsets of data (time to select rows + time to write them into a table, twice) was much, MUCH less than the time it would take to sort through all of the other possible combinations of data in your original table that didn't need to participate in your analysis.

As I said at the beginning, this is a query optimization technique for certain queries against "larger" datasets. For queries that deal with tables on the order of a few hundred thousand rows or less, it may not apply. Your mileage will vary. Only testing will reveal the actual performance of one technique over the other.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
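P.S. If you want to double-check the arithmetic above, MySQL can do it directly (the column aliases are only for illustration):

SELECT
  POW(1e6, 2)               AS full_join_pairs,   -- 1e12 candidate row pairs
  POW(1e4, 2)               AS snapshot_pairs,    -- 1e8 candidate row pairs
  POW(1e6, 2) / POW(1e4, 2) AS reduction_factor,  -- 1e4
  POW(1e6, 2) - POW(1e4, 2) AS comparisons_saved; -- 999,900,000,000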