Under most circumstances, even with data tables as large as yours, I have found that direct JOIN statements have provided acceptable performance. On certain highly normalized databases, where the requirements of the query needed more than 4 or 5 tables joined at once, I have also used your "cache" table idea to great success.
However I would like to believe that queries on the relationships between just these three tables could be coded by using JOINS to provide some very fast results. Since you chose not to post your real table structures here, I must create these sample queries using your sample tables. select * from REC_A where (no links exist at all) or ((links to REC_B exist for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56 or 12) CREATE TEMPORARY TABLE tmpUnion (primary key(PK_A)) SELECT a.* FROM REC_A a LEFT JOIN LNK on LNK.PK_A = a.PK_A WHERE LNK.PK_A is null; INSERT IGNORE tmpUnion SELECT a.* FROM REC_A a INNER JOIN LNK on LNK.PK_A = a.PK_A WHERE LNK.PK_A in (22,34) AND LNK.PK_A NOT IN (12,56); SELECT * from tmpUnion; DROP TEMPORARY TABLE tmpUnion; Whenever you have an OR like that in your WHERE clause, you are in essence running two or more queries and combining the results. Since you are on a pre-4.0 server, you do not have the UNION operator available to help you so I had to simulate one by using a temporary table. Basically I had to hand-code what the UNION operator automates for you. I defined the Primary Key on the temp table and used the IGNORE option on the INSERT statement to prevent duplicate rows (logically you shouldn't have had any but I was being thorough) Here is an equivalent query without the temp table SELECT DISTINCT a.* FROM REC_A a LEFT JOIN LNK m1 ON m1.PK_A = a.PK_A INNER JOIN LNK m2 ON m2.PK_A = a.PK_A WHERE m1.PK_A is null OR ( m2.PK_A in (22,34) AND m2.PK_A NOT IN (12,56) ); "Daniel BODEA" <[EMAIL PROTECTED]> wrote on 10/26/2004 09:02:56 AM: > Greetings, > > I have several solutions to a problem which requires temporary or cache > tables but the different solutions are time consuming to test for > performance and I was wondering if anyone had more information on this than > myself. > > The server is 3.23.40 without possibility for upgrade to 4.x because of the > amount of extra work required for testing, deployment, etc. > > The simplified context is that I have several tables for individual records > (REC_A, REC_B, etc.) with primary keys and one table for links between > different types of records based on their primary keys (LNK). LNK is N to N > so there is no constraint on the number of links a record can have to other > records. > > I have to query these relationships in complex ways, one example being > "select * from REC_A where (no links exist at all) or ((links to REC_B exist > for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56 or > 12))". The number of records is in the hundreds of thousands and the links > can get past a million. > > Now since this version does not support GROUP_CONCAT and having given up > after a couple of days of trying to torture recursive INNER and LEFT joins > to filter out records based on totally incomprehensible ON clauses, I figure > I need temporary or cache intermediary tables. > > The solutions that I see would be to have (1) a generic "flag" table that a > complex query would use to first flag the records that should get through > and then join and filter with that table to return the list of records but > that would mean locking and updating the table for each complex query, (2) > generate an "alternate view" of the LNK table by simulating the GROUP_CONCAT > function to aggregate the records in unique rows that contain the full set > of links. The question for (2) is how to store the full link set in one row > so that queries be able to make use of indexes. (3) would be a variation of > (1) by creating a temporary table per complex query that would only serve > the purposes of this query and which would contain only those record indexes > that should go through so that the complex query can then perform a simple > INNER JOIN to filter out the results. > > I've also thought about retrieving the full set of indexes for all matched > records programmatically and then reinjecting this in the main query as a PK > IN (...) but that would give really huge index sets that would go back and > forth between the server and the client so I ruled this one out. > > Now unless there is another simpler and miraculous solution that I haven't > thought of, I have to find out which one of these alternatives is the best > one speed-wise so any help on this is sincerely appreciated. > > Daniel > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] >