OK, now that I have a better model to work with, let me try again. (And stop being so discouraged! This is hard for everyone until they get the hang of it!!! =8-D )
> Given a query that resembles this "select * from A where (no links) or > ((links to both 3 and 4) and (no links to 2))" This may not be the only way to do this but it's what I thought of first SELECT A.PK_A, A.DATA_A, sum(isnull(L.PK_A)) as nolinks, sum(if(L.PK_B IN (3,4),1,0)) as wanted, sum(if(L.PK_B = 2,1,0)) as rejects FROM A LEFT JOIN L on L.PK_A = A.PK_A GROUP BY 1,2 HAVING nolinks >=0 or (wanted = 2 and rejects=0); This returns three more columns that you originally wanted but it's still a one-trip query. Shawn Green Database Administrator Unimin Corporation - Spruce Pine "Daniel BODEA" <[EMAIL PROTECTED]> wrote on 10/26/2004 11:55:34 AM: > Many thanks for the quick answer. I apologize for leaving out a few details > that should have explained the context in greater detail. I'll use less > theory below to further explain what's going on and then I'll make some > remarks on your answer. > > Simplified down to the basics of this problem, we have two tables, A and B, > both having the same structure: > > PK_A / PK_B - primary key > DATA_A / DATA_B - text > > Then we have table L with two columns: > > PK_A - 0..n of primary key for table A > PK_B - 0..n of primary key for table B > > Both A and B contain 3 rows with PKs 1, 2, 3. Data is not an issue. L > contains the following mappings: > > PK_A | PK_B > 1 | 3 > 1 | 4 > 2 | 2 > 2 | 3 > 2 | 4 > > Given a query that resembles this "select * from A where (no links) or > ((links to both 3 and 4) and (no links to 2))", we should be able to work > (order, limit, etc.) on records 1 and 3 from A. The PKs that the WHERE > clause works on are those of B not A. > > I also thought at first that a sequence of LEFT and INNER joins would both > filter the results correctly and work fast enough for this kind of data but > two days of joining stuff didn't make it happen and frankly I don't think > there is any "one query" solution. > > In your first example you filter on PK_A which is my mistake because the > selection is actually made on PK_B. I apologize for not making this clearer. > Using PK_B in your example though is the first trap I fell into during those > joining days. Whatever your conditions in the WHERE clause, you only apply > them to one row at a time which means the relevant part of the query is > translated into this "where PK_A has links to any of PK_B in (3, 4)". The > NOT IN part has no effect since the row 2 | 2 is excluded but the other two > get through. > > Disregarding PK_A in the WHERE clause and using PK_B instead in the second > example, I believe first of all the INNER JOIN filters out all rows that > don't have any links (NULL rows) and the WHERE conditions abide to the same > rules as the previous example so the results are the same. In any > circumstance, using IN in this context results in the set of indexes being > ORed not ANDed since IN returns true for any value in the set. > > I am totally open to any suggestions for "one query" solutions using joins > but my trying to find the best way to use a temporary or cache table is the > result of having proven to myself the (disputable of course) impossibility > of this thing using one trip only. > > Daniel > > ----- Original Message ----- > From: <[EMAIL PROTECTED]> > To: "Daniel BODEA" <[EMAIL PROTECTED]> > Cc: <[EMAIL PROTECTED]> > Sent: Tuesday, October 26, 2004 3:41 PM > Subject: Re: Querying on subsets of one-to-many and many-to-many > > > > Under most circumstances, even with data tables as large as yours, I have > > found that direct JOIN statements have provided acceptable performance. On > > certain highly normalized databases, where the requirements of the query > > needed more than 4 or 5 tables joined at once, I have also used your > > "cache" table idea to great success. > > > > However I would like to believe that queries on the relationships between > > just these three tables could be coded by using JOINS to provide some very > > fast results. Since you chose not to post your real table structures here, > > I must create these sample queries using your sample tables. > > > > select * from REC_A where (no links exist at all) or ((links to REC_B > > exist > > for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56 > > or > > 12) > > > > CREATE TEMPORARY TABLE tmpUnion (primary key(PK_A)) > > SELECT a.* > > FROM REC_A a > > LEFT JOIN LNK > > on LNK.PK_A = a.PK_A > > WHERE LNK.PK_A is null; > > > > INSERT IGNORE tmpUnion > > SELECT a.* > > FROM REC_A a > > INNER JOIN LNK > > on LNK.PK_A = a.PK_A > > WHERE LNK.PK_A in (22,34) > > AND LNK.PK_A NOT IN (12,56); > > > > SELECT * from tmpUnion; > > DROP TEMPORARY TABLE tmpUnion; > > > > Whenever you have an OR like that in your WHERE clause, you are in essence > > running two or more queries and combining the results. Since you are on a > > > pre-4.0 server, you do not have the UNION operator available to help you > > so I had to simulate one by using a temporary table. Basically I had to > > hand-code what the UNION operator automates for you. I defined the Primary > > Key on the temp table and used the IGNORE option on the INSERT statement > > to prevent duplicate rows (logically you shouldn't have had any but I was > > being thorough) > > > > Here is an equivalent query without the temp table > > > > SELECT DISTINCT a.* > > FROM REC_A a > > LEFT JOIN LNK m1 > > ON m1.PK_A = a.PK_A > > INNER JOIN LNK m2 > > ON m2.PK_A = a.PK_A > > WHERE m1.PK_A is null > > OR ( > > m2.PK_A in (22,34) > > AND m2.PK_A NOT IN (12,56) > > ); > > > > > > "Daniel BODEA" <[EMAIL PROTECTED]> wrote on 10/26/2004 09:02:56 AM: > > > > > Greetings, > > > > > > I have several solutions to a problem which requires temporary or cache > > > tables but the different solutions are time consuming to test for > > > performance and I was wondering if anyone had more information on this > > than > > > myself. > > > > > > The server is 3.23.40 without possibility for upgrade to 4.x because of > > the > > > amount of extra work required for testing, deployment, etc. > > > > > > The simplified context is that I have several tables for individual > > records > > > (REC_A, REC_B, etc.) with primary keys and one table for links between > > > different types of records based on their primary keys (LNK). LNK is N > > to N > > > so there is no constraint on the number of links a record can have to > > other > > > records. > > > > > > I have to query these relationships in complex ways, one example being > > > "select * from REC_A where (no links exist at all) or ((links to REC_B > > exist > > > for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56 > > or > > > 12))". The number of records is in the hundreds of thousands and the > > links > > > can get past a million. > > > > > > Now since this version does not support GROUP_CONCAT and having given up > > > after a couple of days of trying to torture recursive INNER and LEFT > > joins > > > to filter out records based on totally incomprehensible ON clauses, I > > figure > > > I need temporary or cache intermediary tables. > > > > > > The solutions that I see would be to have (1) a generic "flag" table > > that a > > > complex query would use to first flag the records that should get > > through > > > and then join and filter with that table to return the list of records > > but > > > that would mean locking and updating the table for each complex query, > > (2) > > > generate an "alternate view" of the LNK table by simulating the > > GROUP_CONCAT > > > function to aggregate the records in unique rows that contain the full > > set > > > of links. The question for (2) is how to store the full link set in one > > row > > > so that queries be able to make use of indexes. (3) would be a variation > > of > > > (1) by creating a temporary table per complex query that would only > > serve > > > the purposes of this query and which would contain only those record > > indexes > > > that should go through so that the complex query can then perform a > > simple > > > INNER JOIN to filter out the results. > > > > > > I've also thought about retrieving the full set of indexes for all > > matched > > > records programmatically and then reinjecting this in the main query as > > a PK > > > IN (...) but that would give really huge index sets that would go back > > and > > > forth between the server and the client so I ruled this one out. > > > > > > Now unless there is another simpler and miraculous solution that I > > haven't > > > thought of, I have to find out which one of these alternatives is the > > best > > > one speed-wise so any help on this is sincerely appreciated. > > > > > > Daniel > > > > > > > > > -- > > > MySQL General Mailing List > > > For list archives: http://lists.mysql.com/mysql > > > To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] > > > > > >