Many thanks for the quick answer. I apologize for leaving out a few details
that should have explained the context in greater detail. I'll use less
theory below to further explain what's going on and then I'll make some
remarks on your answer.

Simplified down to the basics of this problem, we have two tables, A and B,
both having the same structure:

PK_A / PK_B - primary key
DATA_A / DATA_B - text

Then we have table L with two columns:

PK_A - 0..n of primary key for table A
PK_B - 0..n of primary key for table B

Both A and B contain 3 rows with PKs 1, 2, 3. Data is not an issue. L
contains the following mappings:

PK_A | PK_B
1 | 3
1 | 4
2 | 2
2 | 3
2 | 4

Given a query that resembles this "select * from A where (no links) or
((links to both 3 and 4) and (no links to 2))", we should be able to work
(order, limit, etc.) on records 1 and 3 from A. The PKs that the WHERE
clause works on are those of B not A.

I also thought at first that a sequence of LEFT and INNER joins would both
filter the results correctly and work fast enough for this kind of data but
two days of joining stuff didn't make it happen and frankly I don't think
there is any "one query" solution.

In your first example you filter on PK_A which is my mistake because the
selection is actually made on PK_B. I apologize for not making this clearer.
Using PK_B in your example though is the first trap I fell into during those
joining days. Whatever your conditions in the WHERE clause, you only apply
them to one row at a time which means the relevant part of the query is
translated into this "where PK_A has links to any of PK_B in (3, 4)". The
NOT IN part has no effect since the row 2 | 2 is excluded but the other two
get through.

Disregarding PK_A in the WHERE clause and using PK_B instead in the second
example, I believe first of all the INNER JOIN filters out all rows that
don't have any links (NULL rows) and the WHERE conditions abide to the same
rules as the previous example so the results are the same. In any
circumstance, using IN in this context results in the set of indexes being
ORed not ANDed since IN returns true for any value in the set.

I am totally open to any suggestions for "one query" solutions using joins
but my trying to find the best way to use a temporary or cache table is the
result of having proven to myself the (disputable of course) impossibility
of this thing using one trip only.

Daniel

----- Original Message ----- 
From: <[EMAIL PROTECTED]>
To: "Daniel BODEA" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, October 26, 2004 3:41 PM
Subject: Re: Querying on subsets of one-to-many and many-to-many


> Under most circumstances, even with data tables as large as yours, I have
> found that direct JOIN statements have provided acceptable performance. On
> certain highly normalized databases, where the requirements of the query
> needed more than 4 or 5 tables joined at once, I have also used your
> "cache" table idea to great success.
>
> However I would like to believe that queries on the relationships between
> just these three tables could be coded by using JOINS to provide some very
> fast results. Since you chose not to post your real table structures here,
> I must create these sample queries using your sample tables.
>
> select * from REC_A where (no links exist at all) or ((links to REC_B
> exist
> for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56
> or
> 12)
>
> CREATE TEMPORARY TABLE tmpUnion (primary key(PK_A))
> SELECT a.*
> FROM REC_A a
> LEFT JOIN LNK
>         on LNK.PK_A = a.PK_A
> WHERE LNK.PK_A is null;
>
> INSERT IGNORE tmpUnion
> SELECT a.*
> FROM REC_A a
> INNER JOIN LNK
>         on LNK.PK_A = a.PK_A
> WHERE LNK.PK_A in (22,34)
>         AND LNK.PK_A NOT IN (12,56);
>
> SELECT * from tmpUnion;
> DROP TEMPORARY TABLE tmpUnion;
>
> Whenever you have an OR like that in your WHERE clause, you are in essence
> running two or more queries and combining the results. Since you are on a

> pre-4.0 server, you do not have the UNION operator available to help you
> so I had to simulate one by using a temporary table. Basically I had to
> hand-code what the UNION operator automates for you. I defined the Primary
> Key on the temp table and used the IGNORE option on the INSERT statement
> to prevent duplicate rows (logically you shouldn't have had any but I was
> being thorough)
>
> Here is an equivalent query without the temp table
>
> SELECT DISTINCT a.*
> FROM REC_A a
> LEFT JOIN LNK  m1
>         ON m1.PK_A = a.PK_A
> INNER JOIN LNK m2
>         ON m2.PK_A = a.PK_A
> WHERE m1.PK_A is null
>         OR (
>                 m2.PK_A in (22,34)
>                 AND m2.PK_A NOT IN (12,56)
>         );
>
>
> "Daniel BODEA" <[EMAIL PROTECTED]> wrote on 10/26/2004 09:02:56 AM:
>
> > Greetings,
> >
> > I have several solutions to a problem which requires temporary or cache
> > tables but the different solutions are time consuming to test for
> > performance and I was wondering if anyone had more information on this
> than
> > myself.
> >
> > The server is 3.23.40 without possibility for upgrade to 4.x because of
> the
> > amount of extra work required for testing, deployment, etc.
> >
> > The simplified context is that I have several tables for individual
> records
> > (REC_A, REC_B, etc.) with primary keys and one table for links between
> > different types of records based on their primary keys (LNK). LNK is N
> to N
> > so there is no constraint on the number of links a record can have to
> other
> > records.
> >
> > I have to query these relationships in complex ways, one example being
> > "select * from REC_A where (no links exist at all) or ((links to REC_B
> exist
> > for both PKs 22 and 34) and (no links exist to REC_B for either PK in 56
> or
> > 12))". The number of records is in the hundreds of thousands and the
> links
> > can get past a million.
> >
> > Now since this version does not support GROUP_CONCAT and having given up
> > after a couple of days of trying to torture recursive INNER and LEFT
> joins
> > to filter out records based on totally incomprehensible ON clauses, I
> figure
> > I need temporary or cache intermediary tables.
> >
> > The solutions that I see would be to have (1) a generic "flag" table
> that a
> > complex query would use to first flag the records that should get
> through
> > and then join and filter with that table to return the list of records
> but
> > that would mean locking and updating the table for each complex query,
> (2)
> > generate an "alternate view" of the LNK table by simulating the
> GROUP_CONCAT
> > function to aggregate the records in unique rows that contain the full
> set
> > of links. The question for (2) is how to store the full link set in one
> row
> > so that queries be able to make use of indexes. (3) would be a variation
> of
> > (1) by creating a temporary table per complex query that would only
> serve
> > the purposes of this query and which would contain only those record
> indexes
> > that should go through so that the complex query can then perform a
> simple
> > INNER JOIN to filter out the results.
> >
> > I've also thought about retrieving the full set of indexes for all
> matched
> > records programmatically and then reinjecting this in the main query as
> a PK
> > IN (...) but that would give really huge index sets that would go back
> and
> > forth between the server and the client so I ruled this one out.
> >
> > Now unless there is another simpler and miraculous solution that I
> haven't
> > thought of, I have to find out which one of these alternatives is the
> best
> > one speed-wise so any help on this is sincerely appreciated.
> >
> > Daniel
> >
> >
> > -- 
> > MySQL General Mailing List
> > For list archives: http://lists.mysql.com/mysql
> > To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
> >
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to