Re: Querying on subsets of one-to-many and many-to-many

SGreen Tue, 26 Oct 2004 09:46:44 -0700

OK, now that I have a better model to work with, let me try again. (And 
stop being so discouraged! This is hard for everyone until they get the 
hang of it!!!  =8-D )


> Given a query that resembles this "select * from A where (no links) or
> ((links to both 3 and 4) and (no links to 2))"

This may not be the only way to do this but it's what I thought of first

SELECT A.PK_A, A.DATA_A, sum(isnull(L.PK_A)) as nolinks,  sum(if(L.PK_B IN 
(3,4),1,0)) as wanted, sum(if(L.PK_B = 2,1,0)) as rejects
FROM  A
LEFT JOIN L
        on L.PK_A = A.PK_A
GROUP BY 1,2
HAVING nolinks >=0 or (wanted = 2  and rejects=0);

This returns three more columns that you originally wanted but it's still 
a one-trip query.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


"Daniel BODEA" <[EMAIL PROTECTED]> wrote on 10/26/2004 11:55:34 AM:

> Many thanks for the quick answer. I apologize for leaving out a few 
details
> that should have explained the context in greater detail. I'll use less
> theory below to further explain what's going on and then I'll make some
> remarks on your answer.
> 
> Simplified down to the basics of this problem, we have two tables, A and 
B,
> both having the same structure:
> 
> PK_A / PK_B - primary key
> DATA_A / DATA_B - text
> 
> Then we have table L with two columns:
> 
> PK_A - 0..n of primary key for table A
> PK_B - 0..n of primary key for table B
> 
> Both A and B contain 3 rows with PKs 1, 2, 3. Data is not an issue. L
> contains the following mappings:
> 
> PK_A | PK_B
> 1 | 3
> 1 | 4
> 2 | 2
> 2 | 3
> 2 | 4
> 
> Given a query that resembles this "select * from A where (no links) or
> ((links to both 3 and 4) and (no links to 2))", we should be able to 
work
> (order, limit, etc.) on records 1 and 3 from A. The PKs that the WHERE
> clause works on are those of B not A.
> 
> I also thought at first that a sequence of LEFT and INNER joins would 
both
> filter the results correctly and work fast enough for this kind of data 
but
> two days of joining stuff didn't make it happen and frankly I don't 
think
> there is any "one query" solution.
> 
> In your first example you filter on PK_A which is my mistake because the
> selection is actually made on PK_B. I apologize for not making this 
clearer.
> Using PK_B in your example though is the first trap I fell into during 
those
> joining days. Whatever your conditions in the WHERE clause, you only 
apply
> them to one row at a time which means the relevant part of the query is
> translated into this "where PK_A has links to any of PK_B in (3, 4)". 
The
> NOT IN part has no effect since the row 2 | 2 is excluded but the other 
two
> get through.
> 
> Disregarding PK_A in the WHERE clause and using PK_B instead in the 
second
> example, I believe first of all the INNER JOIN filters out all rows that
> don't have any links (NULL rows) and the WHERE conditions abide to the 
same
> rules as the previous example so the results are the same. In any
> circumstance, using IN in this context results in the set of indexes 
being
> ORed not ANDed since IN returns true for any value in the set.
> 
> I am totally open to any suggestions for "one query" solutions using 
joins
> but my trying to find the best way to use a temporary or cache table is 
the
> result of having proven to myself the (disputable of course) 
impossibility
> of this thing using one trip only.
> 
> Daniel
> 
> ----- Original Message ----- 
> From: <[EMAIL PROTECTED]>
> To: "Daniel BODEA" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Tuesday, October 26, 2004 3:41 PM
> Subject: Re: Querying on subsets of one-to-many and many-to-many
> 
> 
> > Under most circumstances, even with data tables as large as yours, I 
have
> > found that direct JOIN statements have provided acceptable 
performance. On
> > certain highly normalized databases, where the requirements of the 
query
> > needed more than 4 or 5 tables joined at once, I have also used your
> > "cache" table idea to great success.
> >
> > However I would like to believe that queries on the relationships 
between
> > just these three tables could be coded by using JOINS to provide some 
very
> > fast results. Since you chose not to post your real table structures 
here,
> > I must create these sample queries using your sample tables.
> >
> > select * from REC_A where (no links exist at all) or ((links to REC_B
> > exist
> > for both PKs 22 and 34) and (no links exist to REC_B for either PK in 
56
> > or
> > 12)
> >
> > CREATE TEMPORARY TABLE tmpUnion (primary key(PK_A))
> > SELECT a.*
> > FROM REC_A a
> > LEFT JOIN LNK
> >         on LNK.PK_A = a.PK_A
> > WHERE LNK.PK_A is null;
> >
> > INSERT IGNORE tmpUnion
> > SELECT a.*
> > FROM REC_A a
> > INNER JOIN LNK
> >         on LNK.PK_A = a.PK_A
> > WHERE LNK.PK_A in (22,34)
> >         AND LNK.PK_A NOT IN (12,56);
> >
> > SELECT * from tmpUnion;
> > DROP TEMPORARY TABLE tmpUnion;
> >
> > Whenever you have an OR like that in your WHERE clause, you are in 
essence
> > running two or more queries and combining the results. Since you are 
on a
> 
> > pre-4.0 server, you do not have the UNION operator available to help 
you
> > so I had to simulate one by using a temporary table. Basically I had 
to
> > hand-code what the UNION operator automates for you. I defined the 
Primary
> > Key on the temp table and used the IGNORE option on the INSERT 
statement
> > to prevent duplicate rows (logically you shouldn't have had any but I 
was
> > being thorough)
> >
> > Here is an equivalent query without the temp table
> >
> > SELECT DISTINCT a.*
> > FROM REC_A a
> > LEFT JOIN LNK  m1
> >         ON m1.PK_A = a.PK_A
> > INNER JOIN LNK m2
> >         ON m2.PK_A = a.PK_A
> > WHERE m1.PK_A is null
> >         OR (
> >                 m2.PK_A in (22,34)
> >                 AND m2.PK_A NOT IN (12,56)
> >         );
> >
> >
> > "Daniel BODEA" <[EMAIL PROTECTED]> wrote on 10/26/2004 09:02:56 
AM:
> >
> > > Greetings,
> > >
> > > I have several solutions to a problem which requires temporary or 
cache
> > > tables but the different solutions are time consuming to test for
> > > performance and I was wondering if anyone had more information on 
this
> > than
> > > myself.
> > >
> > > The server is 3.23.40 without possibility for upgrade to 4.x because 
of
> > the
> > > amount of extra work required for testing, deployment, etc.
> > >
> > > The simplified context is that I have several tables for individual
> > records
> > > (REC_A, REC_B, etc.) with primary keys and one table for links 
between
> > > different types of records based on their primary keys (LNK). LNK is 
N
> > to N
> > > so there is no constraint on the number of links a record can have 
to
> > other
> > > records.
> > >
> > > I have to query these relationships in complex ways, one example 
being
> > > "select * from REC_A where (no links exist at all) or ((links to 
REC_B
> > exist
> > > for both PKs 22 and 34) and (no links exist to REC_B for either PK 
in 56
> > or
> > > 12))". The number of records is in the hundreds of thousands and the
> > links
> > > can get past a million.
> > >
> > > Now since this version does not support GROUP_CONCAT and having 
given up
> > > after a couple of days of trying to torture recursive INNER and LEFT
> > joins
> > > to filter out records based on totally incomprehensible ON clauses, 
I
> > figure
> > > I need temporary or cache intermediary tables.
> > >
> > > The solutions that I see would be to have (1) a generic "flag" table
> > that a
> > > complex query would use to first flag the records that should get
> > through
> > > and then join and filter with that table to return the list of 
records
> > but
> > > that would mean locking and updating the table for each complex 
query,
> > (2)
> > > generate an "alternate view" of the LNK table by simulating the
> > GROUP_CONCAT
> > > function to aggregate the records in unique rows that contain the 
full
> > set
> > > of links. The question for (2) is how to store the full link set in 
one
> > row
> > > so that queries be able to make use of indexes. (3) would be a 
variation
> > of
> > > (1) by creating a temporary table per complex query that would only
> > serve
> > > the purposes of this query and which would contain only those record
> > indexes
> > > that should go through so that the complex query can then perform a
> > simple
> > > INNER JOIN to filter out the results.
> > >
> > > I've also thought about retrieving the full set of indexes for all
> > matched
> > > records programmatically and then reinjecting this in the main query 
as
> > a PK
> > > IN (...) but that would give really huge index sets that would go 
back
> > and
> > > forth between the server and the client so I ruled this one out.
> > >
> > > Now unless there is another simpler and miraculous solution that I
> > haven't
> > > thought of, I have to find out which one of these alternatives is 
the
> > best
> > > one speed-wise so any help on this is sincerely appreciated.
> > >
> > > Daniel
> > >
> > >
> > > -- 
> > > MySQL General Mailing List
> > > For list archives: http://lists.mysql.com/mysql
> > > To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
> > >
> >
>

Re: Querying on subsets of one-to-many and many-to-many

Reply via email to