Re: Help with a JOIN query please

SGreen Fri, 18 Mar 2005 07:03:57 -0800

"shaun thornburgh" <[EMAIL PROTECTED]> wrote on 03/17/2005 
06:46:22 PM:


> Hi,
> 
> I have ( among others ) three tables in my database: Claims, Expenses 
and 
> Mileage. A claim can contain many expense entries and many mileage 
entries. 
> I am using the follwing query to show the total expenses and mileage per 

> claim for a particulare user:
> 
> SELECT C.*, SUM(E.Amount) AS Amount, SUM(M.Mileage) AS Mileage
> FROM Claims C
> LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID
> LEFT JOIN Mileage M ON M.Claim_ID = C.Claim_ID
> WHERE C.Claimant_ID = '1'
> GROUP BY C.Claim_ID
> 
> The problem is if there are two mileage entries and one expense entry 
the 
> expense total is doubled (and vice versa), can some explain this to me 
> please?
> 
> Thanks for your help.
> 
> TABLE DEFINITIONS:
> 
> mysql> desc Claims;
> 
+-----------------+-------------+------+-----+---------+----------------+
> | Field           | Type        | Null | Key | Default | Extra |
> 
+-----------------+-------------+------+-----+---------+----------------+
> | Claim_ID        | int(11)     |      | PRI | NULL    | auto_increment 
|
> | Claimant_ID     | int(11)     |      |     | 0       | |
> | Description     | varchar(50) |      |     |         | |
> | Status          | varchar(50) | YES  |     | Open    | |
> | Submission_Date | datetime    | YES  |     | NULL    | |
> | Approval_Date   | datetime    | YES  |     | NULL    | |
> | Approver_ID     | int(11)     | YES  |     | NULL    | |
> 
+-----------------+-------------+------+-----+---------+----------------+
> 7 rows in set (0.00 sec)
> 
> mysql> desc Expenses;
> +---------------------+---------------+------+-----+---------
> +----------------+
> | Field               | Type          | Null | Key | Default | Extra  
>   |
> +---------------------+---------------+------+-----+---------
> +----------------+
> | Expense_ID          | int(11)       |      | PRI | NULL    | 
> auto_increment |
> | Claim_ID            | int(11)       | YES  |     | NULL    |  
>   |
> | Description         | varchar(50)   | YES  |     | NULL    |  
>   |
> | Expense_Category_ID | int(11)       | YES  |     | NULL    |  
>   |
> | Insertion_Date      | date          | YES  |     | NULL    |  
>   |
> | Project_ID          | int(11)       | YES  |     | NULL    |  
>   |
> | Amount              | decimal(10,2) | YES  |     | NULL    |  
>   |
> | Rate_ID             | int(11)       | YES  |     | NULL    |  
>   |
> | Supplier_ID         | int(11)       | YES  |     | NULL    |  
>   |
> | Receipt             | varchar(10)   | YES  |     | NULL    |  
>   |
> | Receipt_Date        | varchar(10)   | YES  |     | NULL    |  
>   |
> | VAT_Receipt         | varchar(10)   | YES  |     | NULL    |  
>   |
> | VAT_Amount          | decimal(10,2) | YES  |     | NULL    |  
>   |
> +---------------------+---------------+------+-----+---------
> +----------------+
> 13 rows in set (0.00 sec)
> 
> mysql> desc Mileage;
> 
+----------------+--------------+------+-----+------------+----------------+
> | Field          | Type         | Null | Key | Default    | Extra   |
> 
+----------------+--------------+------+-----+------------+----------------+
> | Mileage_ID     | int(11)      |      | PRI | NULL       | 
auto_increment |
> | Claim_ID       | int(11)      |      |     | 0          |   |
> | Project_ID     | int(11)      |      |     | 0          |   |
> | Insertion_Date | date         |      |     | 0000-00-00 |   |
> | Description    | varchar(255) |      |     |            |   |
> | Start_Mileage  | int(11)      |      |     | 0          |   |
> | End_Mileage    | int(11)      |      |     | 0          |   |
> | Mileage        | int(11)      |      |     | 0          |   |
> 
+----------------+--------------+------+-----+------------+----------------+
> 8 rows in set (0.00 sec)
> 
> mysql>
> 

It's happening because you are joining two child tables at once. Maybe a 
sequence of diagrams will help. This is what happens within the DB engine 
whenever you make a JOIN. Since you were nice enough to post your table 
structures (THANK YOU!!!) I will use your tables and problem query as 
examples. 

Something simple:
SELECT ...
FROM Claims C
LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID;

Step one of any query is to identify all of your source data. In this 
case, we are combining the records of two tables so that it behaves like a 
single larger table. This internally created, virtual table represents all 
possible combinations of rows that satisfy the ON conditions of your JOIN 
clauses.

INTERNAL VIRTUAL TABLE A
+-----------------------+-------------------------+
| all columns of Claims | all columns of Expenses |
+-----------------------+-------------------------+
| Claims row 1          | all null values         |
+-----------------------+-------------------------+
| Claims row 2          | Expenses row 246        |
+-----------------------+-------------------------+
| Claims row 2          | Expenses row 248        |
+-----------------------+-------------------------+
| Claims row 2          | Expenses row 15         |
+-----------------------+-------------------------+
| Claims row 3          | Expenses row 4          |
+-----------------------+-------------------------+
|                       |                         |
~                       ~                         ~
+-----------------------+-------------------------+

You can actually look at this generated table if you say:

SELECT *
FROM Claims C
LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID;

But I would recommend using a LIMIT clause with any real data to keep your 
results small. After all you only want an idea of what the data looks 
like, right?

Let's take the same query and JOIN another table to it

SELECT ...
FROM Claims C
LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID
LEFT JOIN Mileage M ON M.Claim_ID = C.Claim_ID

Let's stop a second and look at the query. Joining the first two tables we 
know will create INTERNAL VIRTUAL TABLE A but where do we assign the rows 
from the 3rd table?  We need to associate each Mileage record with it's 
appropriate Claims record but how do we associate a Mileage record to an 
Expense record?  There is no rule in our ON clauses to say that Mileage 
row x matches up with Expenses row y. What happens then is the query 
engine generates something called a Cartesian product. Each row of 
Expenses is matched with every row of Mileage so that all possible 
combinations are represented. However, since there is a rule in our ON 
clauses that states that their Claim_ID's must match (from mathematics: if 
a=c and b=c then a=b), we only get those combinations in our INTERNAL 
VIRTUAL TABLE where that is true. 

INTERNAL VIRTUAL TABLE B
+----------------+------------------+-----------------+
| Claims columns | Expenses columns | Mileage columns |
+----------------+------------------+-----------------+
| Claims row 1   | all null values  | Mileage row 21  |
+----------------+------------------+-----------------+
| Claims row 1   | all null values  | Mileage row 22  |
+----------------+------------------+-----------------+
| Claims row 1   | all null values  | Mileage row 8   |
+----------------+------------------+-----------------+
| Claims row 2   | Expenses row 246 | Mileage row 88  |
+----------------+------------------+-----------------+
| Claims row 2   | Expenses row 246 | Mileage row 345 |
+----------------+------------------+-----------------+
| Claims row 2   | Expenses row 248 | Mileage row 88  |
+----------------+------------------+-----------------+
| Claims row 2   | Expenses row 248 | Mileage row 345 |
+----------------+------------------+-----------------+
| Claims row 2   | Expenses row 15  | Mileage row 88  |
+----------------+------------------+-----------------+
| Claims row 2   | Expenses row 15  | Mileage row 345 |
+----------------+------------------+-----------------+
| Claims row 3   | Expenses row 4   | Mileage row 16  |
+----------------+------------------+-----------------+
| Claims row 4   | ...              | ...
~                ~                  ~
+----------------+------------------+-----------------+


You should be able to see this in action with your data using the 
following query

SELECT C.Claim_ID, E.Expense_ID, M.Mileage_ID
FROM Claims C
LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID
LEFT JOIN Mileage M ON M.Claim_ID = C.Claim_ID
ORDER BY 1,2,3
LIMIT 100;

I don't know of any "clean" ways to do the math you want to do in a single 
query (there _is_ a way I know of but it involves division which may throw 
your results off.) My advice is to compute your expenses and mileage 
separately and combine them into a final result. Yes it's more trips to 
the server (at least 1 extra trip) but I can't think of an accurate way 
around the issue.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Re: Help with a JOIN query please

Reply via email to