"shaun thornburgh" <[EMAIL PROTECTED]> wrote on 03/17/2005 06:46:22 PM:
> Hi, > > I have ( among others ) three tables in my database: Claims, Expenses and > Mileage. A claim can contain many expense entries and many mileage entries. > I am using the follwing query to show the total expenses and mileage per > claim for a particulare user: > > SELECT C.*, SUM(E.Amount) AS Amount, SUM(M.Mileage) AS Mileage > FROM Claims C > LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID > LEFT JOIN Mileage M ON M.Claim_ID = C.Claim_ID > WHERE C.Claimant_ID = '1' > GROUP BY C.Claim_ID > > The problem is if there are two mileage entries and one expense entry the > expense total is doubled (and vice versa), can some explain this to me > please? > > Thanks for your help. > > TABLE DEFINITIONS: > > mysql> desc Claims; > +-----------------+-------------+------+-----+---------+----------------+ > | Field | Type | Null | Key | Default | Extra | > +-----------------+-------------+------+-----+---------+----------------+ > | Claim_ID | int(11) | | PRI | NULL | auto_increment | > | Claimant_ID | int(11) | | | 0 | | > | Description | varchar(50) | | | | | > | Status | varchar(50) | YES | | Open | | > | Submission_Date | datetime | YES | | NULL | | > | Approval_Date | datetime | YES | | NULL | | > | Approver_ID | int(11) | YES | | NULL | | > +-----------------+-------------+------+-----+---------+----------------+ > 7 rows in set (0.00 sec) > > mysql> desc Expenses; > +---------------------+---------------+------+-----+--------- > +----------------+ > | Field | Type | Null | Key | Default | Extra > | > +---------------------+---------------+------+-----+--------- > +----------------+ > | Expense_ID | int(11) | | PRI | NULL | > auto_increment | > | Claim_ID | int(11) | YES | | NULL | > | > | Description | varchar(50) | YES | | NULL | > | > | Expense_Category_ID | int(11) | YES | | NULL | > | > | Insertion_Date | date | YES | | NULL | > | > | Project_ID | int(11) | YES | | NULL | > | > | Amount | decimal(10,2) | YES | | NULL | > | > | Rate_ID | int(11) | YES | | NULL | > | > | Supplier_ID | int(11) | YES | | NULL | > | > | Receipt | varchar(10) | YES | | NULL | > | > | Receipt_Date | varchar(10) | YES | | NULL | > | > | VAT_Receipt | varchar(10) | YES | | NULL | > | > | VAT_Amount | decimal(10,2) | YES | | NULL | > | > +---------------------+---------------+------+-----+--------- > +----------------+ > 13 rows in set (0.00 sec) > > mysql> desc Mileage; > +----------------+--------------+------+-----+------------+----------------+ > | Field | Type | Null | Key | Default | Extra | > +----------------+--------------+------+-----+------------+----------------+ > | Mileage_ID | int(11) | | PRI | NULL | auto_increment | > | Claim_ID | int(11) | | | 0 | | > | Project_ID | int(11) | | | 0 | | > | Insertion_Date | date | | | 0000-00-00 | | > | Description | varchar(255) | | | | | > | Start_Mileage | int(11) | | | 0 | | > | End_Mileage | int(11) | | | 0 | | > | Mileage | int(11) | | | 0 | | > +----------------+--------------+------+-----+------------+----------------+ > 8 rows in set (0.00 sec) > > mysql> > It's happening because you are joining two child tables at once. Maybe a sequence of diagrams will help. This is what happens within the DB engine whenever you make a JOIN. Since you were nice enough to post your table structures (THANK YOU!!!) I will use your tables and problem query as examples. Something simple: SELECT ... FROM Claims C LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID; Step one of any query is to identify all of your source data. In this case, we are combining the records of two tables so that it behaves like a single larger table. This internally created, virtual table represents all possible combinations of rows that satisfy the ON conditions of your JOIN clauses. INTERNAL VIRTUAL TABLE A +-----------------------+-------------------------+ | all columns of Claims | all columns of Expenses | +-----------------------+-------------------------+ | Claims row 1 | all null values | +-----------------------+-------------------------+ | Claims row 2 | Expenses row 246 | +-----------------------+-------------------------+ | Claims row 2 | Expenses row 248 | +-----------------------+-------------------------+ | Claims row 2 | Expenses row 15 | +-----------------------+-------------------------+ | Claims row 3 | Expenses row 4 | +-----------------------+-------------------------+ | | | ~ ~ ~ +-----------------------+-------------------------+ You can actually look at this generated table if you say: SELECT * FROM Claims C LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID; But I would recommend using a LIMIT clause with any real data to keep your results small. After all you only want an idea of what the data looks like, right? Let's take the same query and JOIN another table to it SELECT ... FROM Claims C LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID LEFT JOIN Mileage M ON M.Claim_ID = C.Claim_ID Let's stop a second and look at the query. Joining the first two tables we know will create INTERNAL VIRTUAL TABLE A but where do we assign the rows from the 3rd table? We need to associate each Mileage record with it's appropriate Claims record but how do we associate a Mileage record to an Expense record? There is no rule in our ON clauses to say that Mileage row x matches up with Expenses row y. What happens then is the query engine generates something called a Cartesian product. Each row of Expenses is matched with every row of Mileage so that all possible combinations are represented. However, since there is a rule in our ON clauses that states that their Claim_ID's must match (from mathematics: if a=c and b=c then a=b), we only get those combinations in our INTERNAL VIRTUAL TABLE where that is true. INTERNAL VIRTUAL TABLE B +----------------+------------------+-----------------+ | Claims columns | Expenses columns | Mileage columns | +----------------+------------------+-----------------+ | Claims row 1 | all null values | Mileage row 21 | +----------------+------------------+-----------------+ | Claims row 1 | all null values | Mileage row 22 | +----------------+------------------+-----------------+ | Claims row 1 | all null values | Mileage row 8 | +----------------+------------------+-----------------+ | Claims row 2 | Expenses row 246 | Mileage row 88 | +----------------+------------------+-----------------+ | Claims row 2 | Expenses row 246 | Mileage row 345 | +----------------+------------------+-----------------+ | Claims row 2 | Expenses row 248 | Mileage row 88 | +----------------+------------------+-----------------+ | Claims row 2 | Expenses row 248 | Mileage row 345 | +----------------+------------------+-----------------+ | Claims row 2 | Expenses row 15 | Mileage row 88 | +----------------+------------------+-----------------+ | Claims row 2 | Expenses row 15 | Mileage row 345 | +----------------+------------------+-----------------+ | Claims row 3 | Expenses row 4 | Mileage row 16 | +----------------+------------------+-----------------+ | Claims row 4 | ... | ... ~ ~ ~ +----------------+------------------+-----------------+ You should be able to see this in action with your data using the following query SELECT C.Claim_ID, E.Expense_ID, M.Mileage_ID FROM Claims C LEFT JOIN Expenses E ON E.Claim_ID = C.Claim_ID LEFT JOIN Mileage M ON M.Claim_ID = C.Claim_ID ORDER BY 1,2,3 LIMIT 100; I don't know of any "clean" ways to do the math you want to do in a single query (there _is_ a way I know of but it involves division which may throw your results off.) My advice is to compute your expenses and mileage separately and combine them into a final result. Yes it's more trips to the server (at least 1 extra trip) but I can't think of an accurate way around the issue. Shawn Green Database Administrator Unimin Corporation - Spruce Pine