[EMAIL PROTECTED] wrote:
Hi!
I need to use SQL to count some sequences. We have taken a short snapshot of 1 year for people registered in programs. So we have data in a table like so:
ID m1 m2 m3 m4 m5 m6 m7m8 m9 m10 m11 m12
The "m1", "m2", "m3" refers to month 1, month2, month3, etc.
The data for 1 person might look like so:
0023 1 1 1 0 0 2 2 1 1 0 1 0 and there is 1 row for each person.
The "1" means that they were in a program called "SW" and the "2" means "Other" and the 0 means "Not in any program".
My problem is to count durations for each person, by program, and get average stays in program 1 and in program 2.
This is unclear. Given the example row you provide, is the answer for person 0023 a stay of 6 months in program SW (1), 2 months in other programs (2), and 4 months in no program (0), or is it 3 stays in program SW (1) with average length of 2 months ([3 + 2 + 1]/3) and 1 stay in other programs (2) with average length 2 months? Do you then want an average stay in SW and an average stay in other programs, taken over all people?
Is there a nice way to do this with SQL?
Yes, I think it can be done, but nice? -- eye of the beholder, I suppose. Something like
SELECT AVG( IF(m1 = 1, 1, 0) + IF(m2 = 1, 1, 0) + IF(m3 = 1, 1, 0) + IF(m4 = 1, 1, 0) + IF(m5 = 1, 1, 0) + IF(m6 = 1, 1, 0) + IF(m7 = 1, 1, 0) + IF(m8 = 1, 1, 0) + IF(m9 = 1, 1, 0) + IF(m10 = 1, 1, 0) + IF(m11 = 1, 1, 0) + IF(m12 = 1, 1, 0) ) AS SW,
AVG( IF(m1 = 2, 1, 0) + IF(m2 = 2, 1, 0) + IF(m3 = 2, 1, 0) + IF(m4 = 2, 1, 0) + IF(m5 = 2, 1, 0) + IF(m6 = 2, 1, 0) + IF(m7 = 2, 1, 0) + IF(m8 = 2, 1, 0) + IF(m9 = 2, 1, 0) + IF(m10 = 2, 1, 0) + IF(m11 = 2, 1, 0) + IF(m12 = 2, 1, 0) ) AS Other,
AVG( IF(m1 = 0, 1, 0) + IF(m2 = 0, 1, 0) + IF(m3 = 0, 1, 0) + IF(m4 = 0, 1, 0) + IF(m5 = 0, 1, 0) + IF(m6 = 0, 1, 0) + IF(m7 = 0, 1, 0) + IF(m8 = 0, 1, 0) + IF(m9 = 0, 1, 0) + IF(m10 = 0, 1, 0) + IF(m11 = 0, 1, 0) + IF(m12 = 0, 1, 0) ) AS 'No program' FROM program_months;
should get you started. The IFs check each month column for a program match and add up the matches. Thus, we get the number of months in a program for each person (row). Then we take an average of those numbers. This average includes people who were never in the program. Did you want to exclude them?
I'd guess from your earlier message that you have more programs to track than 'SW' and 'other', and the numbers in the month columns are actually the ids of the corresponding rows in the programs table. If so, you could change the above to something like
SELECT p.id, p.name, AVG( IF(pm.m1 = p.id, 1, 0) + IF(pm.m2 = p.id, 1, 0) + IF(pm.m3 = p.id, 1, 0) + IF(pm.m4 = p.id, 1, 0) + IF(pm.m5 = p.id, 1, 0) + IF(pm.m6 = p.id, 1, 0) + IF(pm.m7 = p.id, 1, 0) + IF(pm.m8 = p.id, 1, 0) + IF(pm.m9 = p.id, 1, 0) + IF(pm.m10 = p.id, 1, 0) + IF(pm.m11 = p.id, 1, 0) + IF(pm.m12 = p.id, 1, 0) ) AS 'Average months' FROM programs p, program_months pm;
Notice there's no join condition, so this is a Cartesian product. That's intentional. For each program id, we need to look at each and every row of the program_months table for matching ids in the 12 month columns. As that means processing N*M rows (where N is the number of rows in table programs and M is the number of rows in table program_months), this could be huge. Maybe too huge. If so, you could limit the processing to a particular program or programs by adding a WHERE clause restricting the programs to consider. Something like
WHERE p.name = 'SW'
Thanks heaps!
-Alex
By the way, I could easily be wrong, but your description gives the impression that you've put the cart before the horse. That is, you've designed (and filled) the table first, then turned to the problem of how to use it to answer your questions. If at all possible, you should list the questions first, then design the tables to facilitate answering those questions. Of course, if the questions are new or the tables inherited...
Michael
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]