You are on the right track. Don't forget to use the EXPLAIN command to help you tune your indexes.
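For instance, running EXPLAIN on the slow query (a rough sketch here, reusing the table and column names from your own query) will show you in the "key" and "rows" columns of its output which index the optimizer actually picked and roughly how many rows it expects to scan:

EXPLAIN
SELECT vd.id, vd.session
FROM vardata AS vd
WHERE vd.campaignId = 17;

If "key" comes back NULL, or "rows" is a huge number, the index you hoped for is not being used.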
One other thing that may help is to try making extracts of some of your tables as temp tables and using them in the final JOIN query that pulls it all together. For instance, you could make a temp table of vardata that only has the IDs for campaign 17 and an extract of vars where name IN ('content','browser','bandwidth'). Then build your report from those extracts instead of the full tables.

When you build JOINs whose cross products approach hundreds of millions of records, you can easily start to see slowdowns. One way around that is to minimize the amount of data you actually JOIN together. If you have 4 tables of 1000 rows each all joined together in a single query, that's a cross product of 1000x1000x1000x1000, or 1.0e12 (1 trillion), possible record combinations that the engine must evaluate before it can finish building the internal JOIN table. If you can pre-query two of those tables so that you are only joining 20x20x1000x1000 (4.0e8, or 400 million), you have just eliminated 999,600,000,000 (nearly a trillion) row combinations from the evaluation phase. How much faster do you think that will be, eh?

You may also want to look at other threads in this list that discuss something called a "crosstab query" or "pivot table". You are actually building one, and perhaps the techniques discussed for flipping a single table can help you optimize this query, too.

Off the top of my head, here is an attempt to help speed things up:

CREATE TEMPORARY TABLE tmpVardata (KEY (id, session), KEY (nameid))
SELECT id, session, nameid
FROM vardata
WHERE campaignId = 17;

-- vars itself has no value column, so pull the text from valuelookup
CREATE TEMPORARY TABLE tmpVars (KEY (id))
SELECT vn.id, vn.name, vl.value
FROM vars AS vn
INNER JOIN valuelookup AS vl ON vl.varid = vn.id
WHERE vn.name IN ('content','browser','bandwidth');

SELECT vd.id AS id,
       vd.session AS session,
       MAX(IF(tv.name='content',   tv.value, NULL)) AS content,
       MAX(IF(tv.name='browser',   tv.value, NULL)) AS browser,
       MAX(IF(tv.name='bandwidth', tv.value, NULL)) AS bandwidth
FROM tmpVardata AS vd
LEFT JOIN data AS d ON d.session = vd.session
LEFT JOIN tmpVars AS tv ON tv.id = vd.nameid OR tv.id = d.nameid
GROUP BY 1, 2;

DROP TEMPORARY TABLE tmpVardata, tmpVars;

(I used a shorthand in the GROUP BY clause to group by column positions instead of names.)

Can you see what I am trying to do? First I shrink a couple of tables, then I eliminate the multiple joins. It may be close but I doubt I got it just right.

HTH,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Mathew Ray <[EMAIL PROTECTED]> wrote on 03/24/2005 12:47:20 PM:

> Hi Shawn,
>
> I think I may have found a solution, but the query takes quite a while
> to run... here is what I have now, paraphrased:
>
> SELECT
>     vd.id AS id,
>     vd.session AS session,
>     vl.value AS content,
>     vd2.varvalue AS browser,
>     vl3.value AS bandwidth
>
> FROM vardata AS vd
>
> LEFT OUTER JOIN vars AS vn ON ( vn.name = 'content' AND vn.id = vd.nameid )
> LEFT OUTER JOIN valuelookup AS vl ON ( vl.id = vd.valueid )
>
> LEFT OUTER JOIN data AS vd2 ON ( vd2.session = vd.session )
> LEFT OUTER JOIN vars AS vn2 ON ( vn2.id = vd2.nameid )
>
> /* and so forth for each variable... */
>
> WHERE vd.campaignId = 17
> AND vn.name = 'content'
> AND vn2.name = 'browser'
> AND vn3.name = 'bandwidth'
>
> This would result in the following 'flattened' structure:
>
> [id] [session] [content] [browser]    [bandwidth]
>  22     55      intro     MSIE 6.0     hi
>  23     55      form1     MSIE 6.0     hi
>  23     56      intro     Firefox 1.0  lo
>
> Only problem is that the status of the thread stays in 'statistics'
> forever... 10 mins last time I checked before cancelling it...
> Seems like the time is increasing quite drastically as I pile on more
> variable comparisons... I am guessing because I am increasing the
> number of comparisons that must be done...
>
> Thanks,
> Mathew
>
> [EMAIL PROTECTED] wrote:
> > Mathew Ray <[EMAIL PROTECTED]> wrote on 03/24/2005 10:42:51 AM:
> >
> >> Been searching for a while and can't seem to come up with any good
> >> answers to this - I have a normalized structure of about 5 tables
> >> that I need to denormalize into one big representation of the entire
> >> structure. Anyone know of a good tool or resource to 'flatten' my
> >> tables easily?
> >>
> >> I've got a query that removes all foreign key info and creates a
> >> column that holds the linked value, but my problem is that I have
> >> some columns that change with every row and others that should only
> >> be logged once per row (responses to polls, for example).
> >>
> >> Here is a simplified structure:
> >>
> >> [data]
> >>   id (int)
> >>   nameid (int)
> >>   valueid (int)
> >>
> >> [vars]
> >>   id (int)
> >>   name (varchar 255)
> >>
> >> [valuelookup]
> >>   id (int)
> >>   value (varchar 255)
> >>   varid (int)
> >>
> >> Some variables, like 'browsertype', are consistent for every item;
> >> others, like 'content', change for every row. The 'poll' variable
> >> contains multiple choice responses, but shouldn't be double counted
> >> when the table is de-normalized. I looked at some open source E-T-L
> >> tools, but they look pretty complicated, and I feel there is a way
> >> of doing this with SQL somehow. Any idea on how to approach this?
> >>
> >> --
> >> Thanks,
> >> Mathew
> >
> > Show us some concrete examples of how you want your denormalized data
> > to look and I am sure we can help.
> >
> > Shawn Green
> > Database Administrator
> > Unimin Corporation - Spruce Pine