Brian,>Let's say a member completes action 'abc'. We want to query the >member_actions table for all members that have also completed action 'abc'. >We then want to determine what the top 3 other actions are that were >completed by members who have completed action 'abc', while making sure that >only actions that have not already been completed by this member are >considered.Ordered groupwise quotas. For a subquery-free two-table example see http://www.artfulsoftware.com/queries.php#18. PB ----- Brian Erickson wrote: We are looking for some help with queries that will accomplish a similar feature to what Amazon does. When you purchase a product, Amazon looks at all other people who have purchased that product, and then looks at all of the OTHER products those people have purchased, and uses that data to suggest related products to you. That's essentially what we are trying to do.We have 3 tables: members, actions, and member_actions. The 'members' table tracks all of our customers, the 'actions' table tracks all of the different actions each member can complete, and the 'member_actions' table is the weak entity link that tracks which actions each member have completed. Let's say a member completes action 'abc'. We want to query the member_actions table for all members that have also completed action 'abc'. We then want to determine what the top 3 other actions are that were completed by members who have completed action 'abc', while making sure that only actions that have not already been completed by this member are considered. We are using MySQL version 3.23. There are approximately 500 unique rows in the 'actions' table and 2,000,000 rows in the member_actions table, with 3,000+ actions being recorded at any given time. Is it possible to achieve this functionality with one/few queries? The statistics above may be important because if a query takes too long to execute, the server may not be physically capable of executing that query 3,000+ times simultaneously. Another option we have considered is to create a separate table called 'correlation' with two fields: action and correlated_action. We would then populate this table in a batch process following the pseudocode below. SELECT DISTINCT(action) FROM member_actions Loop SELECT DISTINCT(member) FROM member_actions WHERE action = x Loop SELECT DISTINCT(action) FROM member_actions WHERE member = y AND action <> x Loop INSERT INTO correlation (action, correlated_action) VALUES (x, z) Then we could easily query this table to find correlated actions like so: SELECT DISTINCT(correlated_action), COUNT(*) AS count FROM correlation WHERE action = x GROUP BY correlated_action ORDER BY count DESC This would not solve the issue of only returning actions that the member had not already completed, but that could probably be accomplished by simply joining the correlation table back to the member_actions table. So, our question is whether or not this is feasible with a one/few query approach, or if this is something that should be accomplished with something similar to the approach above? Can anyone provide a good start for us? |
No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.385 / Virus Database: 268.3.2/294 - Release Date: 3/27/2006
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]