Re: [vox-tech] MySQL question: making stats out of orders
What if you just use RRDtool?  http://oss.oetiker.ch/rrdtool/

On Tue, Dec 01, 2009 at 05:32:20PM -0800, Bill Kendrick wrote:
> Ok, so I've got a few queries our website does that are used to
> determine which products are 'top sellers' in the past 90 days' worth
> of sales.
>
> [...]
>
> Anyone feel like flexing their MySQL muscles and provide some ideas on
> how to populate a similar table (tracking sales) based on the
> orders/orderitem table combo explained above?
>
> Thanks. Sorry for asking you to do my (home)work for me.

--
Brian Lavender
http://www.brie.com/brian/
___
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech
Re: [vox-tech] MySQL question: making stats out of orders
Hrm, trying to frame this as a non-proprietary question in a way that
would still be useful. Ah, how about: what is the ratio of (unique
orderitem.productid values today) to (orders today)?

Assuming it's high, you could make this much easier to deal with by
building a table of:

  (productid, date, count)

Then finding the most popular item in the last 90 days, or today, or
the "hot" item that rose the most in the rankings today, all become
easy queries.

I used this approach for handling a few 100k web hits/day, and it
allowed for much more complex queries. We logged directly to MySQL and
wanted to run lots of complex queries over years of logs; we typically
got our answers in a few seconds or less.
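Once such a per-day stats table exists, the ranking queries mentioned
above really do collapse to a single small-table aggregate. A
hypothetical sketch, using SQLite as a stand-in for MySQL (the
orderstats column names are borrowed from later in this thread; the
data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orderstats (productid INTEGER, date TEXT,
               orders INTEGER, PRIMARY KEY (productid, date))""")

# Made-up per-day sales counts for two products.
con.executemany("INSERT INTO orderstats VALUES (?, ?, ?)",
                [(1, "2009-12-01", 5), (1, "2009-12-02", 1),
                 (2, "2009-12-01", 2), (2, "2009-12-02", 9)])

# Top sellers over a window: one aggregate on a small table, no JOINs
# back to the ever-growing orders/orderitem tables.
top = con.execute("""
SELECT productid, SUM(orders) AS sold
FROM orderstats
WHERE date >= '2009-12-01'
GROUP BY productid
ORDER BY sold DESC
LIMIT 10
""").fetchall()
print(top)
# -> [(2, 11), (1, 6)]
```

The same shape answers "most popular today" (tighten the date filter)
or "biggest mover" (compare today's row against the trailing average).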
Re: [vox-tech] MySQL question: making stats out of orders
On Wed, Dec 02, 2009 at 12:37:19AM -0600, Chanoch (Ken) Bloom wrote:
> Then you have to throw in a conversion function that truncates it to
> the date in order for the `group by` to work correctly. I'm not sure
> what function that is, but you can play around with the functions
> listed at
> http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html

In my original follow-up, I did a "FROM_DAYS(TO_DAYS(...))" trick,
which would probably work.

Thanks!

--
-bill!
Sent from my computer
Re: [vox-tech] MySQL question: making stats out of orders
On Tue, 2009-12-01 at 22:33 -0800, Bill Kendrick wrote:
> On Wed, Dec 02, 2009 at 12:20:25AM -0600, Chanoch (Ken) Bloom wrote:
> > INSERT INTO orderstats (productid, date, orders)
> > SELECT orderitem.productid, orders.timestamp, SUM(orderitem.qty)
> > FROM orderitem
> > JOIN orders ON orders.id = orderitem.orderid
> > WHERE orders.timestamp >= { 90 days ago }
> > GROUP BY orderitem.productid, orders.timestamp
> >
> > assuming orders.timestamp has one-day granularity
>
> Timestamp is actually a timestamp (down to the second).

Then you have to throw in a conversion function that truncates it to
the date in order for the `group by` to work correctly. I'm not sure
what function that is, but you can play around with the functions
listed at
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html
Re: [vox-tech] MySQL question: making stats out of orders
On Wed, Dec 02, 2009 at 12:20:25AM -0600, Chanoch (Ken) Bloom wrote:
> INSERT INTO orderstats (productid, date, orders)
> SELECT orderitem.productid, orders.timestamp, SUM(orderitem.qty)
> FROM orderitem
> JOIN orders ON orders.id = orderitem.orderid
> WHERE orders.timestamp >= { 90 days ago }
> GROUP BY orderitem.productid, orders.timestamp
>
> assuming orders.timestamp has one-day granularity

Timestamp is actually a timestamp (down to the second).

-bill!
Re: [vox-tech] MySQL question: making stats out of orders
On Tue, 2009-12-01 at 22:14 -0800, Bill Kendrick wrote:
> > It would seem pretty simple to create the analogous table for
> > orders:
> >
> >   orderstats, consisting of productid, date, orders
> >
> > and every time you complete an order, you run the query:
> >
> >   INSERT INTO orderstats (productid, date, orders) VALUES (###, NOW(), QTY)
> >   ON DUPLICATE KEY UPDATE orders = orders + QTY
>
> Yes, that's great for today-on, but the moment I switch to using this
> new way of recording purchases, there won't be any data.
>
> That is, unless I start collecting data for the next 90 days BEFORE
> changing how the "top sellers" are determined.
>
> What I'm looking for is a way to populate a new 'order stats' table
> based on the [last 90 days', or whatever, of] existing orders.

The one-time command to fill the orderstats table for the first time
would be:

  INSERT INTO orderstats (productid, date, orders)
  SELECT orderitem.productid, orders.timestamp, SUM(orderitem.qty)
  FROM orderitem
  JOIN orders ON orders.id = orderitem.orderid
  WHERE orders.timestamp >= { 90 days ago }
  GROUP BY orderitem.productid, orders.timestamp

assuming orders.timestamp has one-day granularity.
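The backfill above, with the date truncation discussed elsewhere in the
thread folded in, can be rehearsed end-to-end. A self-contained sketch
in SQLite (its date() function stands in for MySQL's DATE() or
FROM_DAYS(TO_DAYS(...)); the schema and sample rows are guesses, not
the site's real tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders     (id INTEGER PRIMARY KEY, ts TEXT);
CREATE TABLE orderitem  (orderid INTEGER, productid INTEGER, qty INTEGER);
CREATE TABLE orderstats (productid INTEGER, date TEXT, orders INTEGER,
                         PRIMARY KEY (productid, date));
""")

# Two orders on one day plus one the next day, all for product 7.
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "2009-12-01 09:15:00"),
                 (2, "2009-12-01 17:40:00"),
                 (3, "2009-12-02 08:00:00")])
con.executemany("INSERT INTO orderitem VALUES (?, ?, ?)",
                [(1, 7, 2), (2, 7, 1), (3, 7, 5)])

# One-time backfill: truncate each timestamp to its day, then sum
# quantities per (product, day) -- same-day orders merge into one row.
con.execute("""
INSERT INTO orderstats (productid, date, orders)
SELECT orderitem.productid, date(orders.ts), SUM(orderitem.qty)
FROM orderitem
JOIN orders ON orders.id = orderitem.orderid
GROUP BY orderitem.productid, date(orders.ts)
""")

print(con.execute("SELECT * FROM orderstats ORDER BY date").fetchall())
# -> [(7, '2009-12-01', 3), (7, '2009-12-02', 5)]
```

A real run would also add the `WHERE orders.timestamp >= { 90 days ago }`
filter from Ken's query; it is omitted here only to keep the toy data
small.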
Re: [vox-tech] MySQL question: making stats out of orders
On Tue, Dec 01, 2009 at 08:04:31PM -0600, Chanoch (Ken) Bloom wrote:
> Is this task getting more intensive?

I think so, but it's hard to tell, with other things going on at the
same time.

> If it is (and the number of transactions per day hasn't grown
> significantly),

Site traffic has been increasing steadily.

> then you probably don't have indices set up correctly.

Highly likely. Most of these tables were set up prior to my taking over
maintenance of all this, and any similar ones were no doubt based on
the ones the previous developer designed.

> I think that having an index on orders.timestamp, and an index on
> orderitem.orderid (in addition to the primary keys products.id and
> orders.id), should optimize this query,

Ok, I'll look into that. Thanks!

> but it always pays to determine what's going on with EXPLAIN SELECT.

Good idea.

> If that still doesn't speed things up enough, maybe consider using a
> table with the MERGE storage engine to move old entries from the
> orders table and the orderitem table into archives. Then only query
> the table full of recent order data for most queries, and only query
> the other tables in the merge when you need to.

Products are virtual, so a buyer's access to them is determined by
their orders, so I'm not going to change how all that logic works. :)

> > Now, we've got some statistics tables set up for other purposes --
> > how many times a product was viewed, as well as when any particular
> > logged-in user last visited particular items, which allows us to do
> > the "people who recently looked at X also recently looked at Y".
> >
> > I'm thinking we could do something similar for sales,
>
> I'm not really clear on what you're looking for here. We need a
> better idea of what stats you're running to know how to structure the
> table.
>
> It would seem pretty simple to create the analogous table for orders:
>
>   orderstats, consisting of productid, date, orders
>
> and every time you complete an order, you run the query:
>
>   INSERT INTO orderstats (productid, date, orders) VALUES (###, NOW(), QTY)
>   ON DUPLICATE KEY UPDATE orders = orders + QTY

Yes, that's great for today-on, but the moment I switch to using this
new way of recording purchases, there won't be any data.

That is, unless I start collecting data for the next 90 days BEFORE
changing how the "top sellers" are determined.

What I'm looking for is a way to populate a new 'order stats' table
based on the [last 90 days', or whatever, of] existing orders.

Thanks!

-bill!
Re: [vox-tech] MySQL question: making stats out of orders
On Tue, Dec 01, 2009 at 05:32:20PM -0800, Bill Kendrick wrote:
> Ok, so I've got a few queries our website does that are used to
> determine which products are 'top sellers' in the past 90 days' worth
> of sales.
>
> Right now, this involves joining the product table with the orders
> table, by way of an items-in-the-order table.
>
> In other words, to figure out what sold the best in the last 90 days,
> it's something like:
>
>   SELECT * FROM products
>   JOIN orderitem ON products.id = orderitem.productid
>   JOIN orders ON orders.id = orderitem.orderid
>   WHERE orders.timestamp >= { 90 days ago }
>
> As time goes on, our records of orders, order items, and actual
> products all keep growing. That means this task gets more and more
> intensive.

Is this task getting more intensive? If it is (and the number of
transactions per day hasn't grown significantly), then you probably
don't have indices set up correctly. I think that having an index on
orders.timestamp, and an index on orderitem.orderid (in addition to the
primary keys products.id and orders.id), should optimize this query,
but it always pays to determine what's going on with EXPLAIN SELECT.

If that still doesn't speed things up enough, maybe consider using a
table with the MERGE storage engine to move old entries from the orders
table and the orderitem table into archives. Then only query the table
full of recent order data for most queries, and only query the other
tables in the merge when you need to.

> Now, we've got some statistics tables set up for other purposes --
> how many times a product was viewed, as well as when any particular
> logged-in user last visited particular items, which allows us to do
> the "people who recently looked at X also recently looked at Y".
>
> I'm thinking we could do something similar for sales, to simplify the
> "top sellers" queries. They'd hit a single stats table (which could,
> on a daily basis, be cleared of any >90-days-ago entries), rather
> than having to JOIN two large tables that go back to day 1.
>
> My current problem -- no doubt due to lack of brain power at the end
> of the day -- is how to take the CURRENT data from the orders and
> orderitem tables so that I can do an initial population of a new
> stats table.
>
> Our "views" stats table, for example, is something along the lines
> of:
>
>   productid, date, views
>
> where those first two columns act as a key. (Date is at the 'day'
> level of granularity.)
>
> And we count a view like so:
>
>   INSERT INTO views (productid, date, views) VALUES (###, NOW(), 1)
>   ON DUPLICATE KEY UPDATE views=views+1
>
> In other words, the first time a product is viewed on any given day,
> we insert a new row, otherwise we update its existing row.
>
> Anyone feel like flexing their MySQL muscles and provide some ideas
> on how to populate a similar table (tracking sales) based on the
> orders/orderitem table combo explained above?

I'm not really clear on what you're looking for here. We need a better
idea of what stats you're running to know how to structure the table.

It would seem pretty simple to create the analogous table for orders:

  orderstats, consisting of productid, date, orders

and every time you complete an order, you run the query:

  INSERT INTO orderstats (productid, date, orders) VALUES (###, NOW(), QTY)
  ON DUPLICATE KEY UPDATE orders = orders + QTY

--
Chanoch (Ken) Bloom. PhD candidate. Linguistic Cognition Laboratory.
Department of Computer Science. Illinois Institute of Technology.
http://www.iit.edu/~kbloom1/
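The "check the plan with EXPLAIN SELECT" advice can be rehearsed
outside MySQL as well. A sketch in SQLite, whose EXPLAIN QUERY PLAN
plays the role of MySQL's EXPLAIN here; the schema and index names are
hypothetical, and the planner details will differ from MySQL's:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders    (id INTEGER PRIMARY KEY, ts TEXT);
CREATE TABLE orderitem (orderid INTEGER, productid INTEGER, qty INTEGER);
-- the two indexes suggested above
CREATE INDEX idx_orders_ts        ON orders(ts);
CREATE INDEX idx_orderitem_order  ON orderitem(orderid);
""")

# Ask the planner how it would run the top-sellers join.
plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT orderitem.productid, SUM(orderitem.qty)
FROM orders
JOIN orderitem ON orders.id = orderitem.orderid
WHERE orders.ts >= '2009-09-02'
GROUP BY orderitem.productid
""").fetchall()

# Each row's last column describes one step, e.g. a
# "SEARCH ... USING INDEX ..." lookup rather than a full table SCAN.
for row in plan:
    print(row[3])
```

With MySQL the equivalent is simply `EXPLAIN SELECT ...`; the thing to
look for is that neither large table shows a full scan once the indexes
are in place.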
Re: [vox-tech] MySQL question: making stats out of orders
On Tue, Dec 01, 2009 at 05:32:20PM -0800, Bill Kendrick wrote:
> Anyone feel like flexing their MySQL muscles and provide some ideas
> on how to populate a similar table (tracking sales) based on the
> orders/orderitem table combo explained above?
>
> Thanks. Sorry for asking you to do my (home)work for me.

Maybe as simple as something like:

  SELECT orderitem.productid, SUM(orderitem.qty),
         FROM_DAYS(TO_DAYS(orders.ordertime)) AS day
  FROM orders
  JOIN orderitem ON orders.id = orderitem.orderid
  GROUP BY productid, day

???

-bill!
[vox-tech] MySQL question: making stats out of orders
Ok, so I've got a few queries our website does that are used to
determine which products are 'top sellers' in the past 90 days' worth
of sales.

Right now, this involves joining the product table with the orders
table, by way of an items-in-the-order table.

In other words, to figure out what sold the best in the last 90 days,
it's something like:

  SELECT * FROM products
  JOIN orderitem ON products.id = orderitem.productid
  JOIN orders ON orders.id = orderitem.orderid
  WHERE orders.timestamp >= { 90 days ago }

As time goes on, our records of orders, order items, and actual
products all keep growing. That means this task gets more and more
intensive.

Now, we've got some statistics tables set up for other purposes -- how
many times a product was viewed, as well as when any particular
logged-in user last visited particular items, which allows us to do the
"people who recently looked at X also recently looked at Y".

I'm thinking we could do something similar for sales, to simplify the
"top sellers" queries. They'd hit a single stats table (which could, on
a daily basis, be cleared of any >90-days-ago entries), rather than
having to JOIN two large tables that go back to day 1.

My current problem -- no doubt due to lack of brain power at the end of
the day -- is how to take the CURRENT data from the orders and
orderitem tables so that I can do an initial population of a new stats
table.

Our "views" stats table, for example, is something along the lines of:

  productid, date, views

where those first two columns act as a key. (Date is at the 'day' level
of granularity.)

And we count a view like so:

  INSERT INTO views (productid, date, views) VALUES (###, NOW(), 1)
  ON DUPLICATE KEY UPDATE views=views+1

In other words, the first time a product is viewed on any given day, we
insert a new row, otherwise we update its existing row.
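That insert-or-bump pattern generalizes directly from views to sales.
A sketch of the same idea in SQLite, whose ON CONFLICT ... DO UPDATE
upsert stands in for MySQL's ON DUPLICATE KEY UPDATE (the table and
column names come from the message above; everything else is assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# (productid, date) is the key, exactly as described for the views table.
con.execute("""
CREATE TABLE views (productid INTEGER, date TEXT, views INTEGER,
                    PRIMARY KEY (productid, date))
""")

def count_view(productid):
    # First view of a product on a given day inserts a fresh row;
    # every later view the same day bumps that row's counter.
    con.execute("""
    INSERT INTO views (productid, date, views) VALUES (?, date('now'), 1)
    ON CONFLICT (productid, date) DO UPDATE SET views = views + 1
    """, (productid,))

for _ in range(3):
    count_view(42)

print(con.execute("SELECT views FROM views WHERE productid = 42")
         .fetchone()[0])
# -> 3
```

A per-sale version would differ only in adding the quantity instead of
1, as in the orderstats suggestion later in the thread.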
Anyone feel like flexing their MySQL muscles and provide some ideas on
how to populate a similar table (tracking sales) based on the
orders/orderitem table combo explained above?

Thanks. Sorry for asking you to do my (home)work for me.

--
-bill!
Sent from my computer