James Black <[EMAIL PROTECTED]> wrote on 04/27/2005 10:36:20 AM:

> I am curious if there is a simple way to do this. I was just asked to
> give some data that requires me to group by day of week and also group
> by hour in day (two different pages).
>
> Thanx for any help.
The simple answer (not good for frequent analysis) is to use the date and time functions to extract what you need from your data and group on those values (http://dev.mysql.com/doc/mysql/en/date-and-time-functions.html):

    SELECT DAYOFWEEK(name_of_datetime_column), count(1)
    FROM tablename
    GROUP BY 1;

(GROUP BY 1 is a MySQL shorthand way of saying "group on the values in the first column". HOUR() works the same way for your hour-in-day report.)

If you are going to need to do this time-based analysis frequently on larger datasets, it will save you a lot of time to precompute certain values and add them as additional columns to the table. This is better, and it may be sufficient for your needs, but it is still not the best-performing solution. For instance, if you wanted to evaluate web traffic by month, week, day of month, day of year, day of week, and hour, then adding those columns to your data table will seriously speed up several common types of queries:

    SELECT dayofyear, hour, sum(bytes_transferred)
    FROM data_table
    WHERE year = 2005
    GROUP BY dayofyear, hour;

This gives you the data you need to plot your hourly traffic (in bytes) for every day this year. Or you could say:

    SELECT month, dayofmonth, hour, sum(bytes_transferred)
    FROM data_table
    WHERE year = 2005
    GROUP BY month, dayofmonth, hour;

You get the same information, but broken down by month, too. However, you still have to scan every row of the table (unless you allocate some serious index space), so this is faster but still not the fastest way to do this kind of analysis.

This general type of dimensional analysis is what the theory and practice of OLAP (online analytical processing) covers. Generally, to get faster results for statistical research over values or ranges of values, you precompute some of your source data's statistics and associate those values with various dimensions (in this example we are dealing with sum(bytes_transferred) over periods of time). One way to do this is with a table that looks like:

    CREATE TABLE olapBytesTransferred (
        year int,
        month int,
        week int,
        dayofyear int,
        dayofmonth int,
        dayofweek int,
        hour int,
        countBytes int,
        sumBytes float,
        avgBytes float,
        stdBytes float
    );

Populate it periodically (say, once a day or once an hour) with the numbers for the raw data you have collected since the last time you checked. Then, when it comes time to slice and dice your statistics, you don't have to go through the hundreds of millions of rows of raw log data, because you already have it broken down and added up by the hour. To collect statistics by the day, you only need to add up 24 hours' worth of data for each day. That is much faster than trying to slog through, sort, and analyze 100,000 events per day from the raw logs. If you need finer control, add more time-division columns (our dimensions) to your table.

Indexing these little monsters can be tricky, because you need to balance the number of indexes (more indexes slow you down during INSERTs), index size (each index takes up space on disk, and indexes that fit into memory are faster than those that require swap space to load), and the performance you want (more indexes can mean faster SELECTs).

You could even create breakdowns based on users per day (sticking with the web log example) or IP addresses per hour or ... you see where this is going. By extracting and extrapolating information from your raw logs and creating a table structure that is DESIGNED to support reports and statistical analysis, you can make that kind of information much easier to get to. Yes, it's much faster, but it's not as simple.
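To make the precomputed-column idea above concrete, here is a rough sketch. The column names and the raw table name (data_table) come from the examples above; the DATETIME column name hit_time is made up for illustration, so substitute your own:

    ALTER TABLE data_table
        ADD COLUMN year SMALLINT,
        ADD COLUMN month TINYINT,
        ADD COLUMN week TINYINT,
        ADD COLUMN dayofmonth TINYINT,
        ADD COLUMN dayofyear SMALLINT,
        ADD COLUMN dayofweek TINYINT,
        ADD COLUMN hour TINYINT;

    -- Backfill the new columns from the existing timestamp
    UPDATE data_table
    SET year       = YEAR(hit_time),
        month      = MONTH(hit_time),
        week       = WEEK(hit_time),
        dayofmonth = DAYOFMONTH(hit_time),
        dayofyear  = DAYOFYEAR(hit_time),
        dayofweek  = DAYOFWEEK(hit_time),
        hour       = HOUR(hit_time);

Going forward you would set those columns at INSERT time (or in whatever script loads the logs), and index the combinations you query most, such as (year, dayofyear) or (year, month, dayofmonth).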
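And here is one rough sketch of the periodic roll-up into olapBytesTransferred. Again, data_table, hit_time, and the literal date range are just placeholders; normally the cron job or script that runs this would compute the WHERE bounds for the hour (or day) being summarized:

    INSERT INTO olapBytesTransferred
        (year, month, week, dayofyear, dayofmonth, dayofweek, hour,
         countBytes, sumBytes, avgBytes, stdBytes)
    SELECT
        YEAR(hit_time), MONTH(hit_time), WEEK(hit_time),
        DAYOFYEAR(hit_time), DAYOFMONTH(hit_time), DAYOFWEEK(hit_time), HOUR(hit_time),
        COUNT(*), SUM(bytes_transferred), AVG(bytes_transferred), STD(bytes_transferred)
    FROM data_table
    WHERE hit_time >= '2005-04-27 09:00:00'  -- start of the hour just finished
      AND hit_time <  '2005-04-27 10:00:00'
    GROUP BY 1, 2, 3, 4, 5, 6, 7;

Once that is in place, the reports come straight off the small summary table instead of the raw logs:

    -- Bytes transferred by day of week, so far this year
    SELECT dayofweek, SUM(sumBytes)
    FROM olapBytesTransferred
    WHERE year = 2005
    GROUP BY dayofweek;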
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine