Jonathan Mangin wrote:
I have a simple query, (part of a union).
calendar is a table of only dates, `date` as PK.
sales has a multi-col index on (date, uid).

(select calendar.date as date,
       time_format(time(sales.date),'%H:%i') as time,
    from calendar
    left join sales
       on date(sales.date) = calendar.date
       and sales.uid = 'george'
    where calendar.date between '2005-08-01' and '2005-08-14')

First, I don't quite understand joins(!) on constants (sales.uid = 'george').
Is this proper, or should uid be included in where clause?

This query makes perfect sense. For each day in the first two weeks of August (matching rows in calendar), you want a report of sales to/by george. The following simpler query does most of that job:

  SELECT calendar.date AS date,
         time_format(time(sales.date),'%H:%i') as time,
  FROM calendar
  JOIN sales ON date(sales.date) = calendar.date
  WHERE calendar.date BETWEEN '2005-08-01' AND '2005-08-14'
    AND sales.uid = 'george'

but it doesn't report on days where george has no sales. You fix that by changing to a LEFT JOIN, which gives one row of output for each calendar day that has no match in sales with NULLs in the sales columns. The key here is that you must move the sales.uid = 'george' to the ON clause, because it is part of the requirement for determing matching rows in sales. If you leave it in the WHERE clause, it will undo the effect of the LEFT JOIN, because the WHERE clause conditions on the table on the right, sales, happen after matching rows are found or NULL rows are made. You see? If george has no sales on a particular calendar day, you get a column with NULLs on the sales side if both conditions are in the ON clause, but if the 'george' requirement is in the WHERE clause, days with no 'george' sales will be filtered out. Hence the query you've given above.

Second, explain shows no possible keys for sales.

id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY calendar range PRIMARY PRIMARY 3 NULL 11 Using where; Using index
1       PRIMARY sales   ALL     NULL    NULL    NULL    NULL    23

Some versions of my query showed possible_keys, but I can't seem to
find what I did and I'm frustrated.  What am I doing wrong here?

The problem is this "date(sales.date) = calendar.date". You've run sales.date through a functon, so the index on sales.date cannot be used. To use the index, you must rewrite the query so the actual column value is used. An index on sales.uid could have been used, but you don't have one (sales.uid is part of an index, but that part can't be used if the leftmost ppart of the index, sales.date, is not being used).

You haven't told us what kind of column sales.date is, so it's hard to say what to do. Ideally, if you are going to be using it to join to calendar, it should be a DATE. In that case, you can simply do

  SELECT calendar.date AS date,
         time_format(time(sales.date),'%H:%i') as time,
  FROM calendar
  LEFT JOIN sales ON sales.date = calendar.date
                 AND sales.uid = 'george'
  WHERE calendar.date BETWEEN '2005-08-01' AND '2005-08-14'

I'm guessing that sales.date is not a DATE, though. If it's a DATETIME or a TIMESTAMP, one possibility would be to replace it with two columns, a DATE and a TIME. That would help this query, but may not work well with the rest of your design. You'll have to decide.

If it doesn't make sense to change sales.date, there are two things you can do that should help.

1) Add an index on sales.uid. I think the best way to do that would be to replace your multi-column index on (date, uid) with one on (uid, date). That would help this query, and would help any query which selects all dates for a given uid. I expect you also want reports for all sales in a given date range for all uids. In that case, you would also want a single-column index on date. So, I'm suggesting something like

  ALTER TABLE sales
  DROP INDEX name_of_the_index_on_date_and_uid,
  ADD INDEX (uid, date),
  ADD INDEX (date);

2) Add a condition to your query to help mysql use the index on sales.date. The trick is to add it to the ON clause, because the WHERE clause is too late to pre-select rows on the right. So

  SELECT calendar.date AS date,
         time_format(time(sales.date),'%H:%i') as time,
  FROM calendar
  LEFT JOIN sales ON sales.date = calendar.date
                 AND sales.uid = 'george'
                 AND sales.date BETWEEN '2005-08-01' AND '2005-08-14 23:59:59'
  WHERE calendar.date BETWEEN '2005-08-01' and '2005-08-14'

This would work best with a multi-column index on (uid, date). Because the sales.date restriction is a range, it would only use your current multi-column index on (date, uid) as if it were a single-column index on date.

Thanks.

Michael


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to