i don't think i missed any points raised by anyone in this discussion. email is often a difficult medium for technical issues.

for most of our purposes, we run multiple queries in order to make sure we are actually getting the data we want. it makes the code simpler, easier to understand and less error prone. this was one example where conceptually we thought it should be able to be done -- and it can, if you use a nested select.

thanks again to all for helped. we're on to bigger and better things...

jeff


Michael Stassen wrote:

Jeff Mathis wrote:

well, obviously some, if not all, of what you are saying is true. your


I don't believe I said anything untrue.  Did you have something in mind?

table example below though is not the same as mine. My table stores time series data. for every symbol, there are a series of rows all with different dates. there is a unique constraint on the combination of symbol and close_date. so, for every symbol, there is one and only one maximum date. ...


You've missed my point. My example table wasn't meant to be equivalent to yours. Rather, it was a simple example to illustrate the idea that the MAX() function does not look for the row with the largest value. Instead, it tells you what the largest value is with no reference to row. That is, it is an aggregate function for use with GROUP BY. In other words, it is a summary statistic, not a data point.

Now, you may know that in your particular situation there will be only one row for each group's max date, but the MAX() function doesn't know that. Even if it did (because your dates are unique), it still wouldn't make sense for MAX() to think in terms of finding a row, because it is perfectly reasonable to ask for the MIN() in the same query.

Look at my example again. I asked for the max, the min, and the average. Even if we change that to use your table rather than mine, which row should be pulled? The one with the max date, the one with the min date, or the one with the average date (which probably doesn't even exist)?

... i want that row and the name field it contains.


Right, I got that.

your example using the subquery works. when we used the subquery approach, we forgot to include the equivalent of t1.symbol = t2.symbol.


I'm glad it worked.  I was confident that it would.

if we use:

select max(close_date), symbol, name from TD
where symbol in (<quoted char string>) group by symbol,
name order by symbol;

we end up getting multiple rows for each symbol if the names change over time. but that's ok for now -- we can parse the query output within our application and get the one row with the most recent date.


Right. Adding name to the GROUP BY makes it legitimate to have name in the list of columns to select, but this query doesn't do what you want. It gives the maximum close_date for each symbol-name combination. As you say, you can parse the results to find the max close_date for each symbol subsection, but why would you do that when you already have a query which gives exactly the result you want?

what we want to get is conceptually simple, but perhaps not so in terms of SQL.


It's easy to say, but describe how to do it: For a given symbol, you have to look at all the close_date values to find the max, then you have to find the row with that value; or sort by close_date, then take the row at the high end of the sorted list; or compare the rows two at a time, storing the rest of the row for the winner of each comparison. They all amount to the same thing: a 2 step process. Those two steps are accomplished by the 2 queries in the temp table method, or by the query + subquery.

Doing this in SQL, however, is tricky enough that yours is a frequently asked question.

jeff


Michael

Michael Stassen wrote:

No, Shawn's answer is correct. You are starting from a false assumption. You are expecting that MAX(closedate) corresponds to a row. It does not. Consider the table

Table=stuff:
  sym val  note
  --- ---  ----
  AAA   2   one
  AAA   2   two
  AAA   4   three
  AAA   6   four
  AAA  12   five
  AAA   7   six
  AAA  12   seven
  BBB   1   eight
  BBB   2   nine
  BBB   3   ten

Now consider the query

  SELECT sym, note, MAX(val), MIN(val), AVG(val)
  FROM stuff
  GROUP BY sym;

Which row should be returned for sym='AAA'? Do you see the problem? It is clear that, for sym=AAA, MIN(val) is 2, MAX(val) is 12, and AVG(val) is about 6.43. Which row is that? The answer is that it's not a row. MAX(), MIN(), and AVG() are aggregate functions. They do not return rows, they return summary stats about a set of rows.

Many dbs wouldn't even allow a query like that, because we are asking for a column not included in the GROUP BY. Mysql allows that, but the manual warns that it is pointless to do so if the extra column does not have a unique value per group <http://dev.mysql.com/doc/mysql/en/GROUP-BY-hidden-fields.html>.

Finding the notes which correspond to the maximum val is fundamentally a 2 step process. First you must find the maximum val, then you must find the rows(s) which have that val. This is what Shawn was telling you.

The manual suggests 3 ways to solve this problem <http://dev.mysql.com/doc/mysql/en/example-Maximum-column-group-row.html>. The most efficient solution, and the one that works in all versions of mysql, is to use a temporary table, as Shawn described. As you have mysql 4.1, you could accomplish the same thing with a subquery. In your case, that would be

  SELECT close_date, symbol, name
  FROM TD t1
  WHERE close_date = (SELECT MAX(t2.close_date)
                      FROM TD t2
                      WHERE t1.symbol = t2.symbol)
  AND symbol IN (<list of characters>);

Note that this is still really a 2 step process. The subquery handles the first step, finding the max close_date, while the parent query handles step 2, finding the matching rows.

There is a third way, the MAX-CONCAT trick. It does it in one query without subqueries, and is very inefficient. See the manual for the details.

In other words, this wasn't such a simple query, after all.

Michael





--
Jeff Mathis, Ph.D.                      505-955-1434
Prediction Company                      [EMAIL PROTECTED]
525 Camino de los Marquez, Ste 6        http://www.predict.com
Santa Fe, NM 87505


-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to