Re: simple but frustrating query
i don't think i missed any points raised by anyone in this discussion. email is often a difficult medium for technical issues. for most of our purposes, we run multiple queries in order to make sure we are actually getting the data we want. it makes the code simpler, easier to understand and less error prone. this was one example where conceptually we thought it should be able to be done -- and it can, if you use a nested select. thanks again to all for helped. we're on to bigger and better things... jeff Michael Stassen wrote: Jeff Mathis wrote: well, obviously some, if not all, of what you are saying is true. your I don't believe I said anything untrue. Did you have something in mind? table example below though is not the same as mine. My table stores time series data. for every symbol, there are a series of rows all with different dates. there is a unique constraint on the combination of symbol and close_date. so, for every symbol, there is one and only one maximum date. ... You've missed my point. My example table wasn't meant to be equivalent to yours. Rather, it was a simple example to illustrate the idea that the MAX() function does not look for the row with the largest value. Instead, it tells you what the largest value is with no reference to row. That is, it is an aggregate function for use with GROUP BY. In other words, it is a summary statistic, not a data point. Now, you may know that in your particular situation there will be only one row for each group's max date, but the MAX() function doesn't know that. Even if it did (because your dates are unique), it still wouldn't make sense for MAX() to think in terms of finding a row, because it is perfectly reasonable to ask for the MIN() in the same query. Look at my example again. I asked for the max, the min, and the average. Even if we change that to use your table rather than mine, which row should be pulled? The one with the max date, the one with the min date, or the one with the average date (which probably doesn't even exist)? ... i want that row and the name field it contains. Right, I got that. your example using the subquery works. when we used the subquery approach, we forgot to include the equivalent of t1.symbol = t2.symbol. I'm glad it worked. I was confident that it would. if we use: select max(close_date), symbol, name from TD where symbol in (quoted char string) group by symbol, name order by symbol; we end up getting multiple rows for each symbol if the names change over time. but that's ok for now -- we can parse the query output within our application and get the one row with the most recent date. Right. Adding name to the GROUP BY makes it legitimate to have name in the list of columns to select, but this query doesn't do what you want. It gives the maximum close_date for each symbol-name combination. As you say, you can parse the results to find the max close_date for each symbol subsection, but why would you do that when you already have a query which gives exactly the result you want? what we want to get is conceptually simple, but perhaps not so in terms of SQL. It's easy to say, but describe how to do it: For a given symbol, you have to look at all the close_date values to find the max, then you have to find the row with that value; or sort by close_date, then take the row at the high end of the sorted list; or compare the rows two at a time, storing the rest of the row for the winner of each comparison. They all amount to the same thing: a 2 step process. Those two steps are accomplished by the 2 queries in the temp table method, or by the query + subquery. Doing this in SQL, however, is tricky enough that yours is a frequently asked question. jeff Michael Michael Stassen wrote: No, Shawn's answer is correct. You are starting from a false assumption. You are expecting that MAX(closedate) corresponds to a row. It does not. Consider the table Table=stuff: sym val note --- --- AAA 2 one AAA 2 two AAA 4 three AAA 6 four AAA 12 five AAA 7 six AAA 12 seven BBB 1 eight BBB 2 nine BBB 3 ten Now consider the query SELECT sym, note, MAX(val), MIN(val), AVG(val) FROM stuff GROUP BY sym; Which row should be returned for sym='AAA'? Do you see the problem? It is clear that, for sym=AAA, MIN(val) is 2, MAX(val) is 12, and AVG(val) is about 6.43. Which row is that? The answer is that it's not a row. MAX(), MIN(), and AVG() are aggregate functions. They do not return rows, they return summary stats about a set of rows. Many dbs wouldn't even allow a query like that, because we are asking for a column not included in the GROUP BY. Mysql allows that, but the manual warns that it is pointless to do so if the extra column does not have a unique value per group http://dev.mysql.com/doc/mysql/en/GROUP-BY-hidden-fields.html. Finding the notes which correspond to the maximum val
RE: simple but frustrating query
-Original Message- what we want is the value for the name field corresponding to the row with the most recent close_date. Based on that comment, I'd select name from TD order by close_date DESC limit 1 something like this: select max(close_date), symbol, name from TD where symbol in (list of characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: simple but frustrating query
Ed Lazor wrote: -Original Message- what we want is the value for the name field corresponding to the row with the most recent close_date. Based on that comment, I'd select name from TD order by close_date DESC limit 1 except, we run into problems when there is a list of values for symbol in the query. for example select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc','dd','ee') in fact this is the real problem. for a single value of symbol, we can do this query. but we want to feed in a list of values for symbol something like this: select max(close_date), symbol, name from TD where symbol in (list of characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: simple but frustrating query
How about select close_date, symbol, name from TD where symbol in (list of characters) limit 1 ORDER BY close_date desc Jeff Mathis [EMAIL PROTECTED] wrote: hello query gurus. we have a table TD with the following columns: close_date symbol name close_date is just a date field there is a unique constraint on the combination of close_date and symbol. what we want is the value for the name field corresponding to the row with the most recent close_date. something like this: select max(close_date), symbol, name from TD where symbol in ( characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff ps we're using mysql 4.1.3 with the innodb engine -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6 http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] [EMAIL PROTECTED] http://www.joeaudette.com http://www.mojoportal.com
Re: simple but frustrating query
I'll be more explicit: select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc') group by symbol, name order by symbol; returns +-++-+ | max(close_date) | symbol | name| +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | biotech | | 2002-05-03 | bb | drugs | | 2002-02-05 | bb | medprovr| | 2004-10-05 | cc | biotech | | 2002-05-03 | cc | drugs | | 2002-02-05 | cc | infosvcs| +-++-+ now, leaving off name from the group by clause (select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc') group by symbol;) gives +-++-+ | max(close_date) | symbol | name| +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | drugs | | 2004-10-05 | cc | infosvcs| +-++-+ which is wrong. what we want is +-++-+ | max(close_date) | symbol | name| +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | biotech | | 2004-10-05 | cc | biotech | +-++-+ but we can't seem to fomrulate the query. Jeff Mathis wrote: Ed Lazor wrote: -Original Message- what we want is the value for the name field corresponding to the row with the most recent close_date. Based on that comment, I'd select name from TD order by close_date DESC limit 1 except, we run into problems when there is a list of values for symbol in the query. for example select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc','dd','ee') in fact this is the real problem. for a single value of symbol, we can do this query. but we want to feed in a list of values for symbol something like this: select max(close_date), symbol, name from TD where symbol in (list of characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: simple but frustrating query
It takes two steps: first determine the max(closedate) for each symbol, then use those results to get the name field. You could do this with a subquery (both steps in the one statement) because you are using a version of MySQL 4.0.0 but here is a temp table implementation that will work with just about anyone. CREATE TEMPORARY TABLE tmpSymbols SELECT symbol, max(close_date) as last_date FROM TD WHERE symbol in (list of symbols) GROUP BY symbol; SELECT ts.symbol, ts.last_date, TD.name FROM tmpSymbols ts INNER JOIN TD ON TD.symbol = ts.symbol AND TD.close_date = ts.last_date; Shawn Green Database Administrator Unimin Corporation - Spruce Pine Jeff Mathis [EMAIL PROTECTED] wrote on 10/14/2004 02:22:32 PM: hello query gurus. we have a table TD with the following columns: close_date symbol name close_date is just a date field there is a unique constraint on the combination of close_date and symbol. what we want is the value for the name field corresponding to the row with the most recent close_date. something like this: select max(close_date), symbol, name from TD where symbol in (list of characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff ps we're using mysql 4.1.3 with the innodb engine -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6 http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: simple but frustrating query
+-++-+ | max(close_date) | symbol | name| +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | biotech | | 2004-10-05 | cc | biotech | +-++-+ . OK I see what you want, this should do it select close_date, symbol, name from TD where symbol in ('aa','bb','cc') AND close_date = (SELECT max(close_date) FROM TD) Jeff Mathis [EMAIL PROTECTED] wrote: I'll be more explicit: select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc') group by symbol, name order by symbol; returns +-++-+ | max(close_date) | symbol | name | +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | biotech | | 2002-05-03 | bb | drugs | | 2002-02-05 | bb | medprovr | | 2004-10-05 | cc | biotech | | 2002-05-03 | cc | drugs | | 2002-02-05 | cc | infosvcs | +-++-+ now, leaving off name from the group by clause (select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc') group by symbol;) gives +-++-+ | max(close_date) | symbol | name | +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | drugs | | 2004-10-05 | cc | infosvcs | +-++-+ which is wrong. what we want is +-++-+ | max(close_date) | symbol | name | +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | biotech | | 2004-10-05 | cc | biotech | +-++-+ but we can't seem to fomrulate the query. Jeff Mathis wrote: Ed Lazor wrote: -Original Message- what we want is the value for the name field corresponding to the row with the most recent close_date. Based on that comment, I'd select name from TD order by close_date DESC limit 1 except, we run into problems when there is a list of values for symbol in the query. for example select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc','dd','ee') in fact this is the real problem. for a single value of symbol, we can do this query. but we want to feed in a list of values for symbol something like this: select max(close_date), symbol, name from TD where symbol in ( of characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6 http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] [EMAIL PROTECTED] http://www.joeaudette.com http://www.mojoportal.com
Re: simple but frustrating query
we really don't want to issue two queries. this should be able to be done in one, and without using temp tables, but maybe not. thanks for the help though jeff [EMAIL PROTECTED] wrote: It takes two steps: first determine the max(closedate) for each symbol, then use those results to get the name field. You could do this with a subquery (both steps in the one statement) because you are using a version of MySQL 4.0.0 but here is a temp table implementation that will work with just about anyone. CREATE TEMPORARY TABLE tmpSymbols SELECT symbol, max(close_date) as last_date FROM TD X-Mozilla-Status: 8000 X-Mozilla-Status2: WHERE symbol in (list of symbols) GROUP BY symbol; SELECT ts.symbol, ts.last_date, TD.name FROM tmpSymbols ts X-Mozilla-Status: 8000 X-Mozilla-Status2: INNER JOIN TD ON TD.symbol = ts.symbol AND TD.close_date = ts.last_date; Shawn Green Database Administrator Unimin Corporation - Spruce Pine Jeff Mathis [EMAIL PROTECTED] wrote on 10/14/2004 02:22:32 PM: hello query gurus. we have a table TD with the following columns: close_date symbol name close_date is just a date field there is a unique constraint on the combination of close_date and symbol. what we want is the value for the name field corresponding to the row with the most recent close_date. something like this: select max(close_date), symbol, name from TD where symbol in (list of characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff ps we're using mysql 4.1.3 with the innodb engine -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6 http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: simple but frustrating query
Jeff Mathis writes: now, leaving off name from the group by clause (select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc') group by symbol;) gives +-++-+ | max(close_date) | symbol | name| +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | drugs | | 2004-10-05 | cc | infosvcs| +-++-+ Jeff, What version of MySQL are you using? I ran your query on 4.0.18 and got a different answer... mysql select max(close_date), symbol, name from TD - where symbol in ('aa','bb','cc') group by symbol; +-++-+ | max(close_date) | symbol | name| +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | biotech | | 2004-10-05 | cc | biotech | +-++-+ 3 rows in set (0.01 sec) Brad Eacker ([EMAIL PROTECTED]) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: simple but frustrating query
No, Shawn's answer is correct. You are starting from a false assumption. You are expecting that MAX(closedate) corresponds to a row. It does not. Consider the table Table=stuff: sym val note --- --- AAA 2 one AAA 2 two AAA 4 three AAA 6 four AAA 12 five AAA 7 six AAA 12 seven BBB 1 eight BBB 2 nine BBB 3 ten Now consider the query SELECT sym, note, MAX(val), MIN(val), AVG(val) FROM stuff GROUP BY sym; Which row should be returned for sym='AAA'? Do you see the problem? It is clear that, for sym=AAA, MIN(val) is 2, MAX(val) is 12, and AVG(val) is about 6.43. Which row is that? The answer is that it's not a row. MAX(), MIN(), and AVG() are aggregate functions. They do not return rows, they return summary stats about a set of rows. Many dbs wouldn't even allow a query like that, because we are asking for a column not included in the GROUP BY. Mysql allows that, but the manual warns that it is pointless to do so if the extra column does not have a unique value per group http://dev.mysql.com/doc/mysql/en/GROUP-BY-hidden-fields.html. Finding the notes which correspond to the maximum val is fundamentally a 2 step process. First you must find the maximum val, then you must find the rows(s) which have that val. This is what Shawn was telling you. The manual suggests 3 ways to solve this problem http://dev.mysql.com/doc/mysql/en/example-Maximum-column-group-row.html. The most efficient solution, and the one that works in all versions of mysql, is to use a temporary table, as Shawn described. As you have mysql 4.1, you could accomplish the same thing with a subquery. In your case, that would be SELECT close_date, symbol, name FROM TD t1 WHERE close_date = (SELECT MAX(t2.close_date) FROM TD t2 WHERE t1.symbol = t2.symbol) AND symbol IN (list of characters); Note that this is still really a 2 step process. The subquery handles the first step, finding the max close_date, while the parent query handles step 2, finding the matching rows. There is a third way, the MAX-CONCAT trick. It does it in one query without subqueries, and is very inefficient. See the manual for the details. In other words, this wasn't such a simple query, after all. Michael Jeff Mathis wrote: we really don't want to issue two queries. this should be able to be done in one, and without using temp tables, but maybe not. thanks for the help though jeff [EMAIL PROTECTED] wrote: It takes two steps: first determine the max(closedate) for each symbol, then use those results to get the name field. You could do this with a subquery (both steps in the one statement) because you are using a version of MySQL 4.0.0 but here is a temp table implementation that will work with just about anyone. CREATE TEMPORARY TABLE tmpSymbols SELECT symbol, max(close_date) as last_date FROM TD WHERE symbol in (list of symbols) GROUP BY symbol; SELECT ts.symbol, ts.last_date, TD.name FROM tmpSymbols ts INNER JOIN TD ON TD.symbol = ts.symbol AND TD.close_date = ts.last_date; Shawn Green Database Administrator Unimin Corporation - Spruce Pine Jeff Mathis [EMAIL PROTECTED] wrote on 10/14/2004 02:22:32 PM: hello query gurus. we have a table TD with the following columns: close_date symbol name close_date is just a date field there is a unique constraint on the combination of close_date and symbol. what we want is the value for the name field corresponding to the row with the most recent close_date. something like this: select max(close_date), symbol, name from TD where symbol in (list of characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff ps we're using mysql 4.1.3 with the innodb engine -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6 http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: simple but frustrating query
4.1.3 and the innodb engine on solaris 5.8 Brad Eacker wrote: Jeff Mathis writes: now, leaving off name from the group by clause (select max(close_date), symbol, name from TD where symbol in ('aa','bb','cc') group by symbol;) gives +-++-+ | max(close_date) | symbol | name| +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | drugs | | 2004-10-05 | cc | infosvcs| +-++-+ Jeff, What version of MySQL are you using? I ran your query on 4.0.18 and got a different answer... mysql select max(close_date), symbol, name from TD - where symbol in ('aa','bb','cc') group by symbol; +-++-+ | max(close_date) | symbol | name| +-++-+ | 2004-10-05 | aa | cmptrhw | | 2004-10-05 | bb | biotech | | 2004-10-05 | cc | biotech | +-++-+ 3 rows in set (0.01 sec) Brad Eacker ([EMAIL PROTECTED]) -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL PROTECTED] 525 Camino de los Marquez, Ste 6http://www.predict.com Santa Fe, NM 87505 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: simple but frustrating query
well, obviously some, if not all, of what you are saying is true. your table example below though is not the same as mine. My table stores time series data. for every symbol, there are a series of rows all with different dates. there is a unique constraint on the combination of symbol and close_date. so, for every symbol, there is one and only one maximum date. i want that row and the name field it contains. your example using the subsquery works. when we used the subquery approach, we forgot to include the equivalent of t1.symbol = t2.symbol. if we use: select max(close_date), symbol, name from TD where symbol in (quoted char string) group by symbol, name order by symbol; we end up getting multiple rows for each symbol if the names change over time. but thats ok for now -- we can parse the query output within our application and get the one row with the most recent date. what we want to get is conceptually simple, but perhaps not so in terms of SQL. jeff Michael Stassen wrote: No, Shawn's answer is correct. You are starting from a false assumption. You are expecting that MAX(closedate) corresponds to a row. It does not. Consider the table Table=stuff: sym val note --- --- AAA 2 one AAA 2 two AAA 4 three AAA 6 four AAA 12 five AAA 7 six AAA 12 seven BBB 1 eight BBB 2 nine BBB 3 ten Now consider the query SELECT sym, note, MAX(val), MIN(val), AVG(val) FROM stuff GROUP BY sym; Which row should be returned for sym='AAA'? Do you see the problem? It is clear that, for sym=AAA, MIN(val) is 2, MAX(val) is 12, and AVG(val) is about 6.43. Which row is that? The answer is that it's not a row. MAX(), MIN(), and AVG() are aggregate functions. They do not return rows, they return summary stats about a set of rows. Many dbs wouldn't even allow a query like that, because we are asking for a column not included in the GROUP BY. Mysql allows that, but the manual warns that it is pointless to do so if the extra column does not have a unique value per group http://dev.mysql.com/doc/mysql/en/GROUP-BY-hidden-fields.html. Finding the notes which correspond to the maximum val is fundamentally a 2 step process. First you must find the maximum val, then you must find the rows(s) which have that val. This is what Shawn was telling you. The manual suggests 3 ways to solve this problem http://dev.mysql.com/doc/mysql/en/example-Maximum-column-group-row.html. The most efficient solution, and the one that works in all versions of mysql, is to use a temporary table, as Shawn described. As you have mysql 4.1, you could accomplish the same thing with a subquery. In your case, that would be SELECT close_date, symbol, name FROM TD t1 WHERE close_date = (SELECT MAX(t2.close_date) FROM TD t2 WHERE t1.symbol = t2.symbol) AND symbol IN (list of characters); Note that this is still really a 2 step process. The subquery handles the first step, finding the max close_date, while the parent query handles step 2, finding the matching rows. There is a third way, the MAX-CONCAT trick. It does it in one query without subqueries, and is very inefficient. See the manual for the details. In other words, this wasn't such a simple query, after all. Michael Jeff Mathis wrote: we really don't want to issue two queries. this should be able to be done in one, and without using temp tables, but maybe not. thanks for the help though jeff [EMAIL PROTECTED] wrote: It takes two steps: first determine the max(closedate) for each symbol, then use those results to get the name field. You could do this with a subquery (both steps in the one statement) because you are using a version of MySQL 4.0.0 but here is a temp table implementation that will work with just about anyone. CREATE TEMPORARY TABLE tmpSymbols SELECT symbol, max(close_date) as last_date FROM TD WHERE symbol in (list of symbols) GROUP BY symbol; SELECT ts.symbol, ts.last_date, TD.name FROM tmpSymbols ts INNER JOIN TD ON TD.symbol = ts.symbol AND TD.close_date = ts.last_date; Shawn Green Database Administrator Unimin Corporation - Spruce Pine Jeff Mathis [EMAIL PROTECTED] wrote on 10/14/2004 02:22:32 PM: hello query gurus. we have a table TD with the following columns: close_date symbol name close_date is just a date field there is a unique constraint on the combination of close_date and symbol. what we want is the value for the name field corresponding to the row with the most recent close_date. something like this: select max(close_date), symbol, name from TD where symbol in (list of characters) group by symbol; this returns the max close_date value, but the name that is returned is garbage and seems to point to the earliest row in the table. any suggestions? jeff ps we're using mysql 4.1.3 with the innodb engine -- Jeff Mathis, Ph.D. 505-955-1434 Prediction Company [EMAIL
Re: simple but frustrating query
Jeff Mathis wrote: well, obviously some, if not all, of what you are saying is true. your I don't believe I said anything untrue. Did you have something in mind? table example below though is not the same as mine. My table stores time series data. for every symbol, there are a series of rows all with different dates. there is a unique constraint on the combination of symbol and close_date. so, for every symbol, there is one and only one maximum date. ... You've missed my point. My example table wasn't meant to be equivalent to yours. Rather, it was a simple example to illustrate the idea that the MAX() function does not look for the row with the largest value. Instead, it tells you what the largest value is with no reference to row. That is, it is an aggregate function for use with GROUP BY. In other words, it is a summary statistic, not a data point. Now, you may know that in your particular situation there will be only one row for each group's max date, but the MAX() function doesn't know that. Even if it did (because your dates are unique), it still wouldn't make sense for MAX() to think in terms of finding a row, because it is perfectly reasonable to ask for the MIN() in the same query. Look at my example again. I asked for the max, the min, and the average. Even if we change that to use your table rather than mine, which row should be pulled? The one with the max date, the one with the min date, or the one with the average date (which probably doesn't even exist)? ... i want that row and the name field it contains. Right, I got that. your example using the subquery works. when we used the subquery approach, we forgot to include the equivalent of t1.symbol = t2.symbol. I'm glad it worked. I was confident that it would. if we use: select max(close_date), symbol, name from TD where symbol in (quoted char string) group by symbol, name order by symbol; we end up getting multiple rows for each symbol if the names change over time. but that's ok for now -- we can parse the query output within our application and get the one row with the most recent date. Right. Adding name to the GROUP BY makes it legitimate to have name in the list of columns to select, but this query doesn't do what you want. It gives the maximum close_date for each symbol-name combination. As you say, you can parse the results to find the max close_date for each symbol subsection, but why would you do that when you already have a query which gives exactly the result you want? what we want to get is conceptually simple, but perhaps not so in terms of SQL. It's easy to say, but describe how to do it: For a given symbol, you have to look at all the close_date values to find the max, then you have to find the row with that value; or sort by close_date, then take the row at the high end of the sorted list; or compare the rows two at a time, storing the rest of the row for the winner of each comparison. They all amount to the same thing: a 2 step process. Those two steps are accomplished by the 2 queries in the temp table method, or by the query + subquery. Doing this in SQL, however, is tricky enough that yours is a frequently asked question. jeff Michael Michael Stassen wrote: No, Shawn's answer is correct. You are starting from a false assumption. You are expecting that MAX(closedate) corresponds to a row. It does not. Consider the table Table=stuff: sym val note --- --- AAA 2 one AAA 2 two AAA 4 three AAA 6 four AAA 12 five AAA 7 six AAA 12 seven BBB 1 eight BBB 2 nine BBB 3 ten Now consider the query SELECT sym, note, MAX(val), MIN(val), AVG(val) FROM stuff GROUP BY sym; Which row should be returned for sym='AAA'? Do you see the problem? It is clear that, for sym=AAA, MIN(val) is 2, MAX(val) is 12, and AVG(val) is about 6.43. Which row is that? The answer is that it's not a row. MAX(), MIN(), and AVG() are aggregate functions. They do not return rows, they return summary stats about a set of rows. Many dbs wouldn't even allow a query like that, because we are asking for a column not included in the GROUP BY. Mysql allows that, but the manual warns that it is pointless to do so if the extra column does not have a unique value per group http://dev.mysql.com/doc/mysql/en/GROUP-BY-hidden-fields.html. Finding the notes which correspond to the maximum val is fundamentally a 2 step process. First you must find the maximum val, then you must find the rows(s) which have that val. This is what Shawn was telling you. The manual suggests 3 ways to solve this problem http://dev.mysql.com/doc/mysql/en/example-Maximum-column-group-row.html. The most efficient solution, and the one that works in all versions of mysql, is to use a temporary table, as Shawn described. As you have mysql 4.1, you could accomplish the same thing with a subquery. In your case, that would be SELECT close_date,