RE: query results group/summed by interval
Hi all, Aveeks solution should work if you have at least one call for each intervall. It's the classical GROUP BY solution that only works on the available dataset. Although it should work pretty well in the cited scenario, you will miss intervals (from a all intervals report point of view) if indeed there are intervals (of more than 5 minutes, in this example) when there were no calls at all. I had a somewhat similar problem (running the second scenario, though) and this is the solution I setup (this was a Data Warehouse and that's why you'll read about partition pruning, dataset was dozens of Gigs): http://gpshumano.blogs.dri.pt/2009/09/28/finding-for-each-time-interval-how-many-records-are-ocurring-during-that-interval/ This might become handy if Ghulam understands the differences between my scenario and his. Hope that helps, -NT Quoting Martin Gainty mgai...@hotmail.com: no that would give you the count for each second interval instead of using the interval variable 5 Aveeks floor: FLOOR(X) Returns the largest integer value not greater than X. 1st (seconds/5) interval example 5/5=1 floor(5/5) = 1 supplied value would truncate and give you the int not greater than X then multiply by 5 1*5=5 is correct Aveeks sum function: SUM([DISTINCT] expr) Returns the sum of expr. If the return set has no rows, SUM() returns NULL. The DISTINCT keyword can be used in MySQL 5.0 to sum only the distinct values of expr. SUM() returns NULL if there were no matching rows. sum(calls) from calls group by 5 * floor(seconds/5) sum(calls) from calls group by 5 * floor(5/5) sum(calls) from class group by 5 * 1 sum(calls) from class group by 5 is correct 2nd(seconds/5) interval example 10/5=2 floor(10/5)=2 supplied value would truncate and give you the int not greater than X then multiply by 5 2*5=10 is correct Aveeks sum function sum(calls) from calls group by 5 * floor(seconds/5) sum(calls) from calls group by 5 * floor(10/5) sum(calls) from class group by 5 * 2 sum(calls) from class group by 10 would be applicable only if the interval was 10 Aveek if your interval is 5 change: sum(calls) from calls group by 5 * floor(seconds/5) to sum(calls) from calls group by floor(seconds/5) Martin Gainty __ Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen. Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni. Date: Sat, 31 Jul 2010 10:31:43 +0700 From: cuong.m...@vienthongso.com To: ave...@yahoo-inc.com CC: mustafa...@gmail.com; mysql@lists.mysql.com Subject: Re: query results group/summed by interval Hi Aveek, I think Ghulam just want to count calls for each intervals so the query should looks like this: select count(*) as total_calls, queue_seconds from calls group by queue_seconds order by total_calls; - Original Message - From: Aveek Misra ave...@yahoo-inc.com To: Ghulam Mustafa mustafa...@gmail.com, mysql@lists.mysql.com Sent: Tuesday, July 27, 2010 5:54:13 PM Subject: RE: query results group/summed by interval try this ... select 5 * floor(seconds/5) as start, 5 * floor(seconds/5) + 5 as end, sum(calls) from calls group by 5 * floor(seconds/5); This should give you an output of the type +---+--++ | start | end | sum(calls) | +---+--++ | 0 | 5 | 387 | | 5 | 10 | 225 | | 10 | 15 | 74 | +---+--++ Thanks Aveek From: Ghulam Mustafa [mustafa...@gmail.com] Sent: Tuesday, July 27, 2010 3:53 PM To: mysql@lists.mysql.com Subject: query results group/summed by interval Hi everyone, i have two columns (seconds, number of calls), i need to produce a report which will show total number of calls in intervals (let'say 10 seconds interval), i know i can do this programmability in my script but i was wondering if it's possible to accomplish this behavior within mysql. for example i have following data
RE: query results group/summed by interval
You could also pre-define your intervals in a subquery using UNION and join that to your original table like so: select ifnull(sum(calls), 0) as calls, n as queue_seconds from (select 0 as n union select 5 union select 10 union select 15) as step left join calls on calls.queue_seconds (step.n - 5) and calls.queue_seconds = step.n group by n; +---+---+ | calls | queue_seconds | +---+---+ | 250 | 0 | | 168 | 5 | | 268 | 10 | | 0 | 15 | +---+---+ 4 rows in set (0.00 sec) -Travis Date: Sun, 1 Aug 2010 13:16:36 +0100 From: nuno.tava...@dri.pt To: mgai...@hotmail.com CC: cuong.m...@vienthongso.com; ave...@yahoo-inc.com; mustafa...@gmail.com; mysql@lists.mysql.com Subject: RE: query results group/summed by interval Hi all, Aveeks solution should work if you have at least one call for each intervall. It's the classical GROUP BY solution that only works on the available dataset. Although it should work pretty well in the cited scenario, you will miss intervals (from a all intervals report point of view) if indeed there are intervals (of more than 5 minutes, in this example) when there were no calls at all. I had a somewhat similar problem (running the second scenario, though) and this is the solution I setup (this was a Data Warehouse and that's why you'll read about partition pruning, dataset was dozens of Gigs): http://gpshumano.blogs.dri.pt/2009/09/28/finding-for-each-time-interval-how-many-records-are-ocurring-during-that-interval/ This might become handy if Ghulam understands the differences between my scenario and his. Hope that helps, -NT Quoting Martin Gainty : no that would give you the count for each second interval instead of using the interval variable 5 Aveeks floor: FLOOR(X) Returns the largest integer value not greater than X. 1st (seconds/5) interval example 5/5=1 floor(5/5) = 1 supplied value would truncate and give you the int not greater than X then multiply by 5 1*5=5 is correct Aveeks sum function: SUM([DISTINCT] expr) Returns the sum of expr. If the return set has no rows, SUM() returns NULL. The DISTINCT keyword can be used in MySQL 5.0 to sum only the distinct values of expr. SUM() returns NULL if there were no matching rows. sum(calls) from calls group by 5 * floor(seconds/5) sum(calls) from calls group by 5 * floor(5/5) sum(calls) from class group by 5 * 1 sum(calls) from class group by 5 is correct 2nd(seconds/5) interval example 10/5=2 floor(10/5)=2 supplied value would truncate and give you the int not greater than X then multiply by 5 2*5=10 is correct Aveeks sum function sum(calls) from calls group by 5 * floor(seconds/5) sum(calls) from calls group by 5 * floor(10/5) sum(calls) from class group by 5 * 2 sum(calls) from class group by 10 would be applicable only if the interval was 10 Aveek if your interval is 5 change: sum(calls) from calls group by 5 * floor(seconds/5) to sum(calls) from calls group by floor(seconds/5) Martin Gainty __ Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen. Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni. Date: Sat, 31 Jul 2010 10:31:43 +0700 From: cuong.m...@vienthongso.com To: ave...@yahoo-inc.com CC: mustafa...@gmail.com; mysql@lists.mysql.com Subject: Re: query results group/summed by interval Hi Aveek, I think Ghulam just want to count calls for each intervals so the query should looks like this: select count(*) as total_calls, queue_seconds from calls group by queue_seconds order by total_calls; - Original Message - From: Aveek Misra To: Ghulam Mustafa , mysql@lists.mysql.com Sent: Tuesday, July 27, 2010 5:54:13 PM Subject: RE: query results group/summed by interval try this ... select 5 * floor(seconds/5) as start, 5
RE: query results group/summed by interval
no that would give you the count for each second interval instead of using the interval variable 5 Aveeks floor: FLOOR(X) Returns the largest integer value not greater than X. 1st (seconds/5) interval example 5/5=1 floor(5/5) = 1 supplied value would truncate and give you the int not greater than X then multiply by 5 1*5=5 is correct Aveeks sum function: SUM([DISTINCT] expr) Returns the sum of expr. If the return set has no rows, SUM() returns NULL. The DISTINCT keyword can be used in MySQL 5.0 to sum only the distinct values of expr. SUM() returns NULL if there were no matching rows. sum(calls) from calls group by 5 * floor(seconds/5) sum(calls) from calls group by 5 * floor(5/5) sum(calls) from class group by 5 * 1 sum(calls) from class group by 5 is correct 2nd(seconds/5) interval example 10/5=2 floor(10/5)=2 supplied value would truncate and give you the int not greater than X then multiply by 5 2*5=10 is correct Aveeks sum function sum(calls) from calls group by 5 * floor(seconds/5) sum(calls) from calls group by 5 * floor(10/5) sum(calls) from class group by 5 * 2 sum(calls) from class group by 10 would be applicable only if the interval was 10 Aveek if your interval is 5 change: sum(calls) from calls group by 5 * floor(seconds/5) to sum(calls) from calls group by floor(seconds/5) Martin Gainty __ Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen. Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni. Date: Sat, 31 Jul 2010 10:31:43 +0700 From: cuong.m...@vienthongso.com To: ave...@yahoo-inc.com CC: mustafa...@gmail.com; mysql@lists.mysql.com Subject: Re: query results group/summed by interval Hi Aveek, I think Ghulam just want to count calls for each intervals so the query should looks like this: select count(*) as total_calls, queue_seconds from calls group by queue_seconds order by total_calls; - Original Message - From: Aveek Misra ave...@yahoo-inc.com To: Ghulam Mustafa mustafa...@gmail.com, mysql@lists.mysql.com Sent: Tuesday, July 27, 2010 5:54:13 PM Subject: RE: query results group/summed by interval try this ... select 5 * floor(seconds/5) as start, 5 * floor(seconds/5) + 5 as end, sum(calls) from calls group by 5 * floor(seconds/5); This should give you an output of the type +---+--++ | start | end | sum(calls) | +---+--++ | 0 | 5 | 387 | | 5 | 10 | 225 | | 10 | 15 | 74 | +---+--++ Thanks Aveek From: Ghulam Mustafa [mustafa...@gmail.com] Sent: Tuesday, July 27, 2010 3:53 PM To: mysql@lists.mysql.com Subject: query results group/summed by interval Hi everyone, i have two columns (seconds, number of calls), i need to produce a report which will show total number of calls in intervals (let'say 10 seconds interval), i know i can do this programmability in my script but i was wondering if it's possible to accomplish this behavior within mysql. for example i have following data. +--+---+ | calls | queue_seconds | +--+---+ | 250 | 0.00 | | 28 | 1.00 | | 30 | 2.00 | | 56 | 3.00 | | 23 | 4.00 | | 31 | 5.00 | | 33 | 6.00 | | 50 | 7.00 | | 49 | 8.00 | | 62 | 9.00 | | 74 | 10.00 | ... ... and so on... ... +--+---+ now result should look like this with a 5 seconds interval. +--+---+ | count(*) | queue_seconds | +--+---+ | 250 | 0.00 | | 168 | 5.00 | | 268 | 10.00 | ... ... and so on... ... +--+---+ i would really appreciate your help. Best Regards. -- Ghulam Mustafa -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=ave...@yahoo-inc.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=cuong.m...@vienthongso.com -- Best Regards, Cuongmc
RE: query results group/summed by interval
no that would give you the count for each second interval instead of using the interval variable 5 Aveeks floor: FLOOR(X) Returns the largest integer value not greater than X. 1st (seconds/5) interval example 5/5=1 floor(5/5) = 1 supplied value would truncate and give you the int not greater than X then multiply by 5 1*5=5 is correct Aveeks sum function: SUM([DISTINCT] expr) Returns the sum of expr. If the return set has no rows, SUM() returns NULL. The DISTINCT keyword can be used in MySQL 5.0 to sum only the distinct values of expr. SUM() returns NULL if there were no matching rows. sum(calls) from calls group by 5 * floor(seconds/5) sum(calls) from calls group by 5 * floor(5/5) sum(calls) from class group by 5 * 1 sum(calls) from class group by 5 is correct 2nd(seconds/5) interval example 10/5=2 floor(10/5)=2 supplied value would truncate and give you the int not greater than X then multiply by 5 2*5=10 is correct Aveeks sum function sum(calls) from calls group by 5 * floor(seconds/5) sum(calls) from calls group by 5 * floor(10/5) sum(calls) from class group by 5 * 2 sum(calls) from class group by 10 would be applicable only if the interval was 10 Aveek if your interval is 5 change: sum(calls) from calls group by 5 * floor(seconds/5) to sum(calls) from calls group by floor(seconds/5) Martin Gainty __ Verzicht und Vertraulichkeitanmerkung/Note de déni et de confidentialité Diese Nachricht ist vertraulich. Sollten Sie nicht der vorgesehene Empfaenger sein, so bitten wir hoeflich um eine Mitteilung. Jede unbefugte Weiterleitung oder Fertigung einer Kopie ist unzulaessig. Diese Nachricht dient lediglich dem Austausch von Informationen und entfaltet keine rechtliche Bindungswirkung. Aufgrund der leichten Manipulierbarkeit von E-Mails koennen wir keine Haftung fuer den Inhalt uebernehmen. Ce message est confidentiel et peut être privilégié. Si vous n'êtes pas le destinataire prévu, nous te demandons avec bonté que pour satisfaire informez l'expéditeur. N'importe quelle diffusion non autorisée ou la copie de ceci est interdite. Ce message sert à l'information seulement et n'aura pas n'importe quel effet légalement obligatoire. Étant donné que les email peuvent facilement être sujets à la manipulation, nous ne pouvons accepter aucune responsabilité pour le contenu fourni. Date: Sat, 31 Jul 2010 10:31:43 +0700 From: cuong.m...@vienthongso.com To: ave...@yahoo-inc.com CC: mustafa...@gmail.com; mysql@lists.mysql.com Subject: Re: query results group/summed by interval Hi Aveek, I think Ghulam just want to count calls for each intervals so the query should looks like this: select count(*) as total_calls, queue_seconds from calls group by queue_seconds order by total_calls; - Original Message - From: Aveek Misra ave...@yahoo-inc.com To: Ghulam Mustafa mustafa...@gmail.com, mysql@lists.mysql.com Sent: Tuesday, July 27, 2010 5:54:13 PM Subject: RE: query results group/summed by interval try this ... select 5 * floor(seconds/5) as start, 5 * floor(seconds/5) + 5 as end, sum(calls) from calls group by 5 * floor(seconds/5); This should give you an output of the type +---+--++ | start | end | sum(calls) | +---+--++ | 0 | 5 | 387 | | 5 | 10 | 225 | | 10 | 15 | 74 | +---+--++ Thanks Aveek From: Ghulam Mustafa [mustafa...@gmail.com] Sent: Tuesday, July 27, 2010 3:53 PM To: mysql@lists.mysql.com Subject: query results group/summed by interval Hi everyone, i have two columns (seconds, number of calls), i need to produce a report which will show total number of calls in intervals (let'say 10 seconds interval), i know i can do this programmability in my script but i was wondering if it's possible to accomplish this behavior within mysql. for example i have following data. +--+---+ | calls | queue_seconds | +--+---+ | 250 | 0.00 | | 28 | 1.00 | | 30 | 2.00 | | 56 | 3.00 | | 23 | 4.00 | | 31 | 5.00 | | 33 | 6.00 | | 50 | 7.00 | | 49 | 8.00 | | 62 | 9.00 | | 74 | 10.00 | ... ... and so on... ... +--+---+ now result should look like this with a 5 seconds interval. +--+---+ | count(*) | queue_seconds | +--+---+ | 250 | 0.00 | | 168 | 5.00 | | 268 | 10.00 | ... ... and so on... ... +--+---+ i would really appreciate your help. Best Regards. -- Ghulam Mustafa -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=ave...@yahoo-inc.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql?unsub=cuong.m...@vienthongso.com -- Best Regards, Cuongmc
Re: query results group/summed by interval
Hi Aveek, I think Ghulam just want to count calls for each intervals so the query should looks like this: select count(*) as total_calls, queue_seconds from calls group by queue_seconds order by total_calls; - Original Message - From: Aveek Misra ave...@yahoo-inc.com To: Ghulam Mustafa mustafa...@gmail.com, mysql@lists.mysql.com Sent: Tuesday, July 27, 2010 5:54:13 PM Subject: RE: query results group/summed by interval try this ... select 5 * floor(seconds/5) as start, 5 * floor(seconds/5) + 5 as end, sum(calls) from calls group by 5 * floor(seconds/5); This should give you an output of the type +---+--++ | start | end | sum(calls) | +---+--++ | 0 |5 |387 | | 5 | 10 |225 | |10 | 15 | 74 | +---+--++ Thanks Aveek From: Ghulam Mustafa [mustafa...@gmail.com] Sent: Tuesday, July 27, 2010 3:53 PM To: mysql@lists.mysql.com Subject: query results group/summed by interval Hi everyone, i have two columns (seconds, number of calls), i need to produce a report which will show total number of calls in intervals (let'say 10 seconds interval), i know i can do this programmability in my script but i was wondering if it's possible to accomplish this behavior within mysql. for example i have following data. +--+---+ | calls | queue_seconds | +--+---+ | 250 | 0.00 | | 28 | 1.00 | | 30 | 2.00 | | 56 | 3.00 | | 23 | 4.00 | | 31 | 5.00 | | 33 | 6.00 | | 50 | 7.00 | | 49 | 8.00 | | 62 | 9.00 | | 74 | 10.00 | ... ... and so on... ... +--+---+ now result should look like this with a 5 seconds interval. +--+---+ | count(*) | queue_seconds | +--+---+ | 250 | 0.00 | | 168 | 5.00 | | 268 | 10.00 | ... ... and so on... ... +--+---+ i would really appreciate your help. Best Regards. -- Ghulam Mustafa -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=ave...@yahoo-inc.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=cuong.m...@vienthongso.com -- Best Regards, Cuongmc. -- Nguyen Manh Cuong Phong Ky Thuat - Cong ty Vien Thong So - VTC Dien thoai: 0912051542 Gmail : philipscu...@gmail.com YahooMail : philipscu...@yahoo.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
RE: query results group/summed by interval
try this ... select 5 * floor(seconds/5) as start, 5 * floor(seconds/5) + 5 as end, sum(calls) from calls group by 5 * floor(seconds/5); This should give you an output of the type +---+--++ | start | end | sum(calls) | +---+--++ | 0 |5 |387 | | 5 | 10 |225 | |10 | 15 | 74 | +---+--++ Thanks Aveek From: Ghulam Mustafa [mustafa...@gmail.com] Sent: Tuesday, July 27, 2010 3:53 PM To: mysql@lists.mysql.com Subject: query results group/summed by interval Hi everyone, i have two columns (seconds, number of calls), i need to produce a report which will show total number of calls in intervals (let'say 10 seconds interval), i know i can do this programmability in my script but i was wondering if it's possible to accomplish this behavior within mysql. for example i have following data. +--+---+ | calls | queue_seconds | +--+---+ | 250 | 0.00 | | 28 | 1.00 | | 30 | 2.00 | | 56 | 3.00 | | 23 | 4.00 | | 31 | 5.00 | | 33 | 6.00 | | 50 | 7.00 | | 49 | 8.00 | | 62 | 9.00 | | 74 | 10.00 | ... ... and so on... ... +--+---+ now result should look like this with a 5 seconds interval. +--+---+ | count(*) | queue_seconds | +--+---+ | 250 | 0.00 | | 168 | 5.00 | | 268 | 10.00 | ... ... and so on... ... +--+---+ i would really appreciate your help. Best Regards. -- Ghulam Mustafa -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=ave...@yahoo-inc.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
RE: Query with group by
Hi Michael! Talking about the query with group by issue... I'll explain my reasoning below. [...] From the manual, section 7.2.8 How MySQL Optimizes LEFT JOIN and RIGHT JOIN http://dev.mysql.com/doc/mysql/en/LEFT_JOIN_optimization.html: A LEFT JOIN B join_condition is implemented in MySQL as follows: ... * The LEFT JOIN condition is used to decide how to retrieve rows from table B. (In other words, any condition in the WHERE clause is not used.) ... So, the WHERE c2.id IS NULL cannot be applied until after the rows which match the ON clause (and the NULL rows) have been fetched. Thanks for your explanations here, it's clear now I was confused. I missed this LEFT JOIN behaviour. I was in the idea the WHERE xx IS NULL was taken into account. [...] You are certainly right that temporary and filesort are to be avoided. And they will be, if the table is properly indexed. Single column indexing won't help much here, because the WHERE condition, the GROUP BY column, and the MAX column are all different. A multi-column index on (content, location, date), however, will allow mysql to use the index to find the matching rows, find the groups, and calculate the MAX date. I still refuse to use the temporary table solution, call me fussy here. Indeed, I think giving more indexes than necessary is a bit redundant, unless completely necessary. (You will end up having more space on indexes than data itself). It's a handle with care issue for me. Anyway, I don't know if one can program an agregate UDF called something like EXTERNAL_MAX(...) or something, so that we could do like: SELECT EXTERNAL_MAX(date, version) --- i.e: Returns the version value for the row with MAX(date). This, for sure, will be the best solution. ;-) That would have to do the same thing behind the scenes. I have to beg you pardon here. ;-) Think again this solution doesn't require a JOIN, nor a temporary table. I think the UDF solution doesn't have to be less efficient than a MAX or AVG aggregate function by itself. I'm in the process of creating such a monster :-) Let me know if you are curious about and have time to test it, I will test MAX() and EXTERNAL_MAX() against a very large table. Cheers, Jose Miguel. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query with group by
Here is the output I got when I ran my query - and yours, Michael - against DB2 V7.2. Please note that I replaced the 'temp' table in Michael's query with the real table in both the outer query and the subquery; no other changes were made. -- create table versions (id smallint not null, date date not null, content char(5) not null, location char(10) not null, version smallint not null, primary key(id)) DB2I The SQL command completed successfully. insert into versions values (1, '2004-09-14', 'ALPHA', 'PARIS', 10), (2, '2004-09-15', 'ALPHA', 'PARIS', 11), (3, '2004-09-16', 'ALPHA', 'PARIS', 10), (4, '2004-09-14', 'ALPHA', 'NEW-YORK', 11), (5, '2004-09-15', 'ALPHA', 'NEW-YORK', 11), (6, '2004-09-16', 'ALPHA', 'NEW-YORK', 10), (7, '2004-09-14', 'ALPHA', 'TOKYO', 10), (8, '2004-09-15', 'ALPHA', 'TOKYO', 11), (9, '2004-09-16', 'BETA', 'TOKYO', 10) DB2I The SQL command completed successfully. select content, location, version, date from versions where concat(location, char(date)) in (select concat(location, char(max(date))) from versions where content = 'ALPHA' group by location) CONTENT LOCATION VERSION DATE --- -- --- -- ALPHA NEW-YORK10 09/16/2004 ALPHA PARIS 10 09/16/2004 ALPHA TOKYO 11 09/15/2004 3 record(s) selected. SELECT content, location, version, date FROM versions t1 WHERE date=(SELECT MAX(t2.date) FROM versions t2 WHERE t1.location = t2.location AND t1.content = t2.content) AND content = 'ALPHA' CONTENT LOCATION VERSION DATE --- -- --- -- ALPHA PARIS 10 09/16/2004 ALPHA NEW-YORK10 09/16/2004 ALPHA TOKYO 11 09/15/2004 3 record(s) selected. -- As you can see, both queries worked and produced the same result in DB2, aside from the row sequence, which is easily fixable via an Order By. I'm at a loss to explain why my query didn't work in MySQL V4.1.4. Rhino - Original Message - From: Michael Stassen [EMAIL PROTECTED] To: Rhino [EMAIL PROTECTED] Cc: Jose Miguel Pérez [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Friday, September 17, 2004 12:42 AM Subject: Re: Query with group by Rhino wrote: I agree that Michael's solution with the temporary tables is the best I have seen so far. I can't take much credit. It's just an adaptation of the solution in the manual. I am running MySQL 4.0.15 so I don't have any subquery capability. As a result, I went to DB2 to come up with one solution that does give the right answer via a subquery: select content, location, version, date from versions where concat(location, char(date)) in (select concat(location, char(max(date))) from versions where content = 'ALPHA' group by location); Really? That didn't work for me in mysql 4.1.4a-gamma. I got +-+--+-++ | content | location | version | date | +-+--+-++ | ALPHA | PARIS| 10 | 2004-09-14 | | ALPHA | PARIS| 11 | 2004-09-15 | | ALPHA | PARIS| 10 | 2004-09-16 | | ALPHA | NEW-YORK | 11 | 2004-09-14 | | ALPHA | NEW-YORK | 11 | 2004-09-15 | | ALPHA | NEW-YORK | 10 | 2004-09-16 | | ALPHA | TOKYO| 10 | 2004-09-14 | | ALPHA | TOKYO| 11 | 2004-09-15 | | BETA| TOKYO| 10 | 2004-09-16 | +-+--+-++ 9 rows in set (0.31 sec) I don't know if this will work in MySQL 4.1.x though. You may want to give it a try if you have 4.1.x. By the way, I'm not convinced that this is the *best* solution using a subquery; it's just the first one I could think of. I don't have all day to spend on this ;-) The following query works for me. SELECT content, location, version, date FROM temp t1 WHERE date=(SELECT MAX(t2.date) FROM temp t2 WHERE t1.location = t2.location AND t1.content = t2.content) AND content = 'ALPHA'; +-+--+-++ | content | location | version | date | +-+--+-++ | ALPHA | PARIS| 10 | 2004-09-16 | | ALPHA | NEW-YORK | 10 | 2004-09-16 | | ALPHA | TOKYO| 11 | 2004-09-15 | +-+--+-++ 3 rows in set (0.01 sec) Again, that's an adaptation of the subquery version of the solution in the manual. Michael -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query with group by
I took a closer look and found the problem -- char(date). Is char() necessary in DB2 to cast the date as a string? In MySQL, char() expects a list of integers to interpret as a list of character codes http://dev.mysql.com/doc/mysql/en/String_functions.html. Removing char() makes the query appear to work. I say appear because you forgot the same thing I did. We need to guard against the possibility that a row with different content will match the location and date. So, your query becomes SELECT content, location, version, date FROM temp WHERE CONCAT(location, date) IN (SELECT CONCAT(location, MAX(date)) FROM temp WHERE content = 'ALPHA' GROUP BY location) AND content = 'ALPHA'; which works for me in 4.1.4a. That said, it seems to me that this is a version of the MAX-CONCAT trick written as a subquery. With the columns to be compared inside CONCAT() functions, there is no way to use an index to match up the rows. My expectation is that this will be relatively inefficient compared to the other subquery solution. Michael Rhino wrote: Here is the output I got when I ran my query - and yours, Michael - against DB2 V7.2. Please note that I replaced the 'temp' table in Michael's query with the real table in both the outer query and the subquery; no other changes were made. -- create table versions (id smallint not null, date date not null, content char(5) not null, location char(10) not null, version smallint not null, primary key(id)) DB2I The SQL command completed successfully. insert into versions values (1, '2004-09-14', 'ALPHA', 'PARIS', 10), (2, '2004-09-15', 'ALPHA', 'PARIS', 11), (3, '2004-09-16', 'ALPHA', 'PARIS', 10), (4, '2004-09-14', 'ALPHA', 'NEW-YORK', 11), (5, '2004-09-15', 'ALPHA', 'NEW-YORK', 11), (6, '2004-09-16', 'ALPHA', 'NEW-YORK', 10), (7, '2004-09-14', 'ALPHA', 'TOKYO', 10), (8, '2004-09-15', 'ALPHA', 'TOKYO', 11), (9, '2004-09-16', 'BETA', 'TOKYO', 10) DB2I The SQL command completed successfully. select content, location, version, date from versions where concat(location, char(date)) in (select concat(location, char(max(date))) from versions where content = 'ALPHA' group by location) CONTENT LOCATION VERSION DATE --- -- --- -- ALPHA NEW-YORK10 09/16/2004 ALPHA PARIS 10 09/16/2004 ALPHA TOKYO 11 09/15/2004 3 record(s) selected. SELECT content, location, version, date FROM versions t1 WHERE date=(SELECT MAX(t2.date) FROM versions t2 WHERE t1.location = t2.location AND t1.content = t2.content) AND content = 'ALPHA' CONTENT LOCATION VERSION DATE --- -- --- -- ALPHA PARIS 10 09/16/2004 ALPHA NEW-YORK10 09/16/2004 ALPHA TOKYO 11 09/15/2004 3 record(s) selected. -- As you can see, both queries worked and produced the same result in DB2, aside from the row sequence, which is easily fixable via an Order By. I'm at a loss to explain why my query didn't work in MySQL V4.1.4. Rhino -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query with group by
From the SQL Reference for DB2: --- -CHAR--(--datetime-expression+---+--) '-,--+-ISO---+--' +-USA---+ +-EUR---+ +-JIS---+ '-LOCAL-' Datetime to Character datetime-expression An expression that is one of the following three data types date The result is the character string representation of the date in the format specified by the second argument. The length of the result is 10. An error occurs if the second argument is specified and is not a valid value (SQLSTATE 42703). time The result is the character string representation of the time in the format specified by the second argument. The length of the result is 8. An error occurs if the second argument is specified and is not a valid value (SQLSTATE 42703). timestamp The second argument is not applicable and must not be specified. SQLSTATE 42815 The result is the character string representation of the timestamp. The length of the result is 26. The code page of the string is the code page of the database at the application server. --- So, yes, char() is converting the date to a string as you suspected. I'm not suprised that the query wouldn't perform terribly well; that's what I would have expected given the construction of the query. Like I said, it was just the first solution I came up with. I would likely have come up with something more like your solution if I had had more time to try other solutions :-) In any case, we have now thought of several solutions to the problem. I hope Vincent can use at least one of them ;-) Rhino - Original Message - From: Michael Stassen [EMAIL PROTECTED] To: Rhino [EMAIL PROTECTED] Cc: Jose Miguel Pérez [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Friday, September 17, 2004 8:13 PM Subject: Re: Query with group by I took a closer look and found the problem -- char(date). Is char() necessary in DB2 to cast the date as a string? In MySQL, char() expects a list of integers to interpret as a list of character codes http://dev.mysql.com/doc/mysql/en/String_functions.html. Removing char() makes the query appear to work. I say appear because you forgot the same thing I did. We need to guard against the possibility that a row with different content will match the location and date. So, your query becomes SELECT content, location, version, date FROM temp WHERE CONCAT(location, date) IN (SELECT CONCAT(location, MAX(date)) FROM temp WHERE content = 'ALPHA' GROUP BY location) AND content = 'ALPHA'; which works for me in 4.1.4a. That said, it seems to me that this is a version of the MAX-CONCAT trick written as a subquery. With the columns to be compared inside CONCAT() functions, there is no way to use an index to match up the rows. My expectation is that this will be relatively inefficient compared to the other subquery solution. Michael Rhino wrote: Here is the output I got when I ran my query - and yours, Michael - against DB2 V7.2. Please note that I replaced the 'temp' table in Michael's query with the real table in both the outer query and the subquery; no other changes were made. -- create table versions (id smallint not null, date date not null, content char(5) not null, location char(10) not null, version smallint not null, primary key(id)) DB2I The SQL command completed successfully. insert into versions values (1, '2004-09-14', 'ALPHA', 'PARIS', 10), (2, '2004-09-15', 'ALPHA', 'PARIS', 11), (3, '2004-09-16', 'ALPHA', 'PARIS', 10), (4, '2004-09-14', 'ALPHA', 'NEW-YORK', 11), (5, '2004-09-15', 'ALPHA', 'NEW-YORK', 11), (6, '2004-09-16', 'ALPHA', 'NEW-YORK', 10), (7, '2004-09-14', 'ALPHA', 'TOKYO', 10), (8, '2004-09-15', 'ALPHA', 'TOKYO', 11), (9, '2004-09-16', 'BETA', 'TOKYO', 10) DB2I The SQL command completed successfully. select content, location, version, date from versions where concat(location, char(date)) in (select concat(location, char(max(date))) from versions where content = 'ALPHA' group by location) CONTENT LOCATION VERSION DATE --- -- --- -- ALPHA NEW-YORK10 09/16/2004 ALPHA PARIS 10 09/16/2004 ALPHA TOKYO 11 09/15/2004 3 record(s) selected. SELECT content, location, version, date FROM versions t1 WHERE date=(SELECT MAX(t2.date) FROM versions t2 WHERE t1.location = t2.location AND t1.content = t2.content) AND content = 'ALPHA' CONTENT LOCATION VERSION DATE --- -- --- -- ALPHA PARIS 10 09/16/2004 ALPHA NEW-YORK10 09/16/2004 ALPHA TOKYO
Re: Query with group by
- Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, September 16, 2004 3:05 AM Subject: Query with group by Hello all, i'm trying to built a query that report me the latest tuple for a given field. I explain myself a bit. Let's say i have a table such this : +++-+--+-+ | id | date | content | location | version | +++-+--+-+ | 1 | 2004-09-14 | ALPHA | PARIS| 10 | | 2 | 2004-09-15 | ALPHA | PARIS| 11 | | 3 | 2004-09-16 | ALPHA | PARIS| 10 | | 4 | 2004-09-14 | ALPHA | NEW-YORK | 11 | | 5 | 2004-09-15 | ALPHA | NEW-YORK | 11 | | 6 | 2004-09-16 | ALPHA | NEW-YORK | 10 | | 7 | 2004-09-14 | ALPHA | TOKYO| 10 | | 8 | 2004-09-15 | ALPHA | TOKYO| 11 | | 9 | 2004-09-16 | BETA| TOKYO| 10 | +++-+--+-+ Then, i'm trying to get, for ALPHA content, the last (most recent) tuple for each location, with their associated version. What i should have in the result set : ++-+--+-+ | 2004-09-16 | ALPHA | PARIS| 10 | | 2004-09-16 | ALPHA | NEW-YORK | 10 | | 2004-09-15 | ALPHA | TOKYO| 11 | ++-+--+-+ I tried with max(date) but i get this : mysql select max(date), location, version from temp where content=ALPHA group by location; ++--+-+ | max(date) | location | version | ++--+-+ | 2004-09-16 | NEW-YORK | 11 | | 2004-09-16 | PARIS| 10 | | 2004-09-15 | TOKYO| 10 | ++--+-+ Because the GROUP BY statement get the first tuple by default? distinct(max(date)) do the same result. How them can i get the correct result set? Note that we cannot guess if the version is increasing or deacreasing. Which version of MySQL are you running? I'm having trouble thinking of a solution that doesn't involve a subquery but subqueries aren't supported until version 4.1.x; I don't want to give you a subquery if you can't run it. Rhino -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query with group by
Hi Vincent! i'm trying to built a query that report me the latest tuple for a given field. I explain myself a bit. Let's say i have a table such this : There is no simple solution to your problem. Strictly speaking, it's forbidden to name a column in the SELECT clause which is NOT on the GROUP BY clause. (Read chapter: 13.9.3 GROUP BY with Hidden Fields from the MySQL Documentation.). Anyway, MySQL is relaxed from ANSI SQL in such a way it's permitted to include a column which is not in the GROUP BY clause (hidden columns). The values returned from such a query are unpredictable. I started saying there is no _simple_ solution. However, you can think a little bit and your problem goes away. I can think of two solutions to your problem which neither involves a subquery. You are lucky in that you know for sure what your last value is (you do have a date field). You already know the value of using MAX(date) in the SELECT. You can concatenate the date column with the version, and return a SUBSTRING from that operation, something like this: SELECT SUBSTRING_INDEX(MAX(CONCAT(date, ' ', version)), ' ', -1) AS correct_version, location, version AS wrong_version FROM cities WHERE content='ALPHA' GROUP BY location You will get the following result set: (Notice the correct_version column). +-+--+---+ | correct_version | location | wrong_version | +-+--+---+ | 10 | NEW-YORK |11 | | 10 | PARIS|10 | | 11 | TOKYO|10 | +-+--+---+ There is another option, however, and this one is what I like most. In two words, you can join the table with itself, like this: SELECT c1.date, c1.location, c1.version FROM cities c1 LEFT JOIN cities c2 ON c1.location=c2.location AND c1.content=c2.content AND c2.datec1.date WHERE c2.id IS NULL AND c1.content = 'ALPHA' What we have done here is first LEFT JOIN the table (cities in my example) with itself so that we have in the left part those rows which have the maximum date and with NULL on the right as the value of c2. We then select those saying WHERE c2.id IS NULL. The last operation is selecting the rows for an 'ALPHA' content. Notice that with this version we also eliminate the GROUP BY, since all rows returned from the join are unique. The result set returned with the last example is this: ++--+-+ | date | location | version | ++--+-+ | 2004-09-16 | PARIS| 10 | | 2004-09-16 | NEW-YORK | 10 | | 2004-09-15 | TOKYO| 11 | ++--+-+ Cheers, Jose Miguel. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query with group by
Jose Miguel Pérez wrote: Hi Vincent! i'm trying to built a query that report me the latest tuple for a given field. I explain myself a bit. Let's say i have a table such this : There is no simple solution to your problem. Strictly speaking, it's forbidden to name a column in the SELECT clause which is NOT on the GROUP BY clause. (Read chapter: 13.9.3 GROUP BY with Hidden Fields from the MySQL Documentation.). Anyway, MySQL is relaxed from ANSI SQL in such a way it's permitted to include a column which is not in the GROUP BY clause (hidden columns). The values returned from such a query are unpredictable. I started saying there is no _simple_ solution. However, you can think a little bit and your problem goes away. I can think of two solutions to your problem which neither involves a subquery. You are lucky in that you know for sure what your last value is (you do have a date field). You already know the value of using MAX(date) in the SELECT. You can concatenate the date column with the version, and return a SUBSTRING from that operation, something like this: SELECT SUBSTRING_INDEX(MAX(CONCAT(date, ' ', version)), ' ', -1) AS correct_version, location, version AS wrong_version FROM cities WHERE content='ALPHA' GROUP BY location This is the MAX-CONCAT trick. It works, but it's inefficient, as it has to do a full table scan, with calculations done on each row. An index on date cannot be used in this case, because we are searching for the maximum CONCAT(date, ' ', version), rather than the maximum date. You will get the following result set: (Notice the correct_version column). +-+--+---+ | correct_version | location | wrong_version | +-+--+---+ | 10 | NEW-YORK |11 | | 10 | PARIS|10 | | 11 | TOKYO|10 | +-+--+---+ There is another option, however, and this one is what I like most. In two words, you can join the table with itself, like this: SELECT c1.date, c1.location, c1.version FROM cities c1 LEFT JOIN cities c2 ON c1.location=c2.location AND c1.content=c2.content AND c2.datec1.date WHERE c2.id IS NULL AND c1.content = 'ALPHA' What we have done here is first LEFT JOIN the table (cities in my example) with itself so that we have in the left part those rows which have the maximum date and with NULL on the right as the value of c2. We then select those saying WHERE c2.id IS NULL. The last operation is selecting the rows for an 'ALPHA' content. This will work, but it is also somewhat inefficient. The LEFT JOIN is creating numerous extra, unwanted rows, only to throw them away with the WHERE c2.id IS NULL. Assuming n rows for a particular location value, you are creating 1 + (n * (n - 1)/2) rows {optimized down to n rows, if id is defined NOT NULL} to find the one row you want for that group. That's no big deal for this small sample table, but it may not scale well. Notice that with this version we also eliminate the GROUP BY, since all rows returned from the join are unique. The result set returned with the last example is this: ++--+-+ | date | location | version | ++--+-+ | 2004-09-16 | PARIS| 10 | | 2004-09-16 | NEW-YORK | 10 | | 2004-09-15 | TOKYO| 11 | ++--+-+ The most efficient way is probably to use a temporary table. CREATE TEMPORARY TABLE max_dates SELECT location, MAX(date) AS max_date FROM temp WHERE content = 'ALPHA' GROUP BY location; SELECT t.* FROM temp t, max_dates m WHERE t.location = m.location AND t.date = m.max_date; DROP TABLE max_dates; The manual describes the MAX-CONCAT trick, the temporary table solution, and a subquery solution for 4.1+ http://dev.mysql.com/doc/mysql/en/example-Maximum-column-group-row.html. Cheers, Jose Miguel. Michael -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query with group by
I agree that Michael's solution with the temporary tables is the best I have seen so far. I am running MySQL 4.0.15 so I don't have any subquery capability. As a result, I went to DB2 to come up with one solution that does give the right answer via a subquery: select content, location, version, date from versions where concat(location, char(date)) in (select concat(location, char(max(date))) from versions where content = 'ALPHA' group by location); I don't know if this will work in MySQL 4.1.x though. You may want to give it a try if you have 4.1.x. By the way, I'm not convinced that this is the *best* solution using a subquery; it's just the first one I could think of. I don't have all day to spend on this ;-) One other thought: is it really necessary to keep track of all of the different versions you have had at each location? I can't think of a lot of cases where I'd really care that I had version 8 of the ALPHA in Tokyo last week or last year. Your query would be simpler - and the volume of data would be less - if you only kept track of the current version for each content at each location. Rhino - Original Message - From: Michael Stassen [EMAIL PROTECTED] To: Jose Miguel Pérez [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Thursday, September 16, 2004 10:47 AM Subject: Re: Query with group by Jose Miguel Pérez wrote: Hi Vincent! i'm trying to built a query that report me the latest tuple for a given field. I explain myself a bit. Let's say i have a table such this : There is no simple solution to your problem. Strictly speaking, it's forbidden to name a column in the SELECT clause which is NOT on the GROUP BY clause. (Read chapter: 13.9.3 GROUP BY with Hidden Fields from the MySQL Documentation.). Anyway, MySQL is relaxed from ANSI SQL in such a way it's permitted to include a column which is not in the GROUP BY clause (hidden columns). The values returned from such a query are unpredictable. I started saying there is no _simple_ solution. However, you can think a little bit and your problem goes away. I can think of two solutions to your problem which neither involves a subquery. You are lucky in that you know for sure what your last value is (you do have a date field). You already know the value of using MAX(date) in the SELECT. You can concatenate the date column with the version, and return a SUBSTRING from that operation, something like this: SELECT SUBSTRING_INDEX(MAX(CONCAT(date, ' ', version)), ' ', -1) AS correct_version, location, version AS wrong_version FROM cities WHERE content='ALPHA' GROUP BY location This is the MAX-CONCAT trick. It works, but it's inefficient, as it has to do a full table scan, with calculations done on each row. An index on date cannot be used in this case, because we are searching for the maximum CONCAT(date, ' ', version), rather than the maximum date. You will get the following result set: (Notice the correct_version column). +-+--+---+ | correct_version | location | wrong_version | +-+--+---+ | 10 | NEW-YORK |11 | | 10 | PARIS|10 | | 11 | TOKYO|10 | +-+--+---+ There is another option, however, and this one is what I like most. In two words, you can join the table with itself, like this: SELECT c1.date, c1.location, c1.version FROM cities c1 LEFT JOIN cities c2 ON c1.location=c2.location AND c1.content=c2.content AND c2.datec1.date WHERE c2.id IS NULL AND c1.content = 'ALPHA' What we have done here is first LEFT JOIN the table (cities in my example) with itself so that we have in the left part those rows which have the maximum date and with NULL on the right as the value of c2. We then select those saying WHERE c2.id IS NULL. The last operation is selecting the rows for an 'ALPHA' content. This will work, but it is also somewhat inefficient. The LEFT JOIN is creating numerous extra, unwanted rows, only to throw them away with the WHERE c2.id IS NULL. Assuming n rows for a particular location value, you are creating 1 + (n * (n - 1)/2) rows {optimized down to n rows, if id is defined NOT NULL} to find the one row you want for that group. That's no big deal for this small sample table, but it may not scale well. Notice that with this version we also eliminate the GROUP BY, since all rows returned from the join are unique. The result set returned with the last example is this: ++--+-+ | date | location | version | ++--+-+ | 2004-09-16 | PARIS| 10 | | 2004-09-16 | NEW-YORK | 10 | | 2004-09-15
Re: Query with group by
Which version of MySQL are you running? I'm having trouble thinking of a solution that doesn't involve a subquery but subqueries aren't supported until version 4.1.x; I don't want to give you a subquery if you can't run it. Rhino I'm running 3.23.43. -- Vincent -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query with group by
Hi Michael! Yes, you're right, thanks for extending and clarifying my message. However, I'm not confident about your comments on the inefficiency for the following query: SELECT c1.date, c1.location, c1.version FROM cities c1 LEFT JOIN cities c2 ON c1.location=c2.location AND c1.content=c2.content AND c2.datec1.date WHERE c2.id IS NULL AND c1.content = 'ALPHA' This will work, but it is also somewhat inefficient. The LEFT JOIN is creating numerous extra, unwanted rows, only to throw them away with the WHERE c2.id IS NULL. Assuming n rows for a particular location value, you [...] and then [...] The most efficient way is probably to use a temporary table. CREATE TEMPORARY TABLE max_dates SELECT location, MAX(date) AS max_date FROM temp WHERE content = 'ALPHA' GROUP BY location; SELECT t.* FROM temp t, max_dates m WHERE t.location = m.location AND t.date = m.max_date; DROP TABLE max_dates; I don't think the temporary table is such an efficient way of doing this. Pardon me, I'm provably wrong, but let me explain to see if I think correctly. First, I assume as true this table have an index on location, content and date, apart from the PK on ID. Given that, on my query we are using the keys at full, I mean, although you say the left join is creating numerous extra, unwanted rows, this is not true. We could apply the standard algebra here, but the real world query optimizers are smart enough to not retrieve unwanted data. (What about joining four or more tables! Multiply then). Your query is creating a temporary table, doing a full scan of it (thanks to the MAX(date) function), etc. If you do a EXPLAIN SELECT for your query, you'll notice there is an Extra of: Using where; Using temporary; Using FILESORT. Reading the MySQL documentation, one can see If you want to make your queries as fast as possible, you should look out for Extra values of Using filesort and Using temporary.. (Chapter 7.2.1 EXPLAIN Syntax). If I'm not wrong, maybe the first LEFT JOIN is worse from a mathematical point of view, but the temporary one may be is the worst from a practical perspective. And you'll see I'm very cautious because I'm not such a SQL guru, but I'd like to know other opinions. Anyway, I don't know if one can program an agregate UDF called something like EXTERNAL_MAX(...) or something, so that we could do like: SELECT EXTERNAL_MAX(date, version) --- i.e: Returns the version value for the row with MAX(date). This, for sure, will be the best solution. ;-) Cheers, Jose Miguel. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Query with group by
Rhino wrote: I agree that Michael's solution with the temporary tables is the best I have seen so far. I can't take much credit. It's just an adaptation of the solution in the manual. I am running MySQL 4.0.15 so I don't have any subquery capability. As a result, I went to DB2 to come up with one solution that does give the right answer via a subquery: select content, location, version, date from versions where concat(location, char(date)) in (select concat(location, char(max(date))) from versions where content = 'ALPHA' group by location); Really? That didn't work for me in mysql 4.1.4a-gamma. I got +-+--+-++ | content | location | version | date | +-+--+-++ | ALPHA | PARIS| 10 | 2004-09-14 | | ALPHA | PARIS| 11 | 2004-09-15 | | ALPHA | PARIS| 10 | 2004-09-16 | | ALPHA | NEW-YORK | 11 | 2004-09-14 | | ALPHA | NEW-YORK | 11 | 2004-09-15 | | ALPHA | NEW-YORK | 10 | 2004-09-16 | | ALPHA | TOKYO| 10 | 2004-09-14 | | ALPHA | TOKYO| 11 | 2004-09-15 | | BETA| TOKYO| 10 | 2004-09-16 | +-+--+-++ 9 rows in set (0.31 sec) I don't know if this will work in MySQL 4.1.x though. You may want to give it a try if you have 4.1.x. By the way, I'm not convinced that this is the *best* solution using a subquery; it's just the first one I could think of. I don't have all day to spend on this ;-) The following query works for me. SELECT content, location, version, date FROM temp t1 WHERE date=(SELECT MAX(t2.date) FROM temp t2 WHERE t1.location = t2.location AND t1.content = t2.content) AND content = 'ALPHA'; +-+--+-++ | content | location | version | date | +-+--+-++ | ALPHA | PARIS| 10 | 2004-09-16 | | ALPHA | NEW-YORK | 10 | 2004-09-16 | | ALPHA | TOKYO| 11 | 2004-09-15 | +-+--+-++ 3 rows in set (0.01 sec) Again, that's an adaptation of the subquery version of the solution in the manual. Michael -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: QUERY question (group by) - please help
At 14:41 +0100 1/10/03, Damir Dezeljin wrote: Hi. I want to calculate how many rows my query which uses 'GROUP BY' returns. The query: SELECT something FROM test WHERE (kid=1) OR (kid=2) OR (kid=4) GROUP BY cid,aid; Is it posible to get number of rows with such a query from MySQL v3.23.49? You want both a list of rows, and a count of the number of rows -- with a single query. That doesn't make sense, because those two things are incompatible. If it isn't posible ... is it posible in MySQL 4.x? MySQL 4.x won't change the fundamental nature of the incompatibility. The strategy you use below is reasonable. I think a lot about this problem and I realize only the following solution: CREATE TEMPORARY TABLE t (i INT); INSERT INTO t (i) SELECT aid FROM the_query_above SELECT COUNT(*) FROM t; DROP TABLE t; But this isn't so elegant. I want to do so on data generated by: CREATE TABLE test ( kid INT, aid INT, cid INT ); INSERT INTO test (kid, aid, cid) VALUES ( 1, 0, 1), ( 2, 2, 2), ( 1, 3, 2), ( 2, 3, 2), ( 4, 4, 2), ( 4, 0, 3), ( 3, 3, 4), ( 4, 3, 4); Regards, Dezo - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: QUERY question (group by) - please help
SELECT something FROM test WHERE (kid=1) OR (kid=2) OR (kid=4) GROUP BY cid,aid; Is it posible to get number of rows with such a query from MySQL v3.23.49? You want both a list of rows, and a count of the number of rows -- with a single query. That doesn't make sense, because those two things are incompatible. I make a mistake by not specifing my problem in more detail. I will try to be more specific: I'm writing my own search engine for my web site. I want to output for each result on each page (20 results per page) the no_of_current_result / no_of_total_results . So with a first query I want to get a count of the number of rows. Then I will execute a query with LIMIT blahblah to get results for the current page. Is it posible to get number of all results without geting all results in a temporary table and then count them? If yes, how? Thanks and regards, Dezo - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: QUERY question (group by) - please help
Damir Dezeljin wrote: SELECT something FROM test WHERE (kid=1) OR (kid=2) OR (kid=4) GROUP BY cid,aid; Is it posible to get number of rows with such a query from MySQL v3.23.49? You want both a list of rows, and a count of the number of rows -- with a single query. That doesn't make sense, because those two things are incompatible. I make a mistake by not specifing my problem in more detail. I will try to be more specific: I'm writing my own search engine for my web site. I want to output for each result on each page (20 results per page) the no_of_current_result / no_of_total_results . So with a first query I want to get a count of the number of rows. Then I will execute a query with LIMIT blahblah to get results for the current page. Is it posible to get number of all results without geting all results in a temporary table and then count them? If yes, how? SELECT COUNT(*) FROM test WHERE (kid=1) OR (kid=2) OR (kid=4) GROUP BY cid,aid; Should do it, or am I missing something ? Thanks and regards, Dezo Hope this Helps Joseph Bueno - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: QUERY question (group by) - please help
Hi again ;) SELECT COUNT(*) FROM test WHERE (kid=1) OR (kid=2) OR (kid=4) GROUP BY cid,aid; This query returns: +--+ | COUNT(*) | +--+ |1 | |1 | |2 | |1 | |1 | |1 | +--+ 6 rows in set (0.03 sec) Inest of this I want to get back '6' so numbers of rows in resulting query; The same query WITHOUT 'GROUP BY' returns: mysql SELECT COUNT(*) FROM test WHERE (kid=1) OR (kid=2) OR (kid=4); +--+ | COUNT(*) | +--+ |7 | +--+ 1 row in set (0.00 sec) So two rows are grouped because of same cid,did Any sugestion? Regards, Dezo - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: QUERY question (group by) - please help
Damir Dezeljin wrote: Hi again ;) SELECT COUNT(*) FROM test WHERE (kid=1) OR (kid=2) OR (kid=4) GROUP BY cid,aid; This query returns: +--+ | COUNT(*) | +--+ |1 | |1 | |2 | |1 | |1 | |1 | +--+ 6 rows in set (0.03 sec) Inest of this I want to get back '6' so numbers of rows in resulting query; The same query WITHOUT 'GROUP BY' returns: mysql SELECT COUNT(*) FROM test WHERE (kid=1) OR (kid=2) OR (kid=4); +--+ | COUNT(*) | +--+ |7 | +--+ 1 row in set (0.00 sec) So two rows are grouped because of same cid,did Any sugestion? Regards, Dezo SELECT COUNT(DISTINCT cid,aid) FROM test WHERE (kid=1) OR (kid=2) OR (kid=4) Should give you the number of groups. Hope this helps Joseph Bueno - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php