Re: YAQQ (Yet Another Query Question)
Mark Phillips wrote: Flights +---+--+--+ | flight_id | data1_id | data2_id | +---+--+--+ | 1 |1 |1 | | 2 |1 |3 | | 3 |1 |1 | | 4 |2 |2 | | 5 |2 |3 | | 6 |1 |1 | | 7 |1 |1 | | 8 |4 |4 | | 9 |1 |2 | |10 |1 |2 | |11 |1 |1 | +---+--+--+ The data1_id and data2_id are indexes for the data recorded for that flight. I want to summarize the data. One such summary is to count the number of different data1_id's and data2_id's. For example: Flight Result Summary index: 1 2 3 4 data1_id8 2 0 1 data2_id5 3 2 1 select sum(if(data1_id =1,1, 0)) as data1_id_1, sum(if(data1_id =2, 1, 0)) as data1_id_2, etc , etc sum(if(data2_id =1,1, 0)) as data2_id_1, sum(if(data2_id =2, 1, 0)) as data2_id_2 etc, etc from flights add composite indexes if required for speed. Nigel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: YAQQ (Yet Another Query Question)
Mark Phillips wrote: 2. Generally, what is the most efficient way to do this? Is is better to issue more queries that gather the calculated data or better to issue one query for the raw data and then do the calculations in Java? I am sure there are many factors that effect the answer to this question - server resources, code design, etc. However, I am interested in a best practices type of answer or general rule of thumb from the sage experts on the list. Sorry only just spotted the second half. Processing in MySQL will be faster than pulling the dataset back and processing it. This is particularly true if the database server is remote from the servlet container. The chief reason is that processing it on the client add the time needed to copy the raw data over the network. In Java or C.* data processing performance can be on a par with MySQL once the data is obtained, against an interpreted language such as PHP or Perl the database's performance will always win hands down even if temporary tables are needed. If the rocket data doesn't change rapidly the MySQL query cache will also improve preformance. This feature speeds things by remembering the answer to your query and replying with a cached version until the rockets table is next updated. Nigel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: YAQQ (Yet Another Query Question)
Mark Phillips [EMAIL PROTECTED] wrote on 12/14/2005 11:31:03 AM: I am using MySQL 4.0.x on a Linux machine with a JSP/Servlet front-end to display the data. I have a table with experimental data for each flight of a rocket. Conceptually, it looks like (with many more columns): Flights +---+--+--+ | flight_id | data1_id | data2_id | +---+--+--+ | 1 |1 |1 | | 2 |1 |3 | | 3 |1 |1 | | 4 |2 |2 | | 5 |2 |3 | | 6 |1 |1 | | 7 |1 |1 | | 8 |4 |4 | | 9 |1 |2 | |10 |1 |2 | |11 |1 |1 | +---+--+--+ The data1_id and data2_id are indexes for the data recorded for that flight. I want to summarize the data. One such summary is to count the number of different data1_id's and data2_id's. For example: Flight Result Summary index: 1 2 3 4 data1_id 8 2 0 1 data2_id 5 3 2 1 I can think of 2 ways to make this summary table. 1. Issue 4 queries per data_id of the form SELECT COUNT(flight_id) FROM Flights WHERE data1_id=** where ** is set to the values 1,2,3,4. For the table above, I would have to issue a total of 8 queries. 2. Issue one query of the form SELECT flight_id FROM Flights and do the counting in my Java code. A simple loop through the ResultSet could count the different values for the data_ids. My questions are: 1. Is there a better way than these two options for getting the dataI want? A single query per data_id? 2. Generally, what is the most efficient way to do this? Is is better to issue more queries that gather the calculated data or better to issue one query for the raw data and then do the calculations in Java? I am sure there are many factors that effect the answer to this question - server resources, code design, etc. However, I am interested in a best practices type of answer or general rule of thumb from the sage experts on the list. Thanks for any insights you can provide! -- Mark Phillips Phillips Marketing, Inc [EMAIL PROTECTED] 602 524-0376 480 945-9197 fax Your option 1) may experience network lag for each query/result cycle, depending on how you connect. If you have a decent index, each query will be very quick so that's not necessarily going to be much of an issue. If you have a fast connection that becomes less of an issue, too. Your option 2) could turn out to be very quick, it all depends on how efficiently you can code your pivot routine on the client side. I thought this was going to be a simple pivot table until I looked again. You are actually pivoting your data twice: Once around the flight_id to put your column headers as the row headers, and the second time to convert discreet column values into column headers. A single pivot can be rather quick under most circumstances but this double pivot would be a rather ungainly SQL statement and would not actually save you much effort (unless you automated its production in your application's code). It's a fairly easy pattern to write but by the time you wrote the query and executed it, you could have taken the raw data and transformed it just as easily using your option 2). This is one of those situations where the data transformation is best left to application-layer code (using loops and arrays) than it would be to try to create a SQL statement to do it at the server. IMHO, Stick with 2). Shawn Green Database Administrator Unimin Corporation - Spruce Pine
RE: YAQQ (Yet Another Query Question)
Have you tried the GROUP BY? Make something like (not sure of exact syntax, check the manual for that): SELECT COUNT(*) AS cnt, data1_id FROM data1_id GROUP BY data1_iD; /Peter -Original Message- From: Mark Phillips [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 14, 2005 11:31 PM To: MYSQL List Subject: YAQQ (Yet Another Query Question) I am using MySQL 4.0.x on a Linux machine with a JSP/Servlet front-end to display the data. I have a table with experimental data for each flight of a rocket. Conceptually, it looks like (with many more columns): Flights +---+--+--+ | flight_id | data1_id | data2_id | +---+--+--+ | 1 |1 |1 | | 2 |1 |3 | | 3 |1 |1 | | 4 |2 |2 | | 5 |2 |3 | | 6 |1 |1 | | 7 |1 |1 | | 8 |4 |4 | | 9 |1 |2 | |10 |1 |2 | |11 |1 |1 | +---+--+--+ The data1_id and data2_id are indexes for the data recorded for that flight. I want to summarize the data. One such summary is to count the number of different data1_id's and data2_id's. For example: Flight Result Summary index: 1 2 3 4 data1_id8 2 0 1 data2_id5 3 2 1 I can think of 2 ways to make this summary table. 1. Issue 4 queries per data_id of the form SELECT COUNT(flight_id) FROM Flights WHERE data1_id=** where ** is set to the values 1,2,3,4. For the table above, I would have to issue a total of 8 queries. 2. Issue one query of the form SELECT flight_id FROM Flights and do the counting in my Java code. A simple loop through the ResultSet could count the different values for the data_ids. My questions are: 1. Is there a better way than these two options for getting the data I want? A single query per data_id? 2. Generally, what is the most efficient way to do this? Is is better to issue more queries that gather the calculated data or better to issue one query for the raw data and then do the calculations in Java? I am sure there are many factors that effect the answer to this question - server resources, code design, etc. However, I am interested in a best practices type of answer or general rule of thumb from the sage experts on the list. Thanks for any insights you can provide! -- Mark Phillips Phillips Marketing, Inc [EMAIL PROTECTED] 602 524-0376 480 945-9197 fax -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: YAQQ (Yet Another Query Question)
Nigel, Thanks! Mark On Wednesday 14 December 2005 09:42 am, nigel wood wrote: Mark Phillips wrote: Flights +---+--+--+ | flight_id | data1_id | data2_id | +---+--+--+ | 1 |1 |1 | | 2 |1 |3 | | 3 |1 |1 | | 4 |2 |2 | | 5 |2 |3 | | 6 |1 |1 | | 7 |1 |1 | | 8 |4 |4 | | 9 |1 |2 | |10 |1 |2 | |11 |1 |1 | +---+--+--+ The data1_id and data2_id are indexes for the data recorded for that flight. I want to summarize the data. One such summary is to count the number of different data1_id's and data2_id's. For example: Flight Result Summary index: 1 2 3 4 data1_id 8 2 0 1 data2_id 5 3 2 1 select sum(if(data1_id =1,1, 0)) as data1_id_1, sum(if(data1_id =2, 1, 0)) as data1_id_2, etc , etc sum(if(data2_id =1,1, 0)) as data2_id_1, sum(if(data2_id =2, 1, 0)) as data2_id_2 etc, etc from flights add composite indexes if required for speed. Nigel -- Mark Phillips [EMAIL PROTECTED] 602 524-0376 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: YAQQ (Yet Another Query Question)
Nigel, Again, thanks - that is the rule of thumb I was looking for! Mark On Wednesday 14 December 2005 09:57 am, nigel wood wrote: Mark Phillips wrote: 2. Generally, what is the most efficient way to do this? Is is better to issue more queries that gather the calculated data or better to issue one query for the raw data and then do the calculations in Java? I am sure there are many factors that effect the answer to this question - server resources, code design, etc. However, I am interested in a best practices type of answer or general rule of thumb from the sage experts on the list. Sorry only just spotted the second half. Processing in MySQL will be faster than pulling the dataset back and processing it. This is particularly true if the database server is remote from the servlet container. The chief reason is that processing it on the client add the time needed to copy the raw data over the network. In Java or C.* data processing performance can be on a par with MySQL once the data is obtained, against an interpreted language such as PHP or Perl the database's performance will always win hands down even if temporary tables are needed. If the rocket data doesn't change rapidly the MySQL query cache will also improve preformance. This feature speeds things by remembering the answer to your query and replying with a cached version until the rockets table is next updated. Nigel -- Mark Phillips [EMAIL PROTECTED] 602 524-0376 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: YAQQ (Yet Another Query Question)
Thanks to everyone for their help. Using Nigel's suggestion, I was able to gather all the summary data in one query. Those nested if()'s are really useful! FWIW, you can see the summary stats at http://rockets.phillipsoasis.com Just click on Hopi Rockets and scroll to the bottom of the page. My small contribution to science education! This list is great! Mark On Wednesday 14 December 2005 09:42 am, nigel wood wrote: Mark Phillips wrote: Flights +---+--+--+ | flight_id | data1_id | data2_id | +---+--+--+ | 1 |1 |1 | | 2 |1 |3 | | 3 |1 |1 | | 4 |2 |2 | | 5 |2 |3 | | 6 |1 |1 | | 7 |1 |1 | | 8 |4 |4 | | 9 |1 |2 | |10 |1 |2 | |11 |1 |1 | +---+--+--+ The data1_id and data2_id are indexes for the data recorded for that flight. I want to summarize the data. One such summary is to count the number of different data1_id's and data2_id's. For example: Flight Result Summary index: 1 2 3 4 data1_id 8 2 0 1 data2_id 5 3 2 1 select sum(if(data1_id =1,1, 0)) as data1_id_1, sum(if(data1_id =2, 1, 0)) as data1_id_2, etc , etc sum(if(data2_id =1,1, 0)) as data2_id_1, sum(if(data2_id =2, 1, 0)) as data2_id_2 etc, etc from flights add composite indexes if required for speed. Nigel -- Mark Phillips [EMAIL PROTECTED] 602 524-0376 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]