Mark Phillips <[EMAIL PROTECTED]> wrote on 12/14/2005 11:31:03 AM: > I am using MySQL 4.0.x on a Linux machine with a JSP/Servlet front-end to > display the data. > > I have a table with experimental data for each flight of a rocket. > Conceptually, it looks like (with many more columns): > > Flights > +-----------+----------+----------+ > | flight_id | data1_id | data2_id | > +-----------+----------+----------+ > | 1 | 1 | 1 | > | 2 | 1 | 3 | > | 3 | 1 | 1 | > | 4 | 2 | 2 | > | 5 | 2 | 3 | > | 6 | 1 | 1 | > | 7 | 1 | 1 | > | 8 | 4 | 4 | > | 9 | 1 | 2 | > | 10 | 1 | 2 | > | 11 | 1 | 1 | > +-----------+----------+----------+ > > The data1_id and data2_id are indexes for the data recorded for that flight. > > I want to summarize the data. One such summary is to count the number of
> different data1_id's and data2_id's. For example: > > Flight Result Summary > index: 1 2 3 4 > data1_id 8 2 0 1 > data2_id 5 3 2 1 > > I can think of 2 ways to make this summary table. > > 1. Issue 4 queries per data_id of the form > SELECT COUNT(flight_id) FROM Flights WHERE data1_id=** > where ** is set to the values 1,2,3,4. For the table above, I would have to > issue a total of 8 queries. > > 2. Issue one query of the form > SELECT flight_id FROM Flights > and do the counting in my Java code. A simple loop through the > ResultSet could > count the different values for the data_ids. > > My questions are: > > 1. Is there a better way than these two options for getting the dataI want? A > single query per data_id? > > 2. Generally, what is the most "efficient" way to do this? Is is better to > issue more queries that gather the "calculated data" or better to issue one > query for the raw data and then do the calculations in Java? I am sure there > are many factors that effect the answer to this question - server resources, > code design, etc. However, I am interested in a best practices type of answer > or general rule of thumb from the sage experts on the list. > > Thanks for any insights you can provide! > > -- > Mark Phillips > Phillips Marketing, Inc > [EMAIL PROTECTED] > 602 524-0376 > 480 945-9197 fax > Your option 1) may experience network lag for each query/result cycle, depending on how you connect. If you have a decent index, each query will be very quick so that's not necessarily going to be much of an issue. If you have a fast connection that becomes less of an issue, too. Your option 2) could turn out to be very quick, it all depends on how efficiently you can code your "pivot" routine on the client side. I thought this was going to be a simple "pivot table" until I looked again. You are actually pivoting your data twice: Once around the flight_id to put your column headers as the row headers, and the second time to convert discreet column values into column headers. A single pivot can be rather quick under most circumstances but this double pivot would be a rather ungainly SQL statement and would not actually save you much effort (unless you automated its production in your application's code). It's a fairly easy pattern to write but by the time you wrote the query and executed it, you could have taken the raw data and transformed it just as easily using your option 2). This is one of those situations where the data transformation is best left to application-layer code (using loops and arrays) than it would be to try to create a SQL statement to do it at the server. IMHO, Stick with 2). Shawn Green Database Administrator Unimin Corporation - Spruce Pine