Mark Phillips <[EMAIL PROTECTED]> wrote on 12/14/2005 11:31:03 
AM:

> I am using MySQL 4.0.x on a Linux machine with a JSP/Servlet front-end 
to 
> display the data.
> 
> I have a table with experimental data for each flight of a rocket. 
> Conceptually, it looks like (with many more columns):
> 
> Flights
> +-----------+----------+----------+
> | flight_id | data1_id | data2_id |
> +-----------+----------+----------+
> |         1 |        1 |        1 |
> |         2 |        1 |        3 |
> |         3 |        1 |        1 |
> |         4 |        2 |        2 |
> |         5 |        2 |        3 |
> |         6 |        1 |        1 |
> |         7 |        1 |        1 |
> |         8 |        4 |        4 |
> |         9 |        1 |        2 |
> |        10 |        1 |        2 |
> |        11 |        1 |        1 |
> +-----------+----------+----------+
> 
> The data1_id and data2_id are indexes for the data recorded for that 
flight.
> 
> I want to summarize the data. One such summary is to count the number of 

> different data1_id's and data2_id's. For example:
> 
> Flight Result Summary
>        index:   1   2   3   4
> data1_id      8   2   0   1
> data2_id      5   3   2   1
> 
> I can think of 2 ways to make this summary table.
> 
> 1. Issue 4 queries per data_id of the form 
> SELECT COUNT(flight_id) FROM Flights WHERE data1_id=**
> where ** is set to the values 1,2,3,4. For the table above, I would have 
to 
> issue a total of 8 queries.
> 
> 2. Issue one query of the form
> SELECT flight_id FROM Flights
> and do the counting in my Java code. A simple loop through the 
> ResultSet could 
> count the different values for the data_ids.
> 
> My questions are:
> 
> 1. Is there a better way than these two options for getting the dataI 
want? A 
> single query per data_id? 
> 
> 2. Generally, what is the most "efficient" way to do this? Is is better 
to 
> issue more queries that gather the "calculated data" or better to issue 
one 
> query for the raw data and then do the calculations in Java? I am sure 
there 
> are many factors that effect the answer to this question - server 
resources, 
> code design, etc. However, I am interested in a best practices type of 
answer 
> or general rule of thumb from the sage experts on the list. 
> 
> Thanks for any insights you can provide!
> 
> -- 
> Mark Phillips
> Phillips Marketing, Inc
> [EMAIL PROTECTED]
> 602 524-0376
> 480 945-9197 fax
> 

Your option 1) may experience network lag for each query/result cycle, 
depending on how you connect. If you have a decent index, each query will 
be very quick so that's not necessarily going to be much of an issue. If 
you have a fast connection that becomes less of an issue, too. 

Your option 2) could turn out to be very quick, it all depends on how 
efficiently you can code your "pivot" routine on the client side.

I thought this was going to be a simple "pivot table" until I looked 
again. You are actually pivoting your data twice: Once around the 
flight_id to put your column headers as the row headers, and the second 
time to convert discreet column values into column headers. A single pivot 
can be rather quick under most circumstances but this double pivot would 
be a rather ungainly SQL statement and would not actually save you much 
effort (unless you automated its production in your application's code). 
It's a fairly easy pattern to write but by the time you wrote the query 
and executed it, you could have taken the raw data and transformed it just 
as easily using your option 2).

This is one of those situations where the data transformation is best left 
to application-layer code (using loops and arrays) than it would be to try 
to create a SQL statement to do it at the server. IMHO, Stick with 2).

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine


Reply via email to