Re: YAQQ (Yet Another Query Question)

2005-12-14 Thread nigel wood

Mark Phillips wrote:


Flights
+-----------+----------+----------+
| flight_id | data1_id | data2_id |
+-----------+----------+----------+
|         1 |        1 |        1 |
|         2 |        1 |        3 |
|         3 |        1 |        1 |
|         4 |        2 |        2 |
|         5 |        2 |        3 |
|         6 |        1 |        1 |
|         7 |        1 |        1 |
|         8 |        4 |        4 |
|         9 |        1 |        2 |
|        10 |        1 |        2 |
|        11 |        1 |        1 |
+-----------+----------+----------+

The data1_id and data2_id are indexes for the data recorded for that flight.

I want to summarize the data. One such summary is to count the number of 
different data1_id's and data2_id's. For example:


Flight Result Summary
index:     1   2   3   4
data1_id   8   2   0   1
data2_id   5   3   2   1

 


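-- One conditional sum per (column, index value) pair; the result is a single row.
SELECT
  SUM(IF(data1_id = 1, 1, 0)) AS data1_id_1,
  SUM(IF(data1_id = 2, 1, 0)) AS data1_id_2,
  SUM(IF(data1_id = 3, 1, 0)) AS data1_id_3,
  SUM(IF(data1_id = 4, 1, 0)) AS data1_id_4,
  SUM(IF(data2_id = 1, 1, 0)) AS data2_id_1,
  SUM(IF(data2_id = 2, 1, 0)) AS data2_id_2,
  SUM(IF(data2_id = 3, 1, 0)) AS data2_id_3,
  SUM(IF(data2_id = 4, 1, 0)) AS data2_id_4
FROM Flights;

Add composite indexes if required for speed.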

Nigel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: YAQQ (Yet Another Query Question)

2005-12-14 Thread nigel wood

Mark Phillips wrote:



2. Generally, what is the most efficient way to do this? Is it better to 
issue more queries that gather the calculated data or better to issue one 
query for the raw data and then do the calculations in Java? I am sure there 
are many factors that affect the answer to this question - server resources, 
code design, etc. However, I am interested in a best practices type of answer 
or general rule of thumb from the sage experts on the list. 

 


Sorry, only just spotted the second half.

Processing in MySQL will be faster than pulling the dataset back and 
processing it. This is particularly true if the database server is 
remote from the servlet container. The chief reason is that processing 
it on the client adds the time needed to copy the raw data over the 
network. In Java or C.*, data processing performance can be on a par 
with MySQL once the data is obtained. Against an interpreted language 
such as PHP or Perl, the database's performance will always win hands 
down, even if temporary tables are needed.


If the rocket data doesn't change rapidly, the MySQL query cache will 
also improve performance. This feature speeds things up by remembering 
the answer to your query and replying with a cached version until the 
rockets table is next updated.


Nigel




Re: YAQQ (Yet Another Query Question)

2005-12-14 Thread SGreen
Mark Phillips [EMAIL PROTECTED] wrote on 12/14/2005 11:31:03 AM:

 I am using MySQL 4.0.x on a Linux machine with a JSP/Servlet front-end to
 display the data.

 I have a table with experimental data for each flight of a rocket.
 Conceptually, it looks like (with many more columns):

 Flights
 +-----------+----------+----------+
 | flight_id | data1_id | data2_id |
 +-----------+----------+----------+
 |         1 |        1 |        1 |
 |         2 |        1 |        3 |
 |         3 |        1 |        1 |
 |         4 |        2 |        2 |
 |         5 |        2 |        3 |
 |         6 |        1 |        1 |
 |         7 |        1 |        1 |
 |         8 |        4 |        4 |
 |         9 |        1 |        2 |
 |        10 |        1 |        2 |
 |        11 |        1 |        1 |
 +-----------+----------+----------+

 The data1_id and data2_id are indexes for the data recorded for that flight.

 I want to summarize the data. One such summary is to count the number of
 different data1_id's and data2_id's. For example:

 Flight Result Summary
 index:     1   2   3   4
 data1_id   8   2   0   1
 data2_id   5   3   2   1

 I can think of 2 ways to make this summary table.

 1. Issue 4 queries per data_id of the form
 SELECT COUNT(flight_id) FROM Flights WHERE data1_id=**
 where ** is set to the values 1,2,3,4. For the table above, I would have to
 issue a total of 8 queries.

 2. Issue one query of the form
 SELECT flight_id FROM Flights
 and do the counting in my Java code. A simple loop through the ResultSet
 could count the different values for the data_ids.

 My questions are:

 1. Is there a better way than these two options for getting the data I want?
 A single query per data_id?

 2. Generally, what is the most efficient way to do this? Is it better to
 issue more queries that gather the calculated data or better to issue one
 query for the raw data and then do the calculations in Java? I am sure there
 are many factors that affect the answer to this question - server resources,
 code design, etc. However, I am interested in a best practices type of answer
 or general rule of thumb from the sage experts on the list.

 Thanks for any insights you can provide!

 --
 Mark Phillips
 Phillips Marketing, Inc
 [EMAIL PROTECTED]
 602 524-0376
 480 945-9197 fax


Your option 1) may experience network lag for each query/result cycle, 
depending on how you connect. If you have a decent index, each query will 
be very quick so that's not necessarily going to be much of an issue. If 
you have a fast connection that becomes less of an issue, too. 

Your option 2) could turn out to be very quick; it all depends on how 
efficiently you can code your pivot routine on the client side.
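
A minimal Java sketch of such a client-side pivot routine follows. It assumes the query is widened to return data1_id and data2_id along with flight_id (option 2 as written selects only flight_id), and it uses an in-memory copy of the sample rows in place of a JDBC ResultSet. The class name FlightPivot and the method pivot are hypothetical.

```java
import java.util.Arrays;

// Sketch only: a single pass over the rows that counts each index value per column.
// FlightPivot and pivot() are hypothetical names, not part of any library.
class FlightPivot {
    // Each row is {flight_id, data1_id, data2_id}, copied from the sample table.
    static final int[][] SAMPLE_FLIGHTS = {
        {1, 1, 1}, {2, 1, 3}, {3, 1, 1}, {4, 2, 2}, {5, 2, 3}, {6, 1, 1},
        {7, 1, 1}, {8, 4, 4}, {9, 1, 2}, {10, 1, 2}, {11, 1, 1}
    };

    // Returns counts[c][v - 1]: the number of rows whose data column c
    // (0 = data1_id, 1 = data2_id) holds index value v, for 1 <= v <= maxIndex.
    static int[][] pivot(int[][] rows, int maxIndex) {
        int[][] counts = new int[2][maxIndex];
        for (int[] row : rows) {
            for (int c = 0; c < 2; c++) {
                int value = row[c + 1];
                // Values outside the expected range are ignored in this sketch.
                if (value >= 1 && value <= maxIndex) {
                    counts[c][value - 1]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] counts = pivot(SAMPLE_FLIGHTS, 4);
        System.out.println("data1_id " + Arrays.toString(counts[0])); // [8, 2, 0, 1]
        System.out.println("data2_id " + Arrays.toString(counts[1])); // [5, 3, 2, 1]
    }
}
```

In a servlet the same loop would read each row from the ResultSet instead of the array, so the cost is one pass over the rows plus the time to transfer them.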

I thought this was going to be a simple pivot table until I looked 
again. You are actually pivoting your data twice: once around the 
flight_id to put your column headers as the row headers, and the second 
time to convert discrete column values into column headers. A single pivot 
can be rather quick under most circumstances but this double pivot would 
be a rather ungainly SQL statement and would not actually save you much 
effort (unless you automated its production in your application's code). 
It's a fairly easy pattern to write but by the time you wrote the query 
and executed it, you could have taken the raw data and transformed it just 
as easily using your option 2).
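
For the automated route, here is a hedged Java sketch of generating a conditional-sum pivot query of the kind Nigel posted in this thread. The class PivotQueryBuilder and its build method are hypothetical names. The table and column names are concatenated straight into the SQL, so this assumes they come from trusted constants in the application and never from user input.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: builds one SUM(IF(...)) column per (column, index value) pair.
// PivotQueryBuilder and build() are hypothetical names.
class PivotQueryBuilder {
    static String build(String table, String[] columns, int[] values) {
        List<String> parts = new ArrayList<>();
        for (String column : columns) {
            for (int value : values) {
                // Identifiers are assumed to be trusted constants (no escaping here).
                parts.add("SUM(IF(" + column + " = " + value + ", 1, 0)) AS "
                        + column + "_" + value);
            }
        }
        return "SELECT " + String.join(", ", parts) + " FROM " + table;
    }

    public static void main(String[] args) {
        // Prints a single SELECT with eight conditional sums for the sample table.
        System.out.println(build("Flights",
                new String[] {"data1_id", "data2_id"}, new int[] {1, 2, 3, 4}));
    }
}
```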

This is one of those situations where the data transformation is better 
left to application-layer code (using loops and arrays) than to a SQL 
statement run at the server. IMHO, stick with 2).

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine




RE: YAQQ (Yet Another Query Question)

2005-12-14 Thread Peter Lauri
Have you tried GROUP BY?

Try something like (not sure of exact syntax, check the manual for that):

SELECT data1_id, COUNT(*) AS cnt FROM Flights GROUP BY data1_id;
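
One thing to watch with GROUP BY: it returns a row only for values that actually occur, so in the sample data index 3 produces no row for data1_id, and the client has to fill in the zero. A small Java sketch of that step follows. The class GroupBySummary and the method toSummaryRow are hypothetical, and the {id, cnt} pairs stand in for rows read from a ResultSet.

```java
import java.util.Arrays;

// Sketch only: turns (id, cnt) rows from a GROUP BY query into a dense summary row.
// GroupBySummary and toSummaryRow() are hypothetical names.
class GroupBySummary {
    // groupedRows holds {id, cnt} pairs; ids with no matching rows are simply absent.
    static int[] toSummaryRow(int[][] groupedRows, int maxIndex) {
        int[] summary = new int[maxIndex]; // Java initializes every slot to 0
        for (int[] row : groupedRows) {
            int id = row[0];
            if (id >= 1 && id <= maxIndex) {
                summary[id - 1] = row[1];
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        // What the query would return for data1_id on the sample table.
        int[][] data1Groups = {{1, 8}, {2, 2}, {4, 1}};
        System.out.println(Arrays.toString(toSummaryRow(data1Groups, 4))); // [8, 2, 0, 1]
    }
}
```

This approach needs one GROUP BY query per data column, so two queries for the sample table rather than eight.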

/Peter





Re: YAQQ (Yet Another Query Question)

2005-12-14 Thread Mark Phillips
Nigel,

Thanks!

Mark
On Wednesday 14 December 2005 09:42 am, nigel wood wrote:

-- 
Mark Phillips
[EMAIL PROTECTED]
602 524-0376




Re: YAQQ (Yet Another Query Question)

2005-12-14 Thread Mark Phillips
Nigel,

Again, thanks - that is the rule of thumb I was looking for!

Mark

On Wednesday 14 December 2005 09:57 am, nigel wood wrote:

-- 
Mark Phillips
[EMAIL PROTECTED]
602 524-0376




Re: YAQQ (Yet Another Query Question)

2005-12-14 Thread Mark Phillips
Thanks to everyone for their help. Using Nigel's suggestion, I was able to 
gather all the summary data in one query. Those nested if()'s are really 
useful!

FWIW, you can see the summary stats at http://rockets.phillipsoasis.com
Just click on Hopi Rockets and scroll to the bottom of the page. My small 
contribution to science education!

This list is great!

Mark

On Wednesday 14 December 2005 09:42 am, nigel wood wrote:

-- 
Mark Phillips
[EMAIL PROTECTED]
602 524-0376
