memory issue: paging, disk activity, table size - was: optimizing a select statement

2004-03-01 Thread Karthik Viswanathan
Thanks for the information. Before I try to further look into the query, I 
would like to know if there is some memory issue. Its strange since the 
speed for executing same query differs. Its a Mac G5 with just 1GB ram. I 
could see lot of pageouts in the top command. The activity monitor shows 
that there is little free memory (12 -14M) and ~650M of inactive memory and 
~250M of active memory. Read on web that the less free memory is the 
inactive memory will be used. When I run a query (like the one we had 
discussed) there is only a slight difference in this memory status, where 
as the disk activity shows active 'data in', 'read in' . Does this means 
that its out of physical memory and uses swap file? I am not sure how mysql 
uses memory. Couple of tables I read in the query is more than 4GB but the 
query uses only two column of those tables and it will not be more than 1 G 
for sure. I am not sure if I understood correct or not and would like to 
hear your suggestion. Is there any relation between the table size (no of 
rows and file size)  and the system memory needed to get better 
performance? i read on web that its better to have ram more than the 
largest table size. is this true even if the query uses only few columns of 
big tables?

Thanks for your help
Karthik.
At 07:44 PM 2/29/2004, you wrote:
So if this is your query and based on the explain this is what I would
recommend.
SELECT Distinct (a.id)
From table1 a
INNER JOIN table1 b
USING ( p_id )
INNER JOIN table2 c
USING ( p_id )
INNER JOIN table3 d
USING ( out_id )
INNER JOIN table4 e ON ( d.name_id = e.name_id )
INNER JOIN table4 f  ON ( e.start_id
BETWEEN f.left_id AND f.end_id )
WHERE (f.name_id =45 OR f.name_id =56)
AND b.id =275 AND a.id != b.id
For some reason you are getting the using temporary on table b, I assume
based on the explain that you have an index called (id) that only has the id
in it.  Here's the first problem.  You are limiting b, by id.  And joining
on p_id.  So mysql is trying to use the index with the combination of both
of them which is combine I assume.  So removing distince really won't help
with this one much, since your indexes will always have a little problems.
INNER JOIN table4 f  ON ( e.start_id
BETWEEN f.left_id AND f.end_id )
This is probably the second biggest problem, you will always get a range.
And ranges will always be slower.  Don't really have a solution without
actually touching the data.
If you really have more than 50 million records this is really bad.
a.id != b.id
I would recommend trying to rewrite your query and just focus on table a and
f.  If you can get rid of them returning the extra 2 and 3 rows, I think
that would solve your problem.
Donny


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: optimizing a select statement over a database with 50 million recs

2004-02-29 Thread karthik viswanathan
here is the updated select statement i came with but still needs improvement

SELECT Distinct (a.id)
From table1 a 
INNER JOIN table1 b
USING ( p_id ) 
INNER JOIN table2 c
USING ( p_id ) 
INNER JOIN table3 d
USING ( out_id ) 
INNER JOIN table4 e ON ( d.name_id = e.name_id ) 
INNER JOIN table4 f  ON ( e.start_id
BETWEEN f.left_id AND f.end_id ) 
WHERE (f.name_id =45 OR f.name_id =56)
AND b.id =275 AND a.id != b.id 

This reduced the time from 0.5 secs to 0.3 secs but still i am looking for some major 
improvemnts. I am using this query in PHP and its been repeated several times to 
display a list. Any advice on this will be really useful. Is there anything I could 
do with the mysql and php configuration to increase the performance? The server has 
just 1GB ram, will adding more memory help? 

Thanks
Karthik.

 Need help on optimizing the select statement:
 
 Table structure
 
 table1
 --
 id 
 p_id 
 
 table2
 --
 p_id 
 out_id 
 
 table3
 --
 out_id
 name_id
 
 table4
 ---
 name_id (unique)
 prev_id
 start_id (unique)
 end_id (unique)
 
 Only table4 has unique fields all other fields are not unique.
 
 The following select statement does what is required but the tables are really
 huge 
 ( 50 million records) so need to be optimized
 
 SELECT DISTINCT (a.id)
 FROM table1 a, table1 b, table2 c, table3 d, table4 e, table4 f
 WHERE a.id =275 AND a.p_id = b.p_id AND a.id != b.id 
   AND a.p_id = c.p_id AND c.out_id = d.out_id AND d.name_id = e.name_id 
   AND e.start_id = f.start_id AND e.end_id = f.end_id AND e.end_id !=0 
   AND (f.name_id =45 OR f.name_id =56) 
 GROUP BY b.id
 
 The explain for the above statement is
 
 a | ref | id,p_id,combine | combine | 4 | const | 5281 | Using where; Using
 index; 
 Using temporary; Using f...  
 c | eq_ref | PRIMARY,p_id | PRIMARY | 4 | a.p_id | 1 |  |  | 
 d | ref | name_id,out_id | out_id | 4 | b.out_id | 1 |  |  | 
 b | ref | p_id | p_id | 4 | a.p_id | 3 | Using where | 
 e | range | PRIMARY,start_id,end_id,combine,name_id | PRIMARY | 4 | NULL | 2 |
 Using 
 where | 
 f | eq_ref | PRIMARY,start_id,end_id,combine,name_id | PRIMARY | 4 | c.name_id |

 1 | 
 Using where | 
 
 I am sure there should be some better way to do this using Inner join or
 something 
 similar but I am not sure how. It will be helpful if you could suggest me some 
 improvements for this query. If you need any further explanation please let me
 know.
 
 Thanks for your help
 Karu.
 
 
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
 






-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: optimizing a select statement over a database with 50 million recs

2004-02-29 Thread karthik viswanathan
Thanks for looking at my problem. here is the explain for this version of select

b | ref | id,p_id,combine | combine | 4 | const | 5281 | Using where; Using index;  
Using temporary | 
c | eq_ref | PRIMARY,p_id | PRIMARY | 4 | b.p_id | 1 |  |  | 
d | ref | name_id,out_id | out_id | 4 | c.out_id | 1 |  |  | 
a | ref | p_id | p_id | 4 | b.p_id | 3 | Using where | 
e | eq_ref | PRIMARY,start_id,combine,name_id | PRIMARY | 4 | d.name_id | 1 | 
Distinct | 
f | range | PRIMARY,name_id | PRIMARY | 4 | NULL | 2 | Using | where; | Distinct | 

Karthik


 Can you send us a new explain on this version?
 
 Donny
 
  -Original Message-
  From: karthik viswanathan [mailto:[EMAIL PROTECTED]
  Sent: Sunday, February 29, 2004 4:12 PM
  To: [EMAIL PROTECTED]
  Subject: Re: optimizing a select statement over a database with 50
  million recs
  
  here is the updated select statement i came with but still needs
  improvement
  
  SELECT Distinct (a.id)
  From table1 a
  INNER JOIN table1 b
  USING ( p_id )
  INNER JOIN table2 c
  USING ( p_id )
  INNER JOIN table3 d
  USING ( out_id )
  INNER JOIN table4 e ON ( d.name_id = e.name_id )
  INNER JOIN table4 f  ON ( e.start_id
  BETWEEN f.left_id AND f.end_id )
  WHERE (f.name_id =45 OR f.name_id =56)
  AND b.id =275 AND a.id != b.id
  
  This reduced the time from 0.5 secs to 0.3 secs but still i am looking for
  some major
  improvemnts. I am using this query in PHP and its been repeated several
  times to
  display a list. Any advice on this will be really useful. Is there
  anything I could
  do with the mysql and php configuration to increase the performance? The
  server has
  just 1GB ram, will adding more memory help?
  
  Thanks
  Karthik.
  
   Need help on optimizing the select statement:
  
   Table structure
  
   table1
   --
   id
   p_id
  
   table2
   --
   p_id
   out_id
  
   table3
   --
   out_id
   name_id
  
   table4
   ---
   name_id (unique)
   prev_id
   start_id (unique)
   end_id (unique)
  
   Only table4 has unique fields all other fields are not unique.
  
   The following select statement does what is required but the tables are
  really
   huge
   ( 50 million records) so need to be optimized
  
   SELECT DISTINCT (a.id)
   FROM table1 a, table1 b, table2 c, table3 d, table4 e, table4 f
   WHERE a.id =275 AND a.p_id = b.p_id AND a.id != b.id
 AND a.p_id = c.p_id AND c.out_id = d.out_id AND d.name_id = e.name_id
 AND e.start_id = f.start_id AND e.end_id = f.end_id AND e.end_id !=0
 AND (f.name_id =45 OR f.name_id =56)
   GROUP BY b.id
  
   The explain for the above statement is
  
   a | ref | id,p_id,combine | combine | 4 | const | 5281 | Using where;
  Using
   index;
   Using temporary; Using f...
   c | eq_ref | PRIMARY,p_id | PRIMARY | 4 | a.p_id | 1 |  |  |
   d | ref | name_id,out_id | out_id | 4 | b.out_id | 1 |  |  |
   b | ref | p_id | p_id | 4 | a.p_id | 3 | Using where |
   e | range | PRIMARY,start_id,end_id,combine,name_id | PRIMARY | 4 | NULL
  | 2 |
   Using
   where |
   f | eq_ref | PRIMARY,start_id,end_id,combine,name_id | PRIMARY | 4 |
  c.name_id |
  
   1 |
   Using where |
  
   I am sure there should be some better way to do this using Inner join or
   something
   similar but I am not sure how. It will be helpful if you could suggest
  me some
   improvements for this query. If you need any further explanation please
  let me
   know.
  
   Thanks for your help
   Karu.
  
  
  
   --
   MySQL General Mailing List
   For list archives: http://lists.mysql.com/mysql
   To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
  
  
  
  
  
  
  
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe:
  http://lists.mysql.com/[EMAIL PROTECTED]
  
 
 
 
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
 






-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



RE: optimizing a select statement over a database with 50 million recs

2004-02-29 Thread karthik viswanathan
Here is the command line explain, earlier i had used phpAdmin

+-++-+--+-+---
--+--+---+
| table   | type   | possible_keys   | key  | key_len | 
ref | rows | Extra |
+-++-+--+-+---
--+--+---+
| b   | ref| id,p_id,combine | combine  |   4 | 
const   | 5281 | Using where; Using index; Using temporary |
| c   | eq_ref | PRIMARY,p_id| PRIMARY  |   4 | 
b.p_id  |1 |   |
| d   | ref| name_id,out_id  | out_id   |   4 | 
c.out_id|1 |   |
| a   | ref| p_id| p_id |   4 | 
b.p_id  |3 | Using where   |
| e   | eq_ref | PRIMARY,start_id,combine,taxId  | PRIMARY  |   4 | 
d.name_id   |1 | Distinct  |
| f   | range  | PRIMARY,name_id | PRIMARY  |   4 | 
NULL|2 | Using where; Distinct |
+-++-+--+-+---
--+--+---+


 But looking at your explain with is hard to understand as I mentioned above,
 it look like your rows are 5281 * 1 * 1 * 4 * 2 or 42248 rows it has to go
 through.  

I think u r right

 Also see if you can change WHERE (f.name_id =45 OR f.name_id =56) to use IN.
 It has proven to be much faster for me.  But that won't speed it up .3
 seconds.

i tried that i dint see any difference.

If I take out the distinct explain doesnt show the Using temporary but I dint see 
any difference in the time also. Please let me know if you need any further 
information

Thanks
Karthik




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



optimizing a select statement over a database with 50 million recs

2004-02-28 Thread karthik viswanathan
Need help on optimizing the select statement:

Table structure

table1
--
id 
p_id 

table2
--
p_id 
out_id 

table3
--
out_id
name_id

table4
---
name_id (unique)
prev_id
start_id (unique)
end_id (unique)

Only table4 has unique fields all other fields are not unique.

The following select statement does what is required but the tables are really
huge 
( 50 million records) so need to be optimized

SELECT DISTINCT (a.id)
FROM table1 a, table1 b, table2 c, table3 d, table4 e, table4 f
WHERE a.id =275 AND a.p_id = b.p_id AND a.id != b.id 
  AND a.p_id = c.p_id AND c.out_id = d.out_id AND d.name_id = e.name_id 
  AND e.start_id = f.start_id AND e.end_id = f.end_id AND e.end_id !=0 
  AND (f.name_id =45 OR f.name_id =56) 
GROUP BY b.id

The explain for the above statement is

a | ref | id,p_id,combine | combine | 4 | const | 5281 | Using where; Using
index; 
Using temporary; Using f...  
c | eq_ref | PRIMARY,p_id | PRIMARY | 4 | a.p_id | 1 |  |  | 
d | ref | name_id,out_id | out_id | 4 | b.out_id | 1 |  |  | 
b | ref | p_id | p_id | 4 | a.p_id | 3 | Using where | 
e | range | PRIMARY,start_id,end_id,combine,name_id | PRIMARY | 4 | NULL | 2 |
Using 
where | 
f | eq_ref | PRIMARY,start_id,end_id,combine,name_id | PRIMARY | 4 | c.name_id |
1 | 
Using where | 

I am sure there should be some better way to do this using Inner join or
something 
similar but I am not sure how. It will be helpful if you could suggest me some 
improvements for this query. If you need any further explanation please let me
know.

Thanks for your help
Karu.



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]