index length mate, first thing to look at, second would be my.cnf optimizations.

Zaid

On 6/18/07, Al-Faisal El-Dajani <[EMAIL PROTECTED]> wrote:
> This is a lengthy rant about a MySQL situation i was in lately. It is not a
> question, just a recount of some events that i went through and some of my
> conclusions. Feel free to ignore at will :).
>
> While working on a project the past two weeks, i had to deal with a rather
> large database to import and query. What follows is my experience in this
> project and some interesting trivia about how MySQL server works. My
> computer machine is a bit ancient, i have a P4 2.8HT CPU, 512 MB DDR RAM,
> and a SATA harddisk. My operating system is Ubuntu 7.04, and the only other
> process that was running was firefox (to pass the time reading bash :-p),
> Database server is MySQL 5.0.22, default installation (except i changed
> default charset to utf8). All queries were run using the MySQL terminal
> client.
>
> Here's the database structure:
> Table A:
>     id   int(11) primary key not null
>     num int(15) unique
> contains ~ 1 million rows
>
> Table B:
>     id int(11) primary key not null
>     A_id int(11) references Table A, id not null
>     attribute int(5)  (contains around 130 distinct values)
> contains ~ 13.5 million rows
>
> Now the idea is that based on user input i need to get Table A.num, filtered
> by whatever the user selects in Table B.attribute. My first reaction was to
> do a join between the tables:
>
> SELECT DISTINCT a.num FROM a, b WHERE a.id = b.a_id AND (b.attribute = x OR
> b.attribute = y OR ...) ORDER BY rand() LIMIT z;
>
> After two and a half hours of hogging the CPU at 100%, i killed the process
> and decided it wasn't such a good idea. The problem here is that a cartesean
> product was calculated first before starting to filter, which would generate
> a temporary table with 13,381,664 * 991,483 rows, followed by recursing over
> the entire temporary result set to check for my filter. After which, the
> result would be ordered according to a.num and duplicate values would be
> emitted according to the distinct rule, followed by re-ordering the result
> set randomly, and finally selecting LIMIT number of rows to display.
>
> So i decided to find another way, asking some friends i learned that
> rewriting it as a nested query would give me better results so i changed my
> query to the following:
>
> SELECT a.num FROM a WHERE a.id IN (SELECT DISTINCT a_id FROM b WHERE
> b.attribute = x or ... ORDER BY rand() LIMIT z);
>
> The idea here was to work on table B by itself, filtering the data, getting
> a subset of rows that is z rows long, randomly ordered, and is filtered
> according to my criteria, and use it to get the data i required.
> First I was surprised here to find out that my database server version does
> not support order by or rand clauses in the nested query, which would force
> me to move them to the outer query. This would make the inner query generate
> a much larger result set, which would be used to to filter the outer table
> then randomizing them and getting the number of rows i wanted. Needless to
> say this is slower than what i wanted but since it simply wasn't possible to
> do it the way i wanted to i rewrote it like so:
>
> SELECT a.num FROM a WHERE a.id IN (SELECT DISTINCT a_id FROM b WHERE
> b.attribute = x or ...) ORDER BY rand() LIMIT z;
>
> After about an hour and twenty minutes, the query finished and a result set
> was presented. I did a little victory dance, but i knew this huge delay
> won't be tolerated, so i had to think about how to hasten it even further.
>
> Now, i've heard a lot about indices, but never actually had to use them.
> Other than my primary key, or unique constraints, i never created an index.
> So, i decided to try this index thing, thinking that it wouldn't help
> much... but what the heck. I created an index on the attribute field in
> table B. The process took about 3 hours.
> Re-running both of my queries. I was extremely surprised to see the first
> query (join) was actually FASTER than my second query (nested). The only
> thing i managed to conclude was that MySQL doesn't support indices in inner
> statements (not sure, if you have a better explanation, please do share).
> The nested query took about the same time to finish (a few minutes faster,
> but since i only ran it once it might have been a fluke), while the join
> statement was reduced to a whooping 20 minute execution time.
>
> Getting high on the results, and gaining some new found respect to indices,
> i decided to create a second index, this time on the fields (attribute and
> a_id) in table B. Creating this index took a LOT of time, like 6-7 hours,
> but the results were worth it. Now my join query finishes in about 40-50
> seconds. creating the index on a_id actually makes a difference because in
> my query the first condition is " a.id = b.a_id", and it gets evaluated
> first, so when we get to the filtering part, we don't have a cartesean
> product, only the rows that are related.
>
> I'm at this stage now, still looking for ways to make even faster, however i
> gained a lot of respect to indices. I still have no idea how google manages
> to finish our search queries in a fraction of a second, i know they have WAY
> better hardware, but they also have WAY more information in their servers.
> One thing though, is that they have the ability to tune their DB server, a
> thing i can't don't have access to. Also, when you have a read only database
> (like my case), indices are a VERY good investment. Also, understanding how
> your DB server works, and how the query gets evaluated could give you a huge
> advantage.
>
> Special thanks goes to my project manager, Ghassan Noubani, for rewriting my
> join query to its present form, and suggesting the second index.
>
> --
> Al-Faisal El-Dajani
> Phone: +962-7-77 799 781
> P.O Box: 140056
> 11814 Amman, Jordan
>  >
>


-- 
---------------------------
Netiquette -> http://www.dtcc.edu/cs/rfc1855.html
Netiquette Nazi ->
http://redwing.hutman.net/%7Emreed/warriorshtm/netiquettenazi.htm
---------------------------

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Jolug" group.
 To post to this group, send email to [email protected]
 To unsubscribe from this group, send email to [EMAIL PROTECTED]
 For more options, visit this group at 
http://groups.google.com/group/Jolug?hl=en-GB
-~----------~----~----~----~------~----~------~--~---

رد على