Re: optimize for SELECTs on multiple large tables
On Wed, 2001-12-05 at 16:33, Arjen G. Lentz wrote: - Original Message - From: Florin Andrei [EMAIL PROTECTED] SELECT event.cid, iphdr.ip_src, iphdr.ip_dst, tcphdr.tcp_dport FROM event, iphdr, tcphdr WHERE event.cid = iphdr.cid AND event.cid = tcphdr.cid AND tcphdr.tcp_flags = '2'; So, MySQL actually has no index to work with. So, you will want to add an index on cid alone, for the event and iphdr tables (tcphdr table doesn't matter since your select will use the tcp_flags index to limit down the first part of the join, as described above). Making sense? Yes. Thank you. Now, suppose i don't know how my SELECTs will look like. Suppose i collect general purpose data, and i will devise my data mining schemes later. Therefore, i guess it may make sense to add indexes for almost all columns that might be used to search for data. But then, an index will slow down the INSERT speed, right? What's the performance penalty for INSERTs when i have indexes for almost all columns? (the tables have 5...10 columns, all of them integers of one sort or the other) It will slow down INSERTs 2 times? 10 times? More? -- Florin Andrei The security of any corporate network is inversely proportional to the number of systems administrators on the network. - - Petreley's law of sysadmins - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: optimize for SELECTs on multiple large tables
On Thu, Dec 06, 2001 at 09:39:18AM +1000, Arjen G. Lentz wrote: Well, you have to realise that the server can only use a one index per table (for obvious reasons). That's far from being obvious to a lot of folks... You can actually split a table into multiple identical ones, by using MERGE tables (see manual). But that just makes managing them easier (and inserts into one table faster), it wouldn't speed up your SELECT. In fact, doing so introduces a bit more overhead. So i wonder if there's any way to optimise MySQL so that i will be able to actually do a SELECT on those tables. Maybe tweak parameters like join_buffer_size? table_cache? Anyone has some experience with these?... What's the best way to optimize MySQL for running SELECTs on multiple large tables? Yes, server settings are important, and the 'right' settings depend on your system (amount of RAM, etc) as well as on your database and the type of queries that you run most often. And pick up a copy of the December Linux Magazine. There's an article on MySQL Server Tuning in there. It'll be on the web in a couple months, too. There is no magic wand, I'm afraid. But... you may find it advantagious to do a MySQL training course (www.mysql.com/training/). Optimisation is a very important part of the course material, in this message I've just touched briefly on a few little items. Good plan, too. :-) Jeremy -- Jeremy D. Zawodny, [EMAIL PROTECTED] Technical Yahoo - Yahoo Finance Desk: (408) 349-7878 Fax: (408) 349-5454 Cell: (408) 685-5936 MySQL 3.23.41-max: up 7 days, processed 157,486,781 queries (257/sec. avg) - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
optimize for SELECTs on multiple large tables
I have this SQL query: SELECT tbl1.col1, tbl2.col2 FROM tbl1, tbl2 WHERE \ tbl1.col3 = tbl2.col4 AND tbl1.col5 = '123'; (well, maybe there are more than two columns selected, and maybe a little more than two tables, but you got the idea...) The tables have 5...10 columns, every column is an integer type, and they have A LOT of rows (the total amount of space occupied by those tables on disk is 2 GB). The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let it run over night, in the morning i still didn't got any results, so i killed the query). Splitting the big tables into small ones will not help, because i still want to be able to run the SELECT across _all_ data (and it's not convenient for my application). I cannot modify the structure of the tables (the application requires a certain table structure). So i wonder if there's any way to optimise MySQL so that i will be able to actually do a SELECT on those tables. Maybe tweak parameters like join_buffer_size? table_cache? Anyone has some experience with these?... What's the best way to optimize MySQL for running SELECTs on multiple large tables? This is the only thing that keeps me from deploying MySQL in production now. (Not to start a flamewar, just to acknowledge a fact: i ran the same SQL statement, with the same data set, on some other SQL servers, and i got the result in 5 minutes, as opposed to MySQL still giving back nothing after several hours. But i want to use MySQL because i need something that can do INSERTs very fast, and MySQL is very, very good at this. The only problem is, it is slow when it comes to SELECTs on multiple large tables, and i'm not sure how to optimize it for that.) -- Florin Andrei Engineering does not require science. - Linus Torvalds - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: optimize for SELECTs on multiple large tables
At 14:45 -0800 2001/12/05, Florin Andrei wrote: The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let it run over night, in the morning i still didn't got any results, so i killed the query). Hi Florin, It would help if you could also provide: the hardware and OS the actual query (formatted to be readable, please : ) There's not really enough info in your posting to hazard a guess on a solution. Regards, /Rob ~ Robert Alexander, Alpha Geek, Workmate.ca WWW Database Applications and Web Hosting http://www.workmate.ca 416-823-6599 mailto:[EMAIL PROTECTED] Life's unfair - but root password helps! - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: optimize for SELECTs on multiple large tables
On Wed, 2001-12-05 at 15:01, Robert Alexander wrote: At 14:45 -0800 2001/12/05, Florin Andrei wrote: The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let it run over night, in the morning i still didn't got any results, so i killed the query). the hardware and OS SGI 1200 (dual PIII/800, 768MB RAM, SCSI) Linux Red Hat 7.2 the actual query (formatted to be readable, please : ) SELECT event.cid, iphdr.ip_src, iphdr.ip_dst, tcphdr.tcp_dport FROM event, iphdr, tcphdr WHERE event.cid = iphdr.cid AND event.cid = tcphdr.cid AND tcphdr.tcp_flags = '2'; When i ran the query, it just sat there, doing nothing: no disk activity, CPU usage was 0%... As i said, after several hours i had to kill it. This is actually a database for Snort; i'm sniffing a network and recording relevant info about IP packets into the database. The event table contains only the event id (cid) and the timestamp for the IP packet. The iphdr table contains the source- and destination- IP for the packet (identified by cid). The tcphdr table contains TCP-specific info for the packet (identified by cid). I need to extract the source- and destination- IP address for certain packets (TCP with SYN flag set). And maybe ORDER BY iphdr.ip_src, but that's not mandatory. The actual table structure is this: CREATE TABLE event ( sid INT UNSIGNED NOT NULL, cid INT UNSIGNED NOT NULL, signature INT UNSIGNED NOT NULL, timestampDATETIME NOT NULL, PRIMARY KEY (sid,cid), INDEX sig (signature), INDEX time (timestamp)); CREATE TABLE iphdr ( sid INT UNSIGNED NOT NULL, cid INT UNSIGNED NOT NULL, ip_src INT UNSIGNED NOT NULL, ip_dst INT UNSIGNED NOT NULL, ip_ver TINYINT UNSIGNED, ip_hlen TINYINT UNSIGNED, ip_tos TINYINT UNSIGNED, ip_len SMALLINT UNSIGNED, ip_id SMALLINT UNSIGNED, ip_flagsTINYINT UNSIGNED, ip_off SMALLINT UNSIGNED, ip_ttl TINYINT UNSIGNED, ip_protoTINYINT UNSIGNED NOT NULL, ip_csum SMALLINT UNSIGNED, PRIMARY KEY (sid,cid), INDEX ip_src (ip_src), INDEX ip_dst (ip_dst)); CREATE TABLE tcphdr( sid INT UNSIGNED NOT NULL, cid INT UNSIGNED NOT NULL, tcp_sport SMALLINT UNSIGNED NOT NULL, tcp_dport SMALLINT UNSIGNED NOT NULL, tcp_seq INT UNSIGNED, tcp_ack INT UNSIGNED, tcp_off TINYINT UNSIGNED, tcp_res TINYINT UNSIGNED, tcp_flags TINYINT UNSIGNED NOT NULL, tcp_win SMALLINT UNSIGNED, tcp_csumSMALLINT UNSIGNED, tcp_urp SMALLINT UNSIGNED, PRIMARY KEY (sid,cid), INDEX tcp_sport (tcp_sport), INDEX tcp_dport (tcp_dport), INDEX tcp_flags (tcp_flags)); (it's the typical Snort database, and i don't want to change it, because many applications related to Snort expect it to be this way) -- Florin Andrei Engineering does not require science. - Linus Torvalds - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: optimize for SELECTs on multiple large tables
Florin Andrei writes: SELECT tbl1.col1, tbl2.col2 FROM tbl1, tbl2 WHERE \ tbl1.col3 = tbl2.col4 AND tbl1.col5 = '123'; The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let it run over night, in the morning i still didn't got any results, so i killed the query). What're the table structures (most importantly: what indices exist)? What does the data look like? What did EXPLAIN return? Anything else we should know? (that's a catch-all question from a sleepy and intoxicated person, in case you're wondering) //C - scratching his beard -- Carl Troein - CĂrdan / Istari-PixelMagic - UIN 16353280 [EMAIL PROTECTED] | http://pixelmagic.dyndns.org/~cirdan/ Amiga user since '89, and damned proud of it too. - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: optimize for SELECTs on multiple large tables
Hi Florian, - Original Message - From: Florin Andrei [EMAIL PROTECTED] I have this SQL query: SELECT tbl1.col1, tbl2.col2 FROM tbl1, tbl2 WHERE \ tbl1.col3 = tbl2.col4 AND tbl1.col5 = '123'; (well, maybe there are more than two columns selected, and maybe a little more than two tables, but you got the idea...) The tables have 5...10 columns, every column is an integer type, and they have A LOT of rows (the total amount of space occupied by those tables on disk is 2 GB). The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let it run over night, in the morning i still didn't got any results, so i killed the query). Well, you have to realise that the server can only use a one index per table (for obvious reasons). So, the first question is, which fields are indexed, and which index does the server use for this query. To find out the latter, as well as find out in which order the server joins the tables, use EXPLAIN with your query. The way you'll want to set up a join (applies to any SQL server) is to limit the number of rows as fast as possible. For instance, if a table has a male/female column, putting an index on that will only cut the number of rows to be searched by half, so there'll still be heaps of rows that need to be checked. In a select join, that might not be very efficient. So, first think about these things, to set up your indexes and the query properly. It IS possible to tweak the server for a particular query, if you find that the optimiser isn't picking the best index. If you use STRAIGHT_JOIN instead of the commas, you can specify the join order. And you can also direct the server to use (or ignore) specific indexes (see manual for details). Those are just a few things. Splitting the big tables into small ones will not help, because i still want to be able to run the SELECT across _all_ data (and it's not convenient for my application). I cannot modify the structure of the tables (the application requires a certain table structure). You can actually split a table into multiple identical ones, by using MERGE tables (see manual). But that just makes managing them easier (and inserts into one table faster), it wouldn't speed up your SELECT. So i wonder if there's any way to optimise MySQL so that i will be able to actually do a SELECT on those tables. Maybe tweak parameters like join_buffer_size? table_cache? Anyone has some experience with these?... What's the best way to optimize MySQL for running SELECTs on multiple large tables? Yes, server settings are important, and the 'right' settings depend on your system (amount of RAM, etc) as well as on your database and the type of queries that you run most often. There is no magic wand, I'm afraid. But... you may find it advantagious to do a MySQL training course (www.mysql.com/training/). Optimisation is a very important part of the course material, in this message I've just touched briefly on a few little items. This is the only thing that keeps me from deploying MySQL in production now. (Not to start a flamewar, just to acknowledge a fact: i ran the same SQL statement, with the same data set, on some other SQL servers, and i got the result in 5 minutes, as opposed to MySQL still giving back nothing after several hours. But i want to use MySQL because i need something that can do INSERTs very fast, and MySQL is very, very good at this. The only problem is, it is slow when it comes to SELECTs on multiple large tables, and i'm not sure how to optimize it for that.) I'm confident that the MySQL server can do it just as fast, if not faster. Regards, Arjen. -- MySQL Training Worldwide, http://www.mysql.com/training/ __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Arjen G. Lentz [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Technical Writer, Trainer /_/ /_/\_, /___/\___\_\___/ Brisbane, QLD Australia ___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
Re: optimize for SELECTs on multiple large tables
Hi again Florin, - Original Message - From: Florin Andrei [EMAIL PROTECTED] SELECT event.cid, iphdr.ip_src, iphdr.ip_dst, tcphdr.tcp_dport FROM event, iphdr, tcphdr WHERE event.cid = iphdr.cid AND event.cid = tcphdr.cid AND tcphdr.tcp_flags = '2'; Your only search condition is tcphdr.tcp_flags=2. So you will want to put the tcphdr first in the join and have this condition evaluated first (re-order your query and do EXPLAIN SELECT ), otherwise the server will still have to do a full table scan on the first two tables. The actual table structure is this: CREATE TABLE event ( sid INT UNSIGNED NOT NULL, cid INT UNSIGNED NOT NULL, [...] PRIMARY KEY (sid,cid), Each of the three tables has this. Now look at your SELECT and the join condition, purely on the field cid. So, MySQL actually has no index to work with. The (sid,cid) can't be used since cid is not the first field; you can use the first part of a key on its own, not the last part (since the last part will have no ordering independently the first part). So, you will want to add an index on cid alone, for the event and iphdr tables (tcphdr table doesn't matter since your select will use the tcp_flags index to limit down the first part of the join, as described above). Making sense? Please post the output from EXPLAIN if the above info does not have the desired effect. Then we can look at exactly what the server is thinking. And of course we also want to hear from you when it's ligning fast! (it's the typical Snort database, and i don't want to change it, because many applications related to Snort expect it to be this way) Adding an index shouldn't affect other apps, so that's easy. Regards, Arjen. -- MySQL Training Worldwide, http://www.mysql.com/training/ __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Mr. Arjen G. Lentz [EMAIL PROTECTED] / /|_/ / // /\ \/ /_/ / /__ MySQL AB, Technical Writer, Trainer /_/ /_/\_, /___/\___\_\___/ Brisbane, QLD Australia ___/ www.mysql.com - Before posting, please check: http://www.mysql.com/manual.php (the manual) http://lists.mysql.com/ (the list archive) To request this thread, e-mail [EMAIL PROTECTED] To unsubscribe, e-mail [EMAIL PROTECTED] Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php