Re: optimize for SELECTs on multiple large tables

2002-01-25 Thread Florin Andrei

On Wed, 2001-12-05 at 16:33, Arjen G. Lentz wrote:
 
 - Original Message -
 From: Florin Andrei [EMAIL PROTECTED]
 
  SELECT event.cid, iphdr.ip_src, iphdr.ip_dst, tcphdr.tcp_dport FROM
  event, iphdr, tcphdr WHERE event.cid = iphdr.cid AND event.cid =
  tcphdr.cid AND tcphdr.tcp_flags = '2';
 
 So, MySQL actually has no index to work with.
 So, you will want to add an index on cid alone, for the event and iphdr tables
 (tcphdr table doesn't matter since your select will use the tcp_flags index to
 limit down the first part of the join, as described above).
 Making sense?

Yes. Thank you.

Now, suppose i don't know how my SELECTs will look like. Suppose i
collect general purpose data, and i will devise my data mining schemes
later. Therefore, i guess it may make sense to add indexes for almost
all columns that might be used to search for data.

But then, an index will slow down the INSERT speed, right?

What's the performance penalty for INSERTs when i have indexes for
almost all columns? (the tables have 5...10 columns, all of them
integers of one sort or the other) It will slow down INSERTs 2 times? 10
times? More?

-- 
Florin Andrei

The security of any corporate network is inversely proportional
to the number of systems administrators on the network. -
- Petreley's law of sysadmins


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: optimize for SELECTs on multiple large tables

2001-12-06 Thread Jeremy Zawodny

On Thu, Dec 06, 2001 at 09:39:18AM +1000, Arjen G. Lentz wrote:

 Well, you have to realise that the server can only use a one index
 per table (for obvious reasons).

That's far from being obvious to a lot of folks...

 You can actually split a table into multiple identical ones, by
 using MERGE tables (see manual).  But that just makes managing them
 easier (and inserts into one table faster), it wouldn't speed up
 your SELECT.

In fact, doing so introduces a bit more overhead.

  So i wonder if there's any way to optimise MySQL so that i will be
  able to actually do a SELECT on those tables.  Maybe tweak
  parameters like join_buffer_size? table_cache? Anyone has some
  experience with these?... What's the best way to optimize MySQL
  for running SELECTs on multiple large tables?
 
 Yes, server settings are important, and the 'right' settings depend
 on your system (amount of RAM, etc) as well as on your database and
 the type of queries that you run most often.

And pick up a copy of the December Linux Magazine.  There's an article
on MySQL Server Tuning in there.  It'll be on the web in a couple
months, too.

 There is no magic wand, I'm afraid. But... you may find it
 advantagious to do a MySQL training course
 (www.mysql.com/training/). Optimisation is a very important part of
 the course material, in this message I've just touched briefly on a
 few little items.

Good plan, too. :-)

Jeremy
-- 
Jeremy D. Zawodny, [EMAIL PROTECTED]
Technical Yahoo - Yahoo Finance
Desk: (408) 349-7878   Fax: (408) 349-5454   Cell: (408) 685-5936

MySQL 3.23.41-max: up 7 days, processed 157,486,781 queries (257/sec. avg)

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




optimize for SELECTs on multiple large tables

2001-12-05 Thread Florin Andrei

I have this SQL query:

SELECT tbl1.col1, tbl2.col2 FROM tbl1, tbl2 WHERE \
  tbl1.col3 = tbl2.col4 AND tbl1.col5 = '123';

(well, maybe there are more than two columns selected, and maybe a
little more than two tables, but you got the idea...)
The tables have 5...10 columns, every column is an integer type, and
they have A LOT of rows (the total amount of space occupied by those
tables on disk is 2 GB).

The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let
it run over night, in the morning i still didn't got any results, so i
killed the query).

Splitting the big tables into small ones will not help, because i still
want to be able to run the SELECT across _all_ data (and it's not
convenient for my application).
I cannot modify the structure of the tables (the application requires a
certain table structure).

So i wonder if there's any way to optimise MySQL so that i will be able
to actually do a SELECT on those tables.
Maybe tweak parameters like join_buffer_size? table_cache? Anyone has
some experience with these?... What's the best way to optimize MySQL for
running SELECTs on multiple large tables?

This is the only thing that keeps me from deploying MySQL in production
now.

(Not to start a flamewar, just to acknowledge a fact: i ran the same SQL
statement, with the same data set, on some other SQL servers, and i
got the result in 5 minutes, as opposed to MySQL still giving back
nothing after several hours. But i want to use MySQL because i need
something that can do INSERTs very fast, and MySQL is very, very good at
this. The only problem is, it is slow when it comes to SELECTs on
multiple large tables, and i'm not sure how to optimize it for that.)

-- 
Florin Andrei

Engineering does not require science. - Linus Torvalds


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: optimize for SELECTs on multiple large tables

2001-12-05 Thread Robert Alexander

At 14:45 -0800 2001/12/05, Florin Andrei wrote:
The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let
it run over night, in the morning i still didn't got any results, so i
killed the query).

Hi Florin,

It would help if you could also provide:

  the hardware and OS
  the actual query (formatted to be readable, please : )

There's not really enough info in your posting to hazard a guess on a solution.

Regards,
/Rob


~
Robert Alexander, Alpha Geek, Workmate.ca
WWW Database Applications and Web Hosting
http://www.workmate.ca   416-823-6599
mailto:[EMAIL PROTECTED]

Life's unfair - but root password helps!

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: optimize for SELECTs on multiple large tables

2001-12-05 Thread Florin Andrei

On Wed, 2001-12-05 at 15:01, Robert Alexander wrote:
 At 14:45 -0800 2001/12/05, Florin Andrei wrote:
 The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let
 it run over night, in the morning i still didn't got any results, so i
 killed the query).
 
   the hardware and OS

SGI 1200 (dual PIII/800, 768MB RAM, SCSI)
Linux Red Hat 7.2

   the actual query (formatted to be readable, please : )

SELECT event.cid, iphdr.ip_src, iphdr.ip_dst, tcphdr.tcp_dport FROM
event, iphdr, tcphdr WHERE event.cid = iphdr.cid AND event.cid =
tcphdr.cid AND tcphdr.tcp_flags = '2';

When i ran the query, it just sat there, doing nothing: no disk
activity, CPU usage was 0%... As i said, after several hours i had to
kill it.

This is actually a database for Snort; i'm sniffing a network and
recording relevant info about IP packets into the database.

The event table contains only the event id (cid) and the timestamp for
the IP packet.
The iphdr table contains the source- and destination- IP for the packet
(identified by cid).
The tcphdr table contains TCP-specific info for the packet (identified
by cid).

I need to extract the source- and destination- IP address for certain
packets (TCP with SYN flag set). And maybe ORDER BY iphdr.ip_src, but
that's not mandatory.

The actual table structure is this:

CREATE TABLE event  ( sid INT  UNSIGNED NOT NULL,
  cid INT  UNSIGNED NOT NULL,
  signature   INT  UNSIGNED NOT NULL, 
  timestampDATETIME NOT NULL,
  PRIMARY KEY (sid,cid),
  INDEX   sig (signature),
  INDEX   time (timestamp));

CREATE TABLE iphdr  ( sid INT  UNSIGNED NOT NULL,
  cid INT  UNSIGNED NOT NULL,
  ip_src  INT  UNSIGNED NOT NULL,
  ip_dst  INT  UNSIGNED NOT NULL,
  ip_ver  TINYINT  UNSIGNED,
  ip_hlen TINYINT  UNSIGNED,
  ip_tos  TINYINT  UNSIGNED,
  ip_len  SMALLINT UNSIGNED,
  ip_id   SMALLINT UNSIGNED,
  ip_flagsTINYINT  UNSIGNED,
  ip_off  SMALLINT UNSIGNED,
  ip_ttl  TINYINT  UNSIGNED,
  ip_protoTINYINT  UNSIGNED NOT NULL,
  ip_csum SMALLINT UNSIGNED,
  PRIMARY KEY (sid,cid),
  INDEX ip_src (ip_src),
  INDEX ip_dst (ip_dst));

CREATE TABLE tcphdr(  sid INT  UNSIGNED NOT NULL,
  cid INT  UNSIGNED NOT NULL,
  tcp_sport   SMALLINT UNSIGNED NOT NULL,
  tcp_dport   SMALLINT UNSIGNED NOT NULL,
  tcp_seq INT  UNSIGNED,
  tcp_ack INT  UNSIGNED,
  tcp_off TINYINT  UNSIGNED,
  tcp_res TINYINT  UNSIGNED,
  tcp_flags   TINYINT  UNSIGNED NOT NULL,
  tcp_win SMALLINT UNSIGNED,
  tcp_csumSMALLINT UNSIGNED,
  tcp_urp SMALLINT UNSIGNED,
  PRIMARY KEY (sid,cid),
  INDEX   tcp_sport (tcp_sport),
  INDEX   tcp_dport (tcp_dport),
  INDEX   tcp_flags (tcp_flags));

(it's the typical Snort database, and i don't want to change it, because
many applications related to Snort expect it to be this way)

-- 
Florin Andrei

Engineering does not require science. - Linus Torvalds


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: optimize for SELECTs on multiple large tables

2001-12-05 Thread Carl Troein


Florin Andrei writes:

 SELECT tbl1.col1, tbl2.col2 FROM tbl1, tbl2 WHERE \
   tbl1.col3 = tbl2.col4 AND tbl1.col5 = '123';
 
 The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let
 it run over night, in the morning i still didn't got any results, so i
 killed the query).

What're the table structures (most importantly: what indices exist)?
What does the data look like? What did EXPLAIN return? Anything else
we should know? (that's a catch-all question from a sleepy and
intoxicated person, in case you're wondering)

//C - scratching his beard

-- 
 Carl Troein - CĂ­rdan / Istari-PixelMagic - UIN 16353280
 [EMAIL PROTECTED] | http://pixelmagic.dyndns.org/~cirdan/
 Amiga user since '89, and damned proud of it too.


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: optimize for SELECTs on multiple large tables

2001-12-05 Thread Arjen G. Lentz

Hi Florian,

- Original Message -
From: Florin Andrei [EMAIL PROTECTED]


 I have this SQL query:
 SELECT tbl1.col1, tbl2.col2 FROM tbl1, tbl2 WHERE \
   tbl1.col3 = tbl2.col4 AND tbl1.col5 = '123';
 (well, maybe there are more than two columns selected, and maybe a
 little more than two tables, but you got the idea...)
 The tables have 5...10 columns, every column is an integer type, and
 they have A LOT of rows (the total amount of space occupied by those
 tables on disk is 2 GB).
 The problem is, MySQL-3.23.46 takes forever to return from SELECT (i let
 it run over night, in the morning i still didn't got any results, so i
 killed the query).

Well, you have to realise that the server can only use a one index per table
(for obvious reasons).
So, the first question is, which fields are indexed, and which index does the
server use for this query.
To find out the latter, as well as find out in which order the server joins
the tables, use EXPLAIN  with your query.

The way you'll want to set up a join (applies to any SQL server) is to limit
the number of rows as fast as possible.
For instance, if a table has a male/female column, putting an index on that
will only cut the number of rows to be searched by half, so there'll still be
heaps of rows that need to be checked. In a select join, that might not be
very efficient.
So, first think about these things, to set up your indexes and the query
properly.

It IS possible to tweak the server for a particular query, if you find that
the optimiser isn't picking the best index.
If you use STRAIGHT_JOIN instead of the commas, you can specify the join
order.
And you can also direct the server to use (or ignore) specific indexes (see
manual for details).

Those are just a few things.


 Splitting the big tables into small ones will not help, because i still
 want to be able to run the SELECT across _all_ data (and it's not
 convenient for my application).
 I cannot modify the structure of the tables (the application requires a
 certain table structure).

You can actually split a table into multiple identical ones, by using MERGE
tables (see manual).
But that just makes managing them easier (and inserts into one table faster),
it wouldn't speed  up your SELECT.


 So i wonder if there's any way to optimise MySQL so that i will be able
 to actually do a SELECT on those tables.
 Maybe tweak parameters like join_buffer_size? table_cache? Anyone has
 some experience with these?... What's the best way to optimize MySQL for
 running SELECTs on multiple large tables?

Yes, server settings are important, and the 'right' settings depend on your
system (amount of RAM, etc) as well as on your database and the type of
queries that you run most often.

There is no magic wand, I'm afraid. But... you may find it advantagious to do
a MySQL training course (www.mysql.com/training/). Optimisation is a very
important part of the course material, in this message I've just touched
briefly on a few little items.


 This is the only thing that keeps me from deploying MySQL in production
 now.

 (Not to start a flamewar, just to acknowledge a fact: i ran the same SQL
 statement, with the same data set, on some other SQL servers, and i
 got the result in 5 minutes, as opposed to MySQL still giving back
 nothing after several hours. But i want to use MySQL because i need
 something that can do INSERTs very fast, and MySQL is very, very good at
 this. The only problem is, it is slow when it comes to SELECTs on
 multiple large tables, and i'm not sure how to optimize it for that.)

I'm confident that the MySQL server can do it just as fast, if not faster.


Regards,
Arjen.

--
MySQL Training Worldwide, http://www.mysql.com/training/
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Mr. Arjen G. Lentz [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Technical Writer, Trainer
/_/  /_/\_, /___/\___\_\___/   Brisbane, QLD Australia
   ___/   www.mysql.com




-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: optimize for SELECTs on multiple large tables

2001-12-05 Thread Arjen G. Lentz

Hi again Florin,

- Original Message -
From: Florin Andrei [EMAIL PROTECTED]


 SELECT event.cid, iphdr.ip_src, iphdr.ip_dst, tcphdr.tcp_dport FROM
 event, iphdr, tcphdr WHERE event.cid = iphdr.cid AND event.cid =
 tcphdr.cid AND tcphdr.tcp_flags = '2';

Your only search condition is tcphdr.tcp_flags=2. So you will want to put the
tcphdr first in the join and have this condition evaluated first (re-order
your query and do EXPLAIN SELECT ), otherwise the server will still have
to do a full table scan on the first two tables.

 The actual table structure is this:
 CREATE TABLE event  ( sid INT  UNSIGNED NOT NULL,
   cid INT  UNSIGNED NOT NULL,
 [...]
   PRIMARY KEY (sid,cid),

Each of the three tables has this.
Now look at your SELECT and the join condition, purely on the field cid.
So, MySQL actually has no index to work with.
The (sid,cid) can't be used since cid is not the first field; you can use the
first part of a key on its own, not the last part (since the last part will
have no ordering independently the first part).
So, you will want to add an index on cid alone, for the event and iphdr tables
(tcphdr table doesn't matter since your select will use the tcp_flags index to
limit down the first part of the join, as described above).

Making sense?
Please post the output from EXPLAIN if the above info does not have the
desired effect. Then we can look at exactly what the server is thinking.
And of course we also want to hear from you when it's ligning fast!


 (it's the typical Snort database, and i don't want to change it, because
 many applications related to Snort expect it to be this way)

Adding an index shouldn't affect other apps, so that's easy.


Regards,
Arjen.

--
MySQL Training Worldwide, http://www.mysql.com/training/
   __  ___ ___   __
  /  |/  /_ __/ __/ __ \/ /Mr. Arjen G. Lentz [EMAIL PROTECTED]
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Technical Writer, Trainer
/_/  /_/\_, /___/\___\_\___/   Brisbane, QLD Australia
   ___/   www.mysql.com




-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php