alter table and 'something wrong in indexes'?

2016-05-26 Thread MAS!
Hi!

I use MySQL/InnoDB tables on AWS services.

In a small table (about 2M records) I changed some column datatypes from 
unsigned int to decimal and from float to decimal.

I didn't change anything about the primary key or other indexes.

After the change (which completed without trouble), all my queries were really 
slow and the EXPLAIN output was different; it looked as if the primary key 
index was corrupted or no longer used (even though CHECK TABLE reported the 
table was fine).

I had to drop and create the table again with the new datatypes, and then 
everything was fine again.
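
One lighter-weight step that may be worth trying before a full drop-and-recreate (a sketch, not a confirmed fix for this case; the table name is hypothetical):

```sql
-- An in-place ALTER can leave the optimizer's index statistics stale even
-- though the indexes themselves are intact; ANALYZE TABLE recomputes them.
ANALYZE TABLE my_table;

-- Then compare the plan against the pre-ALTER one:
EXPLAIN SELECT ... ;
```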

Any idea about what was wrong?

I guess it is not an AWS problem, even though this is the first time I have 
seen a similar error.

thank you in advance

bye bye

MAS!




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Indexes strangeness

2016-02-24 Thread Chris Knipe
Hi All,

Can someone please fill me in on what I am seeing here... I have two
identical tables, with identical indexes, holding different records.  Both
tables have about 15M records each...


mysql> EXPLAIN SELECT ArticleID, DateObtained, DateAccessed, TimesAccessed
FROM IDXa ORDER BY DateAccessed LIMIT 10;
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+-------+
| id | select_type | table | type  | possible_keys | key             | key_len | ref  | rows | Extra |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+-------+
|  1 | SIMPLE      | IDXa  | index | NULL          | idxDateAccessed | 5       | NULL |   10 | NULL  |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+-------+
1 row in set (0,00 sec)

mysql> EXPLAIN SELECT ArticleID, DateObtained, DateAccessed, TimesAccessed
FROM IDXb ORDER BY DateAccessed LIMIT 10;
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows     | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
|  1 | SIMPLE      | IDXb  | ALL  | NULL          | NULL | NULL    | NULL | 15004858 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+----------+----------------+
1 row in set (0,00 sec)


Tables:



mysql> SHOW CREATE TABLE IDXa\G
*************************** 1. row ***************************
   Table: IDXa
Create Table: CREATE TABLE `IDXa` (
  `ArticleID` varchar(32) NOT NULL,
  `DateObtained` datetime NOT NULL,
  `DateAccessed` datetime NOT NULL,
  `TimesAccessed` int(5) unsigned NOT NULL,
  PRIMARY KEY (`ArticleID`),
  KEY `idxDateAccessed` (`DateAccessed`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0,00 sec)

mysql> SHOW INDEXES FROM IDXa;
+-------+------------+-----------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name        | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| IDXa  |          0 | PRIMARY         |            1 | ArticleID    | A         |    14086444 | NULL     | NULL   |      | BTREE      |         |               |
| IDXa  |          1 | idxDateAccessed |            1 | DateAccessed | A         |     1408644 | NULL     | NULL   |      | BTREE      |         |               |
+-------+------------+-----------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0,00 sec)

mysql> SHOW CREATE TABLE IDXb\G
*************************** 1. row ***************************
   Table: IDXb
Create Table: CREATE TABLE `IDXb` (
  `ArticleID` varchar(32) NOT NULL,
  `DateObtained` datetime NOT NULL,
  `DateAccessed` datetime NOT NULL,
  `TimesAccessed` int(5) unsigned NOT NULL,
  PRIMARY KEY (`ArticleID`),
  KEY `idxDateAccessed` (`DateAccessed`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0,00 sec)

mysql> SHOW INDEXES FROM IDXb;
+-------+------------+-----------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name        | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| IDXb  |          0 | PRIMARY         |            1 | ArticleID    | A         |    15007345 | NULL     | NULL   |      | BTREE      |         |               |
| IDXb  |          1 | idxDateAccessed |            1 | DateAccessed | A         |     1250612 | NULL     | NULL   |      | BTREE      |         |               |
+-------+------------+-----------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0,00 sec)
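
Two quick things that might be worth trying on IDXb (a sketch, not a confirmed diagnosis):

```sql
-- Refresh the index statistics, which can go stale or be sampled badly:
ANALYZE TABLE IDXb;

-- And check whether the optimizer will use the index when forced:
EXPLAIN SELECT ArticleID, DateObtained, DateAccessed, TimesAccessed
FROM IDXb FORCE INDEX (idxDateAccessed)
ORDER BY DateAccessed LIMIT 10;
```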


Thnx.


-- 

Regards,
Chris Knipe


Indexes issue importing tablespaces

2014-10-10 Thread Ruben Cardenal
 

Hi, 

I have this problem on several different instances of 5.6.20. I follow
all the steps as stated in
http://dev.mysql.com/doc/refman/5.6/en/tablespace-copying.html and get
no errors or warnings, either in the CLI or the MySQL log. 

(13:23:02) [borrame] > alter table creditLine discard tablespace;
Query OK, 0 rows affected (0.30 sec) 

(copy operation of the .cfg and .ibd files from the origin server) 

(13:23:19) [borrame] > alter table creditLine import tablespace;
Query OK, 0 rows affected (44.35 sec) 

2014-10-10 13:26:42 1657 [Note] InnoDB: Importing tablespace for table
'letsbonus/creditLine' that was exported from host 'dualla'
2014-10-10 13:26:42 1657 [Note] InnoDB: Phase I - Update all pages
2014-10-10 13:27:04 1657 [Note] InnoDB: Sync to disk
2014-10-10 13:27:25 1657 [Note] InnoDB: Sync to disk - done!
2014-10-10 13:27:26 1657 [Note] InnoDB: Phase III - Flush changes to
disk
2014-10-10 13:27:26 1657 [Note] InnoDB: Phase IV - Flush complete
2014-10-10 13:27:26 1657 [Note] InnoDB: borrame.creditLine autoinc
value set to 87313435 

After this, the indexes look empty: 

(13:27:26) [borrame] > show index from creditLine;
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name                       | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| creditLine |          0 | PRIMARY                        |            1 | id               | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idClient            |            1 | idClient         | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idCreditSubTypology |            1 | idCreditTypology | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idCountry           |            1 | idCountry        | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idAffiliate         |            1 | idAffiliate      | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | endDate                        |            1 | endDate          | A         |           0 | NULL     | NULL   | YES  | BTREE      |         |               |
| creditLine |          1 | status                         |            1 | status           | A         |           0 | NULL     | NULL   |      | BTREE      |         |               |
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.00 sec) 

I have to optimize or null-alter the table to get them working: 

(13:27:34) [borrame] > alter table creditLine engine = InnoDB;
Query OK, 0 rows affected (12 min 57.41 sec)
Records: 0 Duplicates: 0 Warnings: 0 

(13:51:17) [borrame] > show index from creditLine;
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name                       | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| creditLine |          0 | PRIMARY                        |            1 | id               | A         |    32237680 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idClient            |            1 | idClient         | A         |    16118840 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idCreditSubTypology |            1 | idCreditTypology | A         |        1792 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idCountry           |            1 | idCountry        | A         |        8967 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idAffiliate         |            1 | idAffiliate      | A         |           2 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | endDate                        |            1 | endDate          | A         |      293069 | NULL     | NULL   | YES  | BTREE      |         |               |
| creditLine |          1 | status                         |            1 | status           | A         |           4 | NULL     | NULL   |      | BTREE      |         |               |
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.00 sec) 

Is this a known issue, or am I missing something? I've checked the docs and
saw nothing related to this. 
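
For reference, a lighter alternative to the full null-alter rebuild would be to recompute only the statistics (a sketch; whether it is sufficient in this situation is a separate question):

```sql
-- Recompute index statistics without rebuilding the table:
ANALYZE TABLE creditLine;
-- The cardinalities reported by SHOW INDEX should then be non-zero:
SHOW INDEX FROM creditLine;
```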

Thanks, 

Rubén. 
 

Re: Indexes issue importing tablespaces

2014-10-10 Thread Wagner Bianchi
Did you check if an ANALYZE TABLE is enough in this case?

--
Wagner Bianchi
Mobile: +55.31.8654.9510

On 10/10/2014, at 09:06, Ruben Cardenal my...@ruben.cn wrote:

> [...]


Re: Indexes issue importing tablespaces

2014-10-10 Thread Ruben Cardenal
 

Hi Wagner, 

Yes! ANALYZE TABLE solves the situation in a moment. 

(14:21:09) [borrame] > alter table creditLine discard tablespace;
Query OK, 0 rows affected (0.41 sec) 

(14:21:21) [borrame] > alter table creditLine import tablespace;
Query OK, 0 rows affected (23.48 sec) 

(14:24:55) [borrame] > analyze table creditLine;
+--------------------+---------+----------+----------+
| Table              | Op      | Msg_type | Msg_text |
+--------------------+---------+----------+----------+
| borrame.creditLine | analyze | status   | OK       |
+--------------------+---------+----------+----------+
1 row in set (0.16 sec) 

(14:25:09) [borrame] > show index from creditLine;
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name                       | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| creditLine |          0 | PRIMARY                        |            1 | id               | A         |    32237680 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idClient            |            1 | idClient         | A         |    16118840 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idCreditSubTypology |            1 | idCreditTypology | A         |        5050 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idCountry           |            1 | idCountry        | A         |        8161 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | creditLine_idAffiliate         |            1 | idAffiliate      | A         |        1794 | NULL     | NULL   |      | BTREE      |         |               |
| creditLine |          1 | endDate                        |            1 | endDate          | A         |       64995 | NULL     | NULL   | YES  | BTREE      |         |               |
| creditLine |          1 | status                         |            1 | status           | A         |           4 | NULL     | NULL   |      | BTREE      |         |               |
+------------+------------+--------------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
7 rows in set (0.00 sec) 

(14:25:14) [borrame] > 

Thanks, 

Rubén. 

On 2014-10-10 14:19, Wagner Bianchi wrote: 

> Did you check if an ANALYZE TABLE is enough in this case?
>
> [...]

Troubles with creating indexes on float columns on MyISAM tables in MySQL 5.6.15 and MySQL 5.6.14 running on FreeBSD 8.4

2014-01-29 Thread Mikhail Berman
I have run into an interesting problem with the creation of indexes on MyISAM
tables in MySQL 5.6.15 and MySQL 5.6.14 running on FreeBSD 8.4: I am not able
to create indexes on float columns.

Indexes on all other columns work just fine.

The problem occurred while I was loading data from a MySQL dump into a database.

Loads would fail on the ENABLE KEYS line in the dump with:  ERROR 2013
(HY000): Lost connection to MySQL server during query

The problem was recreated in many different scenarios and can be
reproduced with a simple test:

I have a table:

mysql> show create table LEGAL_REGISTRATION_TWO\G
*************************** 1. row ***************************
   Table: LEGAL_REGISTRATION_TWO
Create Table: CREATE TABLE `LEGAL_REGISTRATION_TWO` (
  `legal_registration_key` int(10) unsigned NOT NULL DEFAULT '0',
  `company_fkey` varchar(10) NOT NULL DEFAULT '',
  `law_firm_fkey` varchar(10) NOT NULL DEFAULT '',
  `registrant_is_guarantor` int(1) NOT NULL DEFAULT '0',
  `plan_name` text NOT NULL,
  `copy_sent_to_firm` int(1) NOT NULL DEFAULT '0',
  `copy_sent_to_firm_name_address_text` text NOT NULL,
  `law_firm_opinion` int(1) NOT NULL DEFAULT '0',
  `law_firm_opinion_type` varchar(10) NOT NULL DEFAULT '',
  `law_firm_opinion_text` text NOT NULL,
  `law_firm_opinion_text_url` varchar(200) NOT NULL DEFAULT '',
  `law_firm_relationship` varchar(20) NOT NULL DEFAULT '',
  `legal_fees` float NOT NULL DEFAULT '0',
  `accounting_fees` float(10,2) NOT NULL DEFAULT '0.00',   <-- I am attempting to create an index on this field
  `ftp_file_name_fkey` varchar(80) NOT NULL DEFAULT '',
  `form_fkey` varchar(20) NOT NULL DEFAULT '',
  `file_date` varchar(10) NOT NULL DEFAULT '',
  `file_accepted` varchar(20) NOT NULL DEFAULT '',
  `file_size` varchar(10) NOT NULL DEFAULT '',
  `http_file_name_html` varchar(100) NOT NULL DEFAULT '',
  `http_file_name_text` varchar(100) NOT NULL DEFAULT '',
  `qc_check_1` int(1) NOT NULL DEFAULT '0',
  `qc_check_2` int(1) NOT NULL DEFAULT '0',
  `create_date` varchar(10) NOT NULL DEFAULT '',
  `change_date` varchar(10) NOT NULL DEFAULT ''
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

With a single row in it:

mysql> select count(*) from LEGAL_REGISTRATION_TWO;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

When I attempt to alter the table to create an index on a float column I
get the error:

mysql> alter table LEGAL_REGISTRATION_TWO add key test1dx
(`accounting_fees`);
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>

I have made a number of changes in /etc/my.cnf trying to resolve this
problem; currently the following entries are in my.cnf:

net_read_timeout=2400
net_write_timeout=2400
big-tables=on
connect_timeout=40
myisam_sort_buffer_size=1073741824

max_allowed_packet = 128M

I have not found any mention on the Internet of this being a problem for
anyone else.
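
A stripped-down reproduction along the lines of the table above (table and column names here are hypothetical) might look like:

```sql
-- Scratch MyISAM table with a single float column:
CREATE TABLE float_idx_test (
  fees FLOAT(10,2) NOT NULL DEFAULT '0.00'
) ENGINE=MyISAM;

INSERT INTO float_idx_test VALUES (1.00);

-- On the affected builds this is the statement that drops the connection:
ALTER TABLE float_idx_test ADD KEY test_idx (fees);
```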


Any ideas on how to solve this problem are greatly appreciated.


-- 
Mikhail Berman


Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-16 Thread Shawn Green

On 10/15/2012 7:15 PM, spameden wrote:

Thanks a lot for all your comments!

I did disable Query cache before testing with

set query_cache_type=OFF

for the current session.

I will report this to the MySQL bugs site later.




First. What are all of your logging settings?

SHOW GLOBAL VARIABLES LIKE '%log%';

Next. When you physically look in the slow query log, how long does it 
say that it took this command to execute?


And last, before you can ask MySQL to fix a bug, you must first ensure 
it's a MySQL bug. Please try to reproduce your results using official 
binaries, not those constructed by a third party.  If the problem exists 
in our packages, do tell us about it. If the problem is not reproducible 
using official MySQL products, then please report it to the appropriate 
channel for the product you are using.


MySQL Bugs -
http://bugs.mysql.com/

Thanks!
--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN






Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-16 Thread spameden
Will do.

mysql> SHOW GLOBAL VARIABLES LIKE '%log%';
+-----------------------------------------+---------------------------------+
| Variable_name                           | Value                           |
+-----------------------------------------+---------------------------------+
| back_log                                | 50                              |
| binlog_cache_size                       | 32768                           |
| binlog_direct_non_transactional_updates | OFF                             |
| binlog_format                           | MIXED                           |
| expire_logs_days                        | 5                               |
| general_log                             | OFF                             |
| general_log_file                        | /var/run/mysqld/mysqld.log      |
| innodb_flush_log_at_trx_commit          | 2                               |
| innodb_flush_log_at_trx_commit_session  | 3                               |
| innodb_locks_unsafe_for_binlog          | OFF                             |
| innodb_log_block_size                   | 512                             |
| innodb_log_buffer_size                  | 8388608                         |
| innodb_log_file_size                    | 2145386496                      |
| innodb_log_files_in_group               | 2                               |
| innodb_log_group_home_dir               | ./                              |
| innodb_mirrored_log_groups              | 1                               |
| innodb_overwrite_relay_log_info         | OFF                             |
| log                                     | OFF                             |
| log_bin                                 | ON                              |
| log_bin_trust_function_creators         | ON                              |
| log_bin_trust_routine_creators          | ON                              |
| log_error                               | /var/log/mysql-error.log        |
| log_output                              | FILE                            |
| log_queries_not_using_indexes           | ON                              |
| log_slave_updates                       | OFF                             |
| log_slow_admin_statements               | OFF                             |
| log_slow_filter                         |                                 |
| log_slow_queries                        | ON                              |
| log_slow_rate_limit                     | 1                               |
| log_slow_slave_statements               | OFF                             |
| log_slow_sp_statements                  | ON                              |
| log_slow_timestamp_every                | OFF                             |
| log_slow_verbosity                      | microtime                       |
| log_warnings                            | 1                               |
| max_binlog_cache_size                   | 18446744073709547520            |
| max_binlog_size                         | 104857600                       |
| max_relay_log_size                      | 0                               |
| relay_log                               | /var/log/mysql/mysqld-relay-bin |
| relay_log_index                         |                                 |
| relay_log_info_file                     | relay-log.info                  |
| relay_log_purge                         | ON                              |
| relay_log_space_limit                   | 0                               |
| slow_query_log                          | ON                              |
| slow_query_log_file                     | /var/log/mysql/mysql-slow.log   |
| slow_query_log_microseconds_timestamp   | OFF                             |
| sql_log_bin                             | ON                              |
| sql_log_off                             | OFF                             |
| sql_log_update                          | ON                              |
| suppress_log_warning_1592               | OFF                             |
| sync_binlog                             | 0                               |
| use_global_log_slow_control             | none                            |
+-----------------------------------------+---------------------------------+
51 rows in set (0.01 sec)

Here is the full output. Writing happens ONLY if
log_queries_not_using_indexes is turned ON.

Query takes:
# Query_time: 0.291280  Lock_time: 0.50  Rows_sent: 0  Rows_examined:
133876  Rows_affected: 0  Rows_read: 1
# Bytes_sent: 1775  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: F229398
SET timestamp=1350389078;
SELECT sql_id, momt, sender, receiver, udhdata, msgdata, time, smsc_id,
service, account, id, sms_type, mclass, mwi, coding, compress, validity,
deferred, dlr_mask, dlr_url, pid, alt_dcs, rpi, charset, boxc_id, binfo,
meta_data, task_id, msgid FROM send_sms_test FORCE INDEX (priority_time)
WHERE time <= UNIX_TIMESTAMP(NOW()) ORDER by priority LIMIT 0,50;


2012/10/16 Shawn Green shawn.l.gr...@oracle.com

> [...]


Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-16 Thread Michael Dykman
Your now() statement is getting executed for every row of the select. Try
putting the phrase up front, as in:

set @ut = unix_timestamp(now())

and then use that in your statement.
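
Spelled out against the query from the earlier post, the suggestion would look something like (a sketch):

```sql
SET @ut = UNIX_TIMESTAMP(NOW());

SELECT * FROM send_sms_test FORCE INDEX (priority_time)
WHERE time <= @ut
ORDER BY priority LIMIT 0, 50;
```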

On 2012-10-16 8:42 AM, spameden spame...@gmail.com wrote:

> [...]


Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-16 Thread spameden
Interesting thought, but I get the same result.

# Query_time: 0.001769  Lock_time: 0.001236 Rows_sent: 0  Rows_examined: 0
use kannel;
SET timestamp=1350413592;
select * from send_sms FORCE INDEX (priority_time) where time <= @ut order by
priority limit 0,11;

The MySQL I'm using is 5.5.28 from dotdeb.org; I'm pretty sure it's close to
the original except for the packaging scripts.

I will check this on the release from MySQL site and report back.

Thanks to all.

2012/10/16 Michael Dykman mdyk...@gmail.com

> Your now() statement is getting executed for every row of the select. Try
> putting the phrase up front, as in:
> set @ut = unix_timestamp(now())
> and then use that in your statement.
>
> [...]




Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-16 Thread hsv
 2012/10/16 12:57 -0400, Michael Dykman 
your now() statement is getting executed for every row on the select.  try
putting the phrase up front
as in:
set @ut= unix_timestamp(now())
and then use that in your statement.

Quote:

Functions that return the current date or time each are evaluated only once
per query at the start of query execution. This means that multiple
references to a function such as NOW() within a single query always produce
the same result. (For our purposes, a single query also includes a call to a
stored program (stored routine, trigger, or event) and all subprograms
called by that program.) This principle also applies to CURDATE(),
CURTIME(), UTC_DATE(), UTC_TIME(), UTC_TIMESTAMP(), and to any of their
synonyms. 





Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-16 Thread spameden
That's exactly what I thought when reading Michael's email, but I tried it
anyway. Thanks for the clarification :)

2012/10/16 h...@tbbs.net

> [...]




mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-15 Thread spameden
|  1 | SIMPLE | send_sms_test | index | NULL | priority_time | 12 | NULL | 11 | Using where |
+----+--------+---------------+-------+---------------+---------------+----+------+----+-------------+
1 row in set (0.00 sec)

But if I issue the query I see in the mysql-slow.log:
select * from send_sms_test where time<=UNIX_TIMESTAMP(NOW()) order by
priority limit 0,11;

If I instead create an INDEX on (time, priority) (the reverse order of
priority, time), I still see the same usage of the priority_time key with the
same key length, but the row estimate is doubled:
mysql> create index time_priority ON send_sms_test (time,priority);
Query OK, 0 rows affected (0.67 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc select * from send_sms_test where time<=UNIX_TIMESTAMP(NOW())
order by priority limit 0,11;
+----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+
| id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows | Extra       |
+----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+
|  1 | SIMPLE      | send_sms_test | index | time_priority | priority_time | 12      | NULL |   22 | Using where |
+----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+

And with both indexes created, this query no longer appears in the
slow-log.

Of course, if I disable log_queries_not_using_indexes, none of the
queries are logged.

So is it a bug in Percona's implementation, or is it general MySQL
behavior?

Thanks
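The index-order effect described above can be reproduced outside MySQL. Below is a sketch in Python with SQLite standing in for MySQL (SQLite reports "USE TEMP B-TREE FOR ORDER BY" where MySQL says "Using filesort"); the table is a stripped-down, invented stand-in for send_sms_test:

```python
import sqlite3

# With the range column (time) leading the index, rows come back in time
# order and the ORDER BY priority needs an explicit sort; with priority
# leading, the index already delivers priority order and the sort disappears.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE send_sms (sql_id INTEGER PRIMARY KEY, time INTEGER, priority INTEGER)")
db.executemany("INSERT INTO send_sms(time, priority) VALUES (?, ?)",
               [(i % 1000, i % 7) for i in range(5000)])

query = "SELECT * FROM send_sms WHERE time <= 500 ORDER BY priority LIMIT 11"

db.execute("CREATE INDEX time_priority ON send_sms (time, priority)")
plan_tp = str(db.execute("EXPLAIN QUERY PLAN " + query).fetchall())

db.execute("DROP INDEX time_priority")
db.execute("CREATE INDEX priority_time ON send_sms (priority, time)")
plan_pt = str(db.execute("EXPLAIN QUERY PLAN " + query).fetchall())

print("time_priority plan:", plan_tp)
print("priority_time plan:", plan_pt)
```

The second plan can also stop as soon as the LIMIT is satisfied, which is why the (priority, time) run above finishes faster despite the same key length.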


Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-15 Thread spameden
` bigint(20) DEFAULT NULL,
   `rpi` bigint(20) DEFAULT NULL,
   `charset` varchar(255) DEFAULT NULL,
   `boxc_id` varchar(255) DEFAULT NULL,
   `binfo` varchar(255) DEFAULT NULL,
   `meta_data` text,
   `task_id` bigint(20) DEFAULT NULL,
   `msgid` bigint(20) DEFAULT NULL,
   `priority` int(3) unsigned NOT NULL DEFAULT '500',
   PRIMARY KEY (`sql_id`),
   KEY `task_id` (`task_id`),
   KEY `receiver` (`receiver`),
   KEY `msgid` (`msgid`),
   KEY `priority_time` (`priority`,`time`)
 ) ENGINE=InnoDB AUTO_INCREMENT=7806318 DEFAULT CHARSET=utf8

 Slow-queries turned on with an option:
 | log_queries_not_using_indexes | ON|

 mysqld --version
 mysqld  Ver 5.1.65-rel14.0 for debian-linux-gnu on x86_64 ((Percona Server
 (GPL), 14.0, Revision 475))

 If I check with EXPLAIN MySQL says it would use the index:
 mysql> desc select * from send_sms_test where
 time<=UNIX_TIMESTAMP(NOW()) order by priority limit 0,11;

 +----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+
 | id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows | Extra       |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+
 |  1 | SIMPLE      | send_sms_test | index | NULL          | priority_time | 12      | NULL |   11 | Using where |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+
 1 row in set (0.00 sec)

 But if I issue the query I see in the mysql-slow.log:
 select * from send_sms_test where time<=UNIX_TIMESTAMP(NOW()) order by
 priority limit 0,11;

 If I instead create an INDEX on (time, priority) (the reverse order of
 priority, time), I still see the same usage of the priority_time key with
 the same key length, but the row estimate is doubled:
 mysql> create index time_priority ON send_sms_test (time,priority);
 Query OK, 0 rows affected (0.67 sec)
 Records: 0  Duplicates: 0  Warnings: 0

 mysql> desc select * from send_sms_test where
 time<=UNIX_TIMESTAMP(NOW()) order by priority limit 0,11;

 +----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+
 | id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows | Extra       |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+
 |  1 | SIMPLE      | send_sms_test | index | time_priority | priority_time | 12      | NULL |   22 | Using where |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+------+-------------+

 And with both indexes created, this query no longer appears in the
 slow-log.

 Of course, if I disable log_queries_not_using_indexes, none of the
 queries are logged.

 So is it a bug in Percona's implementation, or is it general MySQL
 behavior?

 Thanks



RE: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-15 Thread Rick James
* Rows = 11 / 22 -- don't take the numbers too seriously; they are crude 
approximations based on estimated cardinality.

* The 11 comes from the LIMIT -- therefore useless in judging the efficiency.  
(The 22 may be 2*11; I don't know.)

* Run the EXPLAINs without LIMIT -- that will avoid the bogus 11/22.

* If the CREATE INDEX took only 0.67 sec, I surmise that you have very few rows 
in the table??  So this discussion is not necessarily valid in general cases.

* What percentage of time values meet the WHERE?  This has a big impact on the 
choice of explain plan and performance.

* Set long_query_time = 0; to get it in the slowlog even if it is fast.  Then 
look at the various extra values (such as filesort, on disk, temp table used, 
etc).

* Do this (with each index):
SHOW SESSION STATUS LIKE 'Handler_read%';
SELECT ... FORCE INDEX(...) ...;
SHOW SESSION STATUS LIKE 'Handler_read%';
Then take the diffs of the handler counts.  This will give you a pretty 
detailed idea of what is going on; better than the SlowLog.

* INT(3) is not a 3-digit integer, it is a full 32-bit integer (4 bytes).  
Perhaps you should have SMALLINT UNSIGNED (2 bytes).

* BIGINT takes 8 bytes -- usually over-sized.
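The "take the diffs of the handler counts" step can be sketched as follows, applied to the two Handler_read snapshots spameden posts later in this thread (before/after the FORCE INDEX (time_priority) ... LIMIT 0,100 run); the dicts stand in for SHOW SESSION STATUS output, and counters missing from the truncated post are simply omitted:

```python
# Per-counter delta between two SHOW SESSION STATUS snapshots -- the number
# of handler calls the statement in between actually made.

def handler_diff(before, after):
    """Return after-minus-before for every counter present in both snapshots."""
    return {name: after[name] - before[name] for name in before}

before = {"Handler_read_next": 576090, "Handler_read_rnd": 126}
after = {"Handler_read_next": 719969, "Handler_read_rnd": 226}

delta = handler_diff(before, after)
print(delta)
```

Notably, the Handler_read_next delta is 143,879, exactly the row count spameden reports for send_sms_test, i.e. the run walked the whole index; and the Handler_read_rnd delta of 100 matches the LIMIT.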

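On the INT(3) point above: the number in parentheses is only a display width, not a storage size. The fixed per-type widths can be checked with Python's struct standard-size codes as a stand-in for MySQL's integer storage:

```python
import struct

# INT(3) is not a 3-digit integer: the (3) only affects display padding
# (with ZEROFILL).  Storage is fixed per type, mirrored by these sizes.
sizes = {
    "SMALLINT": struct.calcsize("=h"),  # 2 bytes (0..65535 when UNSIGNED)
    "INT": struct.calcsize("=i"),       # 4 bytes, even when declared INT(3)
    "BIGINT": struct.calcsize("=q"),    # 8 bytes -- usually over-sized
}
print(sizes)
```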

 -Original Message-
 From: spameden [mailto:spame...@gmail.com]
 Sent: Monday, October 15, 2012 1:42 PM
 To: mysql@lists.mysql.com
 Subject: mysql logs query with indexes used to the slow-log and not
 logging if there is index in reverse order
 
 Hi, list.
 
 Sorry for the long subject, but I'm really interested in solving this
 and need a help:
 
 I've got a table:
 
 mysql> show create table send_sms_test;
 +---------------+------------------------------------------------------------------+
 | Table         | Create Table                                                     |
 +---------------+------------------------------------------------------------------+
 | send_sms_test | CREATE TABLE `send_sms_test` (
   `sql_id` bigint(20) NOT NULL AUTO_INCREMENT,
   `momt` enum('MO','MT') DEFAULT NULL,
   `sender` varchar(20) DEFAULT NULL,
   `receiver` varchar(20) DEFAULT NULL,
   `udhdata` blob,
   `msgdata` text,
   `time` bigint(20) NOT NULL,
   `smsc_id` varchar(255) DEFAULT 'main',
   `service` varchar(255) DEFAULT NULL,
   `account` varchar(255) DEFAULT NULL,
   `id` bigint(20) DEFAULT NULL,
   `sms_type` tinyint(1) DEFAULT '2',
   `mclass

Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-15 Thread spameden
|
| Handler_read_next | 576090 |
| Handler_read_prev | 0  |
| Handler_read_rnd  | 126|
| Handler_read_rnd_next | 223|
+---++
6 rows in set (0.00 sec)

mysql> select * from send_sms_test FORCE INDEX (time_priority) where
time<=UNIX_TIMESTAMP(NOW()) order by priority LIMIT 0,100;
100 rows in set (0.09 sec)

mysql> SHOW SESSION STATUS LIKE 'Handler_read%';
+---++
| Variable_name | Value  |
+---++
| Handler_read_first| 18 |
| Handler_read_key  | 244|
| Handler_read_next | 719969 |
| Handler_read_prev | 0  |
| Handler_read_rnd  | 226|
| Handler_read_rnd_next | 223|
+---++
6 rows in set (0.00 sec)

I don't understand the Handler counters well; could you please explain more,
based on the results I've posted? In which case does it work better, and how
does it use the index?

About BIGINT(20) and INT(3), I will look into this later; I understand they
might be oversized. But my main question is about the index and why it is
being used so strangely.

Many thanks for your quick answer!
2012/10/16 Rick James rja...@yahoo-inc.com

 * Rows = 11 / 22 -- don't take the numbers too seriously; they are crude
 approximations based on estimated cardinality.

 * The 11 comes from the LIMIT -- therefore useless in judging the
 efficiency.  (The 22 may be 2*11; I don't know.)

 * Run the EXPLAINs without LIMIT -- that will avoid the bogus 11/22.

 * If the CREATE INDEX took only 0.67 sec, I surmise that you have very few
 rows in the table??  So this discussion is not necessarily valid in general
 cases.

 * What percentage of time values meet the WHERE?  This has a big impact on
 the choice of explain plan and performance.

 * Set long_query_time = 0; to get it in the slowlog even if it is fast.
  Then look at the various extra values (such as filesort, on disk, temp
 table used, etc).

 * Do this (with each index):
 SHOW SESSION STATUS LIKE 'Handler_read%';
 SELECT ... FORCE INDEX(...) ...;
 SHOW SESSION STATUS LIKE 'Handler_read%';
 Then take the diffs of the handler counts.  This will give you a pretty
 detailed idea of what is going on; better than the SlowLog.

 * INT(3) is not a 3-digit integer, it is a full 32-bit integer (4 bytes).
  Perhaps you should have SMALLINT UNSIGNED (2 bytes).

 * BIGINT takes 8 bytes -- usually over-sized.


  -Original Message-
  From: spameden [mailto:spame...@gmail.com]
  Sent: Monday, October 15, 2012 1:42 PM
  To: mysql@lists.mysql.com
  Subject: mysql logs query with indexes used to the slow-log and not
  logging if there is index in reverse order
 
  Hi, list.
 
  Sorry for the long subject, but I'm really interested in solving this
  and need a help:
 
  I've got a table:
 
  mysql> show create table send_sms_test;
  +---------------+------------------------------------------------------------------+
  | Table         | Create Table                                                     |

Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-15 Thread spameden
Sorry, forgot to say:

mysql> show variables like 'long_query_time%';
+-+---+
| Variable_name   | Value |
+-+---+
| long_query_time | 10.00 |
+-+---+
1 row in set (0.00 sec)

It's getting into the log only due to:

mysql> show variables like '%indexes%';
+---+---+
| Variable_name | Value |
+---+---+
| log_queries_not_using_indexes | ON|
+---+---+
1 row in set (0.00 sec)

If I turn it off - it's all fine

My initial question was why MySQL logs it in the slow log if the query uses
an INDEX?

And why is it not logged if I create an INDEX (time, priority)? (In the
query there is FORCE INDEX (priority_time) specified, so MySQL shouldn't use
the newly created INDEX (time, priority) at all.)
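The pinning effect of FORCE INDEX can be sketched with SQLite's INDEXED BY, its closest equivalent; table and index names mirror the thread, the data layer is invented, and the hint restricts the planner to the named index even though a second one exists:

```python
import sqlite3

# INDEXED BY (like MySQL's FORCE INDEX) pins the plan to one index; the
# competing time_priority index is ignored for this statement.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE send_sms (sql_id INTEGER PRIMARY KEY, time INTEGER, priority INTEGER)")
db.execute("CREATE INDEX priority_time ON send_sms (priority, time)")
db.execute("CREATE INDEX time_priority ON send_sms (time, priority)")

plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM send_sms INDEXED BY priority_time "
    "WHERE time <= 100 ORDER BY priority LIMIT 11"
).fetchall()
print(plan)
```

If the named index cannot be used at all, SQLite refuses to prepare the statement, which is stricter than MySQL's hint but makes the pinning visible.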

2012/10/16 spameden spame...@gmail.com

 Sorry, my previous e-mail was a test on MySQL-5.5.28 on an empty table.

 Here is the MySQL-5.1 Percona testing table:

 mysql> select count(*) from send_sms_test;
 +--+
 | count(*) |
 +--+
 |   143879 |
 +--+
 1 row in set (0.03 sec)

 Without LIMIT:
 mysql desc select * from send_sms_test FORCE INDEX (time_priority) where
 time=UNIX_TIMESTAMP(NOW()) order by priority;

 ++-+---+---+---+---+-+--+---+-+

 | id | select_type | table | type  | possible_keys | key
 | key_len | ref  | rows  | Extra   |

 ++-+---+---+---+---+-+--+---+-+
 |  1 | SIMPLE  | send_sms_test | range | time_priority | time_priority
 | 8   | NULL | 73920 | Using where; Using filesort |

 ++-+---+---+---+---+-+--+---+-+
 1 row in set (0.00 sec)

 mysql> desc select * from send_sms_test FORCE INDEX (priority_time) where
 time<=UNIX_TIMESTAMP(NOW()) order by priority;
 +----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
 | id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
 |  1 | SIMPLE      | send_sms_test | index | NULL          | priority_time | 12      | NULL | 147840 | Using where |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
 1 row in set (0.00 sec)

 But I actually need to use LIMIT, because client uses this to limit the
 number of records returned to process.

 mysql> select * from send_sms_test FORCE INDEX (priority_time) where
 time<=UNIX_TIMESTAMP(NOW()) order by priority LIMIT 0,100;
 100 rows in set (0.00 sec)

 mysql> show profile;
 ++--+
 | Status | Duration |
 ++--+
 | starting   | 0.53 |
 | Opening tables | 0.09 |
 | System lock| 0.05 |
 | Table lock | 0.04 |
 | init   | 0.37 |
 | optimizing | 0.05 |
 | statistics | 0.07 |
 | preparing  | 0.05 |
 | executing  | 0.01 |
 | Sorting result | 0.03 |
 | Sending data   | 0.000856 |
 | end| 0.03 |
 | query end  | 0.01 |
 | freeing items  | 0.15 |
 | logging slow query | 0.01 |
 | logging slow query | 0.47 |
 | cleaning up| 0.02 |
 ++--+
 17 rows in set (0.00 sec)

 mysql> select * from send_sms_test FORCE INDEX (time_priority) where
 time<=UNIX_TIMESTAMP(NOW()) order by priority LIMIT 0,100;
 100 rows in set (0.08 sec)
 mysql> show profile;
 ++--+
 | Status | Duration |
 ++--+
 | starting   | 0.48 |
 | Opening tables | 0.09 |
 | System lock| 0.02 |
 | Table lock | 0.04 |
 | init   | 0.47 |
 | optimizing | 0.06 |
 | statistics | 0.43 |
 | preparing  | 0.18 |
 | executing  | 0.01 |
 | Sorting result | 0.076725 |
 | Sending data   | 0.001406 |
 | end| 0.03 |
 | query end  | 0.01 |
 | freeing items  | 0.12 |
 | logging slow query | 0.01 |
 | cleaning up| 0.02 |
 ++--+
 16 rows in set (0.00 sec)

 As you can see, the latter query takes more time, because it is using a
 filesort as well.

 Now, the handler counters:
 mysql> SHOW SESSION STATUS LIKE 'Handler_read%'; select * from
 send_sms_test FORCE INDEX (priority_time) where time<=UNIX_TIMESTAMP(NOW())
 order by priority LIMIT 0,100; SHOW SESSION STATUS LIKE

RE: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-15 Thread Rick James
I don't fully understand Handler numbers, either.  But note the vast difference 
in Handler_read_next, as if the second test had to read (sequentially scan) a 
lot more stuff (in the index or the data).

Summary:
   INDEX(time, priority) -- slower; bigger Handler numbers; shorter key_len; 
filesort
   INDEX(priority, time) -- faster; smaller; seems to use both keys of the 
index (key_len=12); avoids filesort (because INDEX(priority, ...) agrees with 
ORDER BY priority).

The Optimizer has (at some level) two choices:

* Start with the WHERE

* Start with the ORDER BY
Since the ORDER BY matches one of the indexes, it can avoid the sort and stop 
with the LIMIT.  However, if most of the rows failed the WHERE clause, this 
could be the wrong choice.
That is, it is hard for the optimizer to get a query like this right every 
time.

To see what I mean, flip the inequality in WHERE time <= ... around;  I think
the results will be disappointing.

If you had more than a million rows, I would bring up PARTITIONing as an
assist to this 2-dimensional type of problem.

From: spameden [mailto:spame...@gmail.com]
Sent: Monday, October 15, 2012 3:23 PM
To: Rick James
Cc: mysql@lists.mysql.com
Subject: Re: mysql logs query with indexes used to the slow-log and not logging 
if there is index in reverse order

Sorry, my previous e-mail was a test on MySQL-5.5.28 on an empty table.

Here is the MySQL-5.1 Percona testing table:

mysql> select count(*) from send_sms_test;
+--+
| count(*) |
+--+
|   143879 |
+--+
1 row in set (0.03 sec)

Without LIMIT:
mysql> desc select * from send_sms_test FORCE INDEX (time_priority) where
time<=UNIX_TIMESTAMP(NOW()) order by priority;
+----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
| id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows  | Extra                       |
+----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | send_sms_test | range | time_priority | time_priority | 8       | NULL | 73920 | Using where; Using filesort |
+----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
1 row in set (0.00 sec)

mysql> desc select * from send_sms_test FORCE INDEX (priority_time) where
time<=UNIX_TIMESTAMP(NOW()) order by priority;
+----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
| id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
+----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
|  1 | SIMPLE      | send_sms_test | index | NULL          | priority_time | 12      | NULL | 147840 | Using where |
+----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
1 row in set (0.00 sec)

But I actually need to use LIMIT, because client uses this to limit the number 
of records returned to process.

mysql> select * from send_sms_test FORCE INDEX (priority_time) where
time<=UNIX_TIMESTAMP(NOW()) order by priority LIMIT 0,100;
100 rows in set (0.00 sec)

mysql> show profile;
++--+
| Status | Duration |
++--+
| starting   | 0.53 |
| Opening tables | 0.09 |
| System lock| 0.05 |
| Table lock | 0.04 |
| init   | 0.37 |
| optimizing | 0.05 |
| statistics | 0.07 |
| preparing  | 0.05 |
| executing  | 0.01 |
| Sorting result | 0.03 |
| Sending data   | 0.000856 |
| end| 0.03 |
| query end  | 0.01 |
| freeing items  | 0.15 |
| logging slow query | 0.01 |
| logging slow query | 0.47 |
| cleaning up| 0.02 |
++--+
17 rows in set (0.00 sec)

mysql> select * from send_sms_test FORCE INDEX (time_priority) where
time<=UNIX_TIMESTAMP(NOW()) order by priority LIMIT 0,100;
100 rows in set (0.08 sec)
mysql> show profile;
++--+
| Status | Duration |
++--+
| starting   | 0.48 |
| Opening tables | 0.09 |
| System lock| 0.02 |
| Table lock | 0.04 |
| init   | 0.47 |
| optimizing | 0.06 |
| statistics | 0.43 |
| preparing  | 0.18 |
| executing  | 0.01 |
| Sorting result | 0.076725 |
| Sending data   | 0.001406 |
| end| 0.03 |
| query end  | 0.01 |
| freeing items  | 0.12 |
| logging slow query | 0.01 |
| cleaning up| 0.02

RE: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-15 Thread Rick James
> My initial question was why MySQL logs it in the slow log if the query uses
> an INDEX?

That _may_ be worth a bug report.

A _possible_ answer...  EXPLAIN presents what the optimizer is in the mood for 
at that moment.  It does not necessarily reflect what it was in the mood for 
when it ran the query.

When timing things, run them twice (and be sure not to hit the Query cache).  
The first time freshens the cache (buffer_pool, etc); the second time gives you 
a 'reproducible' time.  I believe (without proof) that the cache contents can 
affect the optimizer's choice.

From: spameden [mailto:spame...@gmail.com]
Sent: Monday, October 15, 2012 3:29 PM
To: Rick James
Cc: mysql@lists.mysql.com
Subject: Re: mysql logs query with indexes used to the slow-log and not logging 
if there is index in reverse order

Sorry, forgot to say:

mysql> show variables like 'long_query_time%';
+-+---+
| Variable_name   | Value |
+-+---+
| long_query_time | 10.00 |
+-+---+
1 row in set (0.00 sec)

It's getting into the log only due to:

mysql> show variables like '%indexes%';
+---+---+
| Variable_name | Value |
+---+---+
| log_queries_not_using_indexes | ON|
+---+---+
1 row in set (0.00 sec)

If I turn it off - it's all fine

My initial question was why MySQL logs it in the slow log if the query uses an 
INDEX?

And why is it not logged if I create an INDEX (time, priority)? (In the
query there is FORCE INDEX (priority_time) specified, so MySQL shouldn't use
the newly created INDEX (time, priority) at all.)
2012/10/16 spameden spame...@gmail.com
Sorry, my previous e-mail was a test on MySQL-5.5.28 on an empty table.

Here is the MySQL-5.1 Percona testing table:

mysql> select count(*) from send_sms_test;
+--+
| count(*) |
+--+
|   143879 |
+--+
1 row in set (0.03 sec)

Without LIMIT:
mysql> desc select * from send_sms_test FORCE INDEX (time_priority) where
time<=UNIX_TIMESTAMP(NOW()) order by priority;
+----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
| id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows  | Extra                       |
+----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | send_sms_test | range | time_priority | time_priority | 8       | NULL | 73920 | Using where; Using filesort |
+----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
1 row in set (0.00 sec)
mysql> desc select * from send_sms_test FORCE INDEX (priority_time) where
time<=UNIX_TIMESTAMP(NOW()) order by priority;
+----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
| id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
+----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
|  1 | SIMPLE      | send_sms_test | index | NULL          | priority_time | 12      | NULL | 147840 | Using where |
+----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
1 row in set (0.00 sec)
But I actually need to use LIMIT, because client uses this to limit the number 
of records returned to process.

mysql> select * from send_sms_test FORCE INDEX (priority_time) where
time<=UNIX_TIMESTAMP(NOW()) order by priority LIMIT 0,100;
100 rows in set (0.00 sec)

mysql> show profile;
++--+
| Status | Duration |
++--+
| starting   | 0.53 |
| Opening tables | 0.09 |
| System lock| 0.05 |
| Table lock | 0.04 |
| init   | 0.37 |
| optimizing | 0.05 |
| statistics | 0.07 |
| preparing  | 0.05 |
| executing  | 0.01 |
| Sorting result | 0.03 |
| Sending data   | 0.000856 |
| end| 0.03 |
| query end  | 0.01 |
| freeing items  | 0.15 |
| logging slow query | 0.01 |
| logging slow query | 0.47 |
| cleaning up| 0.02 |
++--+
17 rows in set (0.00 sec)

mysql> select * from send_sms_test FORCE INDEX (time_priority) where
time<=UNIX_TIMESTAMP(NOW()) order by priority LIMIT 0,100;
100 rows in set (0.08 sec)
mysql> show profile;
++--+
| Status | Duration |
++--+
| starting   | 0.48 |
| Opening tables | 0.09 |
| System lock| 0.02 |
| Table lock

Re: mysql logs query with indexes used to the slow-log and not logging if there is index in reverse order

2012-10-15 Thread spameden
Thanks a lot for all your comments!

I did disable Query cache before testing with

set query_cache_type=OFF

for the current session.

I will report this to the MySQL bugs site later.



2012/10/16 Rick James rja...@yahoo-inc.com

 > My initial question was why MySQL logs it in the slow log if the
 > query uses an INDEX?

 

 That _may_ be worth a bug report.


 A _possible_ answer...  EXPLAIN presents what the optimizer is in the mood
 for at that moment.  It does not necessarily reflect what it was in the
 mood for when it ran the query.


 When timing things, run them twice (and be sure not to hit the Query
 cache).  The first time freshens the cache (buffer_pool, etc); the second
 time gives you a 'reproducible' time.  I believe (without proof) that the
 cache contents can affect the optimizer's choice.


 From: spameden [mailto:spame...@gmail.com]
 Sent: Monday, October 15, 2012 3:29 PM

 To: Rick James
 Cc: mysql@lists.mysql.com
 Subject: Re: mysql logs query with indexes used to the slow-log and not
 logging if there is index in reverse order


 Sorry, forgot to say:

 mysql> show variables like 'long_query_time%';
 +-+---+
 | Variable_name   | Value |
 +-+---+
 | long_query_time | 10.00 |
 +-+---+
 1 row in set (0.00 sec)

 It's getting into the log only due to:

 mysql> show variables like '%indexes%';
 +---+---+
 | Variable_name | Value |
 +---+---+
 | log_queries_not_using_indexes | ON|
 +---+---+
 1 row in set (0.00 sec)

 If I turn it off - it's all fine

 My initial question was why MySQL logs it in the slow log if the query
 uses an INDEX?

 And why is it not logged if I create an INDEX (time, priority)? (In the
 query there is FORCE INDEX (priority_time) specified, so MySQL shouldn't
 use the newly created INDEX (time, priority) at all.)

 2012/10/16 spameden spame...@gmail.com

 Sorry, my previous e-mail was a test on MySQL-5.5.28 on an empty table.

 Here is the MySQL-5.1 Percona testing table:

 mysql> select count(*) from send_sms_test;
 +--+
 | count(*) |
 +--+
 |   143879 |
 +--+
 1 row in set (0.03 sec)

 Without LIMIT:
 mysql> desc select * from send_sms_test FORCE INDEX (time_priority) where
 time<=UNIX_TIMESTAMP(NOW()) order by priority;
 +----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
 | id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows  | Extra                       |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
 |  1 | SIMPLE      | send_sms_test | range | time_priority | time_priority | 8       | NULL | 73920 | Using where; Using filesort |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+-------+-----------------------------+
 1 row in set (0.00 sec)

 mysql> desc select * from send_sms_test FORCE INDEX (priority_time) where
 time<=UNIX_TIMESTAMP(NOW()) order by priority;
 +----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
 | id | select_type | table         | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
 |  1 | SIMPLE      | send_sms_test | index | NULL          | priority_time | 12      | NULL | 147840 | Using where |
 +----+-------------+---------------+-------+---------------+---------------+---------+------+--------+-------------+
 1 row in set (0.00 sec)

 But I actually need to use LIMIT, because client uses this to limit the
 number of records returned to process.

 mysql> select * from send_sms_test FORCE INDEX (priority_time) where
 time<=UNIX_TIMESTAMP(NOW()) order by priority LIMIT 0,100;
 100 rows in set (0.00 sec)

 mysql> show profile;
 ++--+
 | Status | Duration |
 ++--+
 | starting   | 0.53 |
 | Opening tables | 0.09 |
 | System lock| 0.05 |
 | Table lock | 0.04 |
 | init   | 0.37 |
 | optimizing | 0.05 |
 | statistics | 0.07 |
 | preparing  | 0.05 |
 | executing  | 0.01 |
 | Sorting result | 0.03 |
 | Sending data   | 0.000856 |
 | end| 0.03 |
 | query end  | 0.01 |
 | freeing items  | 0.15 |
 | logging slow query | 0.01 |
 | logging slow query | 0.47 |
 | cleaning up| 0.02 |
 ++--+
 17 rows in set (0.00

RE: Are Single Column Indexes are sufficient

2012-09-18 Thread Rick James
WHERE  (t0.job_id = '006-120613043532587-o-C')
  AND  t0.bean_type = 'ActionItems';
Begs for 
  INDEX(job_id, bean_type) -- in either order

where  job_id = '0043189-120805203721153-o-C'
  and  nominal_time >= '2012-09-07 07:16:00'
  and  nominal_time < '2012-09-07 08:06:00'
group by  status;
Begs for
  INDEX(job_id, nominal_time) -- in THIS order
It cannot make effective use of `status` in an index.

WHERE  (t0.pending <> 0
  AND  (t0.status = 'SUSPENDED'
   OR  t0.status = 'KILLED'
   OR  t0.status = 'RUNNING')
  AND  t0.last_modified_time <= '2012-09-07 08:08:34')
  AND  t0.bean_type = 'ActionItems';
Change the `status` check to
AND t0.status IN ('SUSPENDED', 'KILLED', 'RUNNING')
Other compound indexes might work, but I guess this is the best:
  INDEX(bean_type, status, last_modified_time)  -- in THIS order

Note that I put the '=' field(s) first in the INDEX.
Then I put _one_ range field next.  (IN is sort of in between '=' and a
range.)

Single-field indexes _might_ use the index merge feature -- but rarely.  And 
almost always an appropriate compound index will out-perform it.

Usually I ignore cardinality when picking an index.

A single-field index on a 'flag' or low cardinality field is almost never 
chosen by the optimizer.  Don't bother having such indexes.
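Rick's rules above ('=' columns first, then one range column) can be sketched as follows, using SQLite's EXPLAIN QUERY PLAN as a runnable stand-in for MySQL EXPLAIN; column names follow the scd_table example, but the job_nominal index name and the data layer are invented:

```python
import sqlite3

# A compound index with the equality column (job_id) first lets the range on
# nominal_time ride along in the same index search.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scd_table (id INTEGER PRIMARY KEY, job_id TEXT, "
           "nominal_time TEXT, status TEXT)")
db.execute("CREATE INDEX job_nominal ON scd_table (job_id, nominal_time)")

plan = str(db.execute(
    "EXPLAIN QUERY PLAN SELECT status, count(*) FROM scd_table "
    "WHERE job_id = ? AND nominal_time >= ? AND nominal_time < ? "
    "GROUP BY status",
    ("0043189-120805203721153-o-C",
     "2012-09-07 07:16:00", "2012-09-07 08:06:00")
).fetchall())
print(plan)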

When asking performance questions, please provide
SHOW CREATE TABLE (not DESCRIBE)
SHOW TABLE STATUS (for size info)
EXPLAIN SELECT ...

To figure out how many fields of the index are being used, look at the key_len
field of EXPLAIN, then look at the sizes of the fields in the chosen index.
(Add 1 for NULLable fields.)

 -Original Message-
 From: Adarsh Sharma [mailto:eddy.ada...@gmail.com]
 Sent: Monday, September 17, 2012 10:06 PM
 To: mysql@lists.mysql.com
 Subject: Are Single Column Indexes are sufficient
 
 Hi all,
 
 Currently i am doing performance level tuning of some queries that are
 running very slow in my slow -query log. Below are the sample of some
 queries  the cardinality of indexes :-
 --- Below queries take more than 15 minutes to complete on a table
 scd_table of size 7 GB SELECT t0.id, t0.bean_type, t0.action_number,
 t0.action_xml, t0.console_url, t0.created_conf, t0.error_code,
 t0.error_message, t0.external_status, t0.missing_dependencies,
 t0.run_conf, t0.time_out, t0.tracker_uri, t0.job_type, t0.created_time,
 t0.external_id, t0.job_id, t0.last_modified_time, t0.nominal_time,
 t0.pending, t0.rerun_time, t0.sla_xml, t0.status FROM scd_table t0
 WHERE (t0.job_id =
 '006-120613043532587-o-C') AND t0.bean_type = 'ActionItems';
 
 select status, count(*) as cnt from scd_table where job_id = '0043189-
  120805203721153-o-C' and nominal_time >= '2012-09-07 07:16:00' and
  nominal_time < '2012-09-07 08:06:00' group by status;
 
 SELECT t0.id, t0.bean_type, t0.action_number, t0.action_xml,
 t0.console_url, t0.created_conf, t0.error_code, t0.error_message,
 t0.external_status, t0.missing_dependencies, t0.run_conf, t0.time_out,
 t0.tracker_uri, t0.job_type, t0.created_time, t0.external_id,
 t0.job_id, t0.last_modified_time, t0.nominal_time, t0.pending,
 t0.rerun_time, t0.sla_xml, t0.status FROM scd_table t0 WHERE
  (t0.pending <> 0 AND (t0.status = 'SUSPENDED' OR t0.status = 'KILLED' OR
  t0.status = 'RUNNING') AND t0.last_modified_time <= '2012-09-07
 08:08:34') AND t0.bean_type = 'ActionItems';
 
 mysql> show indexes from scd_table;
 +-----------+------------+------------------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
 | Table     | Non_unique | Key_name                     | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
 +-----------+------------+------------------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
 | scd_table |          0 | PRIMARY                      |            1 | id                 | A         |      188908 |     NULL | NULL   |      | BTREE      |         |
 | scd_table |          1 | I_CRD_TNS_CREATED_TIME       |            1 | created_time       | A         |      188908 |     NULL | NULL   | YES  | BTREE      |         |
 | scd_table |          1 | I_CRD_TNS_DTYPE              |            1 | bean_type          | A         |          14 |     NULL | NULL   | YES  | BTREE      |         |
 | scd_table |          1 | I_CRD_TNS_EXTERNAL_ID        |            1 | external_id        | A         |      188908 |     NULL | NULL   | YES  | BTREE      |         |
 | scd_table |          1 | I_CRD_TNS_JOB_ID             |            1 | job_id             | A         |         365 |     NULL | NULL   | YES  | BTREE      |         |
 | scd_table |          1 | I_CRD_TNS_LAST_MODIFIED_TIME |            1 | last_modified_time | A         |      188908 |     NULL | NULL   | YES  | BTREE      |         |

Are Single Column Indexes are sufficient

2012-09-17 Thread Adarsh Sharma
Hi all,

Currently i am doing performance level tuning of some queries that are
running very slow in my slow -query log. Below are the sample of some
queries  the cardinality of indexes :-
--- Below queries take more than 15 minutes to complete on a table
scd_table of size 7 GB
SELECT t0.id, t0.bean_type, t0.action_number, t0.action_xml,
t0.console_url, t0.created_conf, t0.error_code, t0.error_message,
t0.external_status, t0.missing_dependencies, t0.run_conf, t0.time_out,
t0.tracker_uri, t0.job_type, t0.created_time, t0.external_id, t0.job_id,
t0.last_modified_time, t0.nominal_time, t0.pending, t0.rerun_time,
t0.sla_xml, t0.status FROM scd_table t0 WHERE (t0.job_id =
'006-120613043532587-o-C') AND t0.bean_type = 'ActionItems';

select status, count(*) as cnt from scd_table where job_id =
'0043189-120805203721153-o-C' and nominal_time >= '2012-09-07 07:16:00' and
nominal_time < '2012-09-07 08:06:00' group by status;

SELECT t0.id, t0.bean_type, t0.action_number, t0.action_xml,
t0.console_url, t0.created_conf, t0.error_code, t0.error_message,
t0.external_status, t0.missing_dependencies, t0.run_conf, t0.time_out,
t0.tracker_uri, t0.job_type, t0.created_time, t0.external_id, t0.job_id,
t0.last_modified_time, t0.nominal_time, t0.pending, t0.rerun_time,
t0.sla_xml, t0.status FROM scd_table t0 WHERE (t0.pending <> 0 AND
(t0.status = 'SUSPENDED' OR t0.status = 'KILLED' OR t0.status = 'RUNNING')
AND t0.last_modified_time <= '2012-09-07 08:08:34') AND t0.bean_type =
'ActionItems';

mysql> show indexes from scd_table;
+-----------+------------+------------------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table     | Non_unique | Key_name                     | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+------------------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| scd_table |          0 | PRIMARY                      |            1 | id                 | A         |      188908 |     NULL | NULL   |      | BTREE      |         |
| scd_table |          1 | I_CRD_TNS_CREATED_TIME       |            1 | created_time       | A         |      188908 |     NULL | NULL   | YES  | BTREE      |         |
| scd_table |          1 | I_CRD_TNS_DTYPE              |            1 | bean_type          | A         |          14 |     NULL | NULL   | YES  | BTREE      |         |
| scd_table |          1 | I_CRD_TNS_EXTERNAL_ID        |            1 | external_id        | A         |      188908 |     NULL | NULL   | YES  | BTREE      |         |
| scd_table |          1 | I_CRD_TNS_JOB_ID             |            1 | job_id             | A         |         365 |     NULL | NULL   | YES  | BTREE      |         |
| scd_table |          1 | I_CRD_TNS_LAST_MODIFIED_TIME |            1 | last_modified_time | A         |      188908 |     NULL | NULL   | YES  | BTREE      |         |
| scd_table |          1 | I_CRD_TNS_RERUN_TIME         |            1 | rerun_time         | A         |          14 |     NULL | NULL   | YES  | BTREE      |         |
| scd_table |          1 | I_CRD_TNS_STATUS             |            1 | status             | A         |          14 |     NULL | NULL   | YES  | BTREE      |         |
+-----------+------------+------------------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+

Whenever I EXPLAIN the query, it picks the index with the low cardinality.
Can I remove all the indexes and create only one or two multi-column
indexes, or is there any other tuning I can do for the above queries?
Please let me know if any other info is required. (The table schema has
the same columns mentioned in the SELECT clause.)
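The composite-index idea can be sketched outside MySQL. Below is a minimal sketch using Python's sqlite3 as a stand-in engine: the index name i_job_bean and the toy data are illustrative assumptions, not from the original schema, but the plan output shows the same principle (one multi-column index serving both WHERE columns).

```python
import sqlite3

# In-memory stand-in for the scd_table schema fragment used in the queries.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE scd_table "
    "(id INTEGER PRIMARY KEY, job_id TEXT, bean_type TEXT, status TEXT)"
)
rows = [
    (i, "job-%d" % (i % 50), "ActionItems" if i % 2 else "Other", "OK")
    for i in range(1, 1001)
]
cur.executemany("INSERT INTO scd_table VALUES (?, ?, ?, ?)", rows)

# One composite index covering both WHERE columns, more selective column first.
cur.execute("CREATE INDEX i_job_bean ON scd_table (job_id, bean_type)")

plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM scd_table WHERE job_id = ? AND bean_type = ?",
    ("job-7", "ActionItems"),
).fetchall()
print(plan[0][-1])  # the plan detail should name the composite index
```

With the composite index in place, the plan is an index search on both columns instead of a full scan; in MySQL the analogous check is EXPLAIN against an `ALTER TABLE ... ADD INDEX (job_id, bean_type)`.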


Thanks


Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-30 Thread Mihamina Rakotomandimby

On 05/07/2012 12:30 PM, Zhangzhigang wrote:

Thanks, I thought about this answer in the past, and I appreciate your reply.


How about the omelet?
What's your method?

--
RMA.

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql



RE: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-10 Thread Rick James
One more wrinkle...

When adding a UNIQUE index, MySQL must build the BTree, and cannot do a 
sortmerge.  (I think this is a true statement, but I am not positive.)  The 
reason...

A UNIQUE index is two things:  an INDEX, and a UNIQUEness constraint.  In 
order to enforce the constraint, it must check each record as it is inserted 
into the table.  This means that the index must exist, and contain all the rows 
that have been added so far.

Conclusion:  Do not make an INDEX UNIQUE unless you really need it.  Example:
PRIMARY KEY (a,b)
UNIQUE (b,a) -- The UNIQUEness constraint here adds nothing useful; make it 
just INDEX.

There are still more wrinkles...
* InnoDB benefits from inserting rows in the PRIMARY KEY order.
* INSERT ... (SELECT ... ORDER BY ...) -- Sometimes it is useful to do a 
sortmerge in the SELECT in order to make the INSERT more efficient.
* ALTER TABLE ... ORDER BY ... -- For MyISAM (not for InnoDB), this can 
cluster the rows to make certain SELECTs do fewer disk hits.

Massive data warehouse fact tables often should have nothing but a PRIMARY 
KEY.  All big SELECTs should hit summary tables that aggregate data to make 
reports more efficient.  (I have seen 10x to 1000x performance improvement.)  
Should we discuss this?


 -Original Message-
 From: Karen Abgarian [mailto:a...@apple.com]
 Sent: Monday, May 07, 2012 8:37 PM
 To: mysql@lists.mysql.com
 Subject: Re: Re: Re: Why is creating indexes faster after
 inserting massive data rows?
 
 Honestly, I did not understand that.   I did not say anything about
 being complicated.  What does mysql not use, caching??
 
 Judging by experience, creating a unique index on say, a 200G table
 could be a bitter one.
 
 
 On 07.05.2012, at 19:26, Zhangzhigang wrote:
 
  Karen...
 
  MySQL does not use the approach you described, which is
 complicated.
 
  I agree with Johan De Meersman.
 
 
  
  From: Karen Abgarian a...@apple.com
  To: mysql@lists.mysql.com
  Date: Tuesday, May 8, 2012, 1:30 AM
  Subject: Re: Re: Why is creating indexes faster after inserting
 massive data rows?
 
  Hi,
 
  A couple cents to this.
 
  There isn't really a million of block writes.   The record gets added
 to the block, but that gets modified in OS cache if we assume MyISAM
 tables and in the Innodb buffer if we assume InnoDB tables.   In both
 cases, the actual writing does not take place and does not slow down
 the process.What does however happen for each operation, is
 processing the statement, locating the entries to update in the index,
 index block splits and , for good reason, committing.
 
  When it comes to creating an index, what needs to happen, is to read
 the whole table and to sort all rows by the index key.   The latter
 process will be the most determining factor in answering the original
 question, because for the large tables the sort will have to do a lot
 of disk I/O.The point I am trying to make is there will be
 situations when creating indexes and then inserting the rows will be
 faster than creating an index afterwards.   If we try to determine such
 situations, we could notice that the likelihood of the sort going to
 disk increases with the amount of distinct values to be sorted.   For
 this reason, my choice would be to create things like primary/unique
 keys beforehand unless I am certain that everything will fit in the
 available memory.
 
  Peace
  Karen
 
 
 
  On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:
 
  - Original Message -
 
  From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
  Ok, Creating the index *after* the inserts, the index gets created
  in a single operation.
  But the indexes have to be updated row by row after the data rows
 have all been inserted. Does it work that way?
  No, when you create an index on an existing table (like after a mass
 insert), what happens is that the engine does a single full tablescan
 and builds the index in a single pass, which is a lot more performant
 than updating a single disk block for every record, for the simple
 reason that a single disk block can contain dozens of index entries.
 
  Imagine that you insert one million rows, and you have 100 index
 entries in a disk block (random numbers, to make a point. Real numbers
 will depend on storage, file system, index, et cetera). Obviously
 there's no way to write less than a single block to disk - that's how
 it works.
 
  You can update your index for each record in turn. That means you
 will need to do 1 million index - and thus block - writes; plus
 additional reads for those blocks you don't have in memory - that's the
 index cache.
 
  Now, if you create a new index on an existing table, you are first
 of all bypassing any index read operations - there *is* no index to
 read, yet. Then the system is going to do a full tablescan - considered
 slow, but you need all the data, so there's no better way anyway. The
 index will be built - in-memory as much as possible

RE: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-09 Thread Rick James
A BTree that is small enough to be cached in RAM can be quickly maintained.  
Even the “block splits” are not too costly without the I/O.

A big file that needs sorting – bigger than can be cached in RAM – is more 
efficiently done with a dedicated “sort merge” program.  A “big” INDEX on a 
table may be big enough to fall into this category.

I/O is the most costly part of any of these operations.  My rule of thumb for 
MySQL SQL statements is:  If everything is cached, the query will run ten times 
as fast as it would if things have to be fetched from disk.

Sortmerge works this way:

1.   Sort as much of the file as you can in RAM.  Write that sorted piece 
to disk.

2.   Repeat for the next chunk of the file.  Repeat until the input file is 
broken into sorted chunks.

3.   Now, “merge” those chunks together – take the first row from  each, 
decide which is the “smallest”, send it to the output

4.   Repeat until finished with all the pieces.
For a really big task, there may have to be more than one “merge” pass.
Note how sort merge reads the input sequentially once, writes the output 
sequentially once, and has sequential I/O for each merge chunk.
“Sequential” I/O is faster than “random” I/O – no arm motion on traditional 
disks.  (SSDs are a different matter; I won’t go into that.)
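Steps 1–4 above fit in a few lines of code. Here is a toy external sort-merge in Python; the tiny chunk size is an assumption standing in for "as much of the file as fits in RAM":

```python
import heapq
import os
import random
import tempfile

def external_sort(values, chunk_size=100):
    """Toy external sort-merge: spill sorted chunks, then k-way merge."""
    chunk_paths = []
    # Steps 1-2: sort each in-memory chunk and write it to its own file.
    for start in range(0, len(values), chunk_size):
        chunk = sorted(values[start:start + chunk_size])
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "w") as f:
            f.writelines("%d\n" % v for v in chunk)
        chunk_paths.append(path)
    # Steps 3-4: repeatedly take the smallest head among all sorted runs.
    files = [open(p) for p in chunk_paths]
    merged = [int(line) for line in heapq.merge(*files, key=int)]
    for f in files:
        f.close()
    for p in chunk_paths:
        os.remove(p)
    return merged

data = random.sample(range(10000), 1000)
result = external_sort(data)
```

Note the I/O pattern Rick describes: each chunk file is written sequentially once and read sequentially once during the merge.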

The “output” from the sortmerge is fed into code that builds the BTree for the 
table.  This building of the BTree is sequential – fill the first block, move 
on to the next block, and never have to go back.

BTrees (when built randomly), if they need to spill to disk, will involve 
random I/O.  (And we are talking about an INDEX that is so big that it needs to 
spill to disk.)

When a block “splits”, one full block becomes two half-full blocks.  Randomly 
filling a BTree leads to, on average, the index being 69% full.  This is not a 
big factor in the overall issue, but perhaps worth noting.

How bad can it get?  Here’s an example.

· You have an INDEX on some random value, such as a GUID or MD5.

· The INDEX will be 5 times as big as you can fit in RAM.

· MySQL is adding to the BTree one row at a time (the non-sortmerge way)
When it is nearly finished, only 1 of 5 updates to the BTree can be done 
immediately in RAM; 4 out of 5 updates to the BTree will have to hit disk.  If 
you are using normal disks, that is on the order of 125 rows per second that 
you can insert – Terrible!  Sortmerge is likely to average over 10,000.
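The "125 rows per second" figure follows from a back-of-envelope assumption (mine, not stated in the original): a traditional spinning disk sustains on the order of 100 random I/Os per second.

```python
# Back-of-envelope check of the "125 rows per second" figure.
# Assumption: a traditional disk sustains ~100 random I/Os per second.
random_iops = 100.0
fraction_hitting_disk = 4 / 5   # index is 5x RAM, so ~4 of 5 BTree updates miss cache
rows_per_second = random_iops / fraction_hitting_disk
print(rows_per_second)
```

Each cache-missing insert costs one random I/O, so throughput is capped at roughly iops divided by the miss fraction: 125 rows per second.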



From: Zhangzhigang [mailto:zzgang_2...@yahoo.com.cn]
Sent: Tuesday, May 08, 2012 9:13 PM
To: Rick James
Cc: mysql@lists.mysql.com
Subject: Re: Why is creating indexes faster after inserting massive data rows?

James...
* By doing all the indexes after building the table (or at least all the 
non-UNIQUE indexes), sort merge can be used.  This technique had been highly 
optimized over the past half-century, and is more efficient.

I have a question about sort merge:

Why does it do the full sort merge?

In my opinion, it could just maintain the B-tree and insert each key into a B-tree 
node that still has room for sorted keys, which would perform well.

If it only does the sort merge, the B-tree data structure has to be 
created separately afterwards, which wastes some performance.

Doesn't it?



 From: Rick James rja...@yahoo-inc.com
To: Johan De Meersman vegiv...@tuxera.be; Zhangzhigang zzgang_2...@yahoo.com.cn
Cc: mysql@lists.mysql.com
Date: Tuesday, May 8, 2012, 12:35 AM
Subject: RE: Why is creating indexes faster after inserting massive data rows?

* Batch INSERTs run faster than one-row-at-a-time, but this is unrelated to 
INDEX updating speed.
* The cache size is quite important to dealing with indexing during INSERT; see 
http://mysql.rjweb.org/doc.php/memory
* Note that mysqldump sets up for an efficient creation of indexes after 
loading the data.  This is not practical (or necessarily efficient) when 
incremental INSERTing into a table.

As for the original question...
* Updating the index(es) for one row often involves random BTree traversals.  
When the index(es) are too big to be cached, this can involve disk hit(s) for 
each row inserted.
* By doing all the indexes after building the table (or at least all the 
non-UNIQUE indexes), sort merge can be used.  This technique had been highly 
optimized over the past half-century, and is more efficient.


 -Original Message-
 From: Johan De Meersman [mailto:vegiv...@tuxera.be]
 Sent: Monday, May 07, 2012 1:29 AM
 To: Zhangzhigang
 Cc: mysql@lists.mysql.commailto:mysql@lists.mysql.com
 Subject: Re: Why is creating indexes faster after inserting massive
 data rows?

 - Original Message -
  From: Zhangzhigang 
  zzgang_2...@yahoo.com.cn

Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-09 Thread Claudio Nanni
This thread is going on and on and on and on,
does anyone have time to actually measure I/O?
Let's make numbers talk.

Claudio


2012/5/9 Rick James rja...@yahoo-inc.com

 A BTree that is small enough to be cached in RAM can be quickly
 maintained.  Even the “block splits” are not too costly without the I/O.

 A big file that needs sorting – bigger than can be cached in RAM – is more
 efficiently done with a dedicated “sort merge” program.  A “big” INDEX on a
 table may be big enough to fall into this category.

 I/O is the most costly part of any of these operations.  My rule of thumb
 for MySQL SQL statements is:  If everything is cached, the query will run
 ten times as fast as it would if things have to be fetched from disk.

 Sortmerge works this way:

 1.   Sort as much of the file as you can in RAM.  Write that sorted
 piece to disk.

 2.   Repeat for the next chunk of the file.  Repeat until the input
 file is broken into sorted chunks.

 3.   Now, “merge” those chunks together – take the first row from
  each, decide which is the “smallest”, send it to the output

 4.   Repeat until finished with all the pieces.
 For a really big task, there may have to be more than one “merge” pass.
 Note how sort merge reads the input sequentially once, writes the output
 sequentially once, and has sequential I/O for each merge chunk.
 “Sequential” I/O is faster than “random” I/O – no arm motion on
 traditional disks.  (SSDs are a different matter; I won’t go into that.)

 The “output” from the sortmerge is fed into code that builds the BTree for
 the table.  This building of the BTree is sequential – fill the first
 block, move on to the next block, and never have to go back.

 BTrees (when built randomly), if they need to spill to disk, will involve
 random I/O.  (And we are talking about an INDEX that is so big that it
 needs to spill to disk.)

 When a block “splits”, one full block becomes two half-full blocks.
  Randomly filling a BTree leads to, on average, the index being 69% full.
  This is not a big factor in the overall issue, but perhaps worth noting.

 How bad can it get?  Here’s an example.

 · You have an INDEX on some random value, such as a GUID or MD5.

 · The INDEX will be 5 times as big as you can fit in RAM.

 · MySQL is adding to the BTree one row at a time (the
 non-sortmerge way)
 When it is nearly finished, only 1 of 5 updates to the BTree can be done
 immediately in RAM; 4 out of 5 updates to the BTree will have to hit disk.
  If you are using normal disks, that is on the order of 125 rows per second
 that you can insert – Terrible!  Sortmerge is likely to average over 10,000.



 From: Zhangzhigang [mailto:zzgang_2...@yahoo.com.cn]
 Sent: Tuesday, May 08, 2012 9:13 PM
 To: Rick James
 Cc: mysql@lists.mysql.com
 Subject: Re: Why is creating indexes faster after inserting massive data
 rows?

 James...
 * By doing all the indexes after building the table (or at least all the
 non-UNIQUE indexes), sort merge can be used.  This technique had been
 highly optimized over the past half-century, and is more efficient.

 I have a question about sort merge:

 Why does it do the full sort merge?

 In my opinion, it could just maintain the B-tree and insert each key into a B-tree
 node that still has room for sorted keys, which would perform well.

 If it only does the sort merge, the B-tree data structure has to be
 created separately afterwards, which wastes some performance.

 Doesn't it?


 
 From: Rick James rja...@yahoo-inc.com
 To: Johan De Meersman vegiv...@tuxera.be; Zhangzhigang zzgang_2...@yahoo.com.cn
 Cc: mysql@lists.mysql.com
 Date: Tuesday, May 8, 2012, 12:35 AM
 Subject: RE: Why is creating indexes faster after inserting massive data rows?

 * Batch INSERTs run faster than one-row-at-a-time, but this is unrelated
 to INDEX updating speed.
 * The cache size is quite important to dealing with indexing during
 INSERT; see http://mysql.rjweb.org/doc.php/memory 
 * Note that mysqldump sets up for an efficient creation of indexes after
 loading the data.  This is not practical (or necessarily efficient) when
 incremental INSERTing into a table.

 As for the original question...
 * Updating the index(es) for one row often involves random BTree
 traversals.  When the index(es) are too big to be cached, this can involve
 disk hit(s) for each row inserted.
 * By doing all the indexes after building the table (or at least all the
 non-UNIQUE indexes), sort merge can be used.  This technique had been
 highly optimized over the past half-century, and is more efficient.


  -Original Message-
  From: Johan De Meersman [mailto:vegiv...@tuxera.be]
  Sent: Monday, May 07, 2012 1:29 AM
  To: Zhangzhigang
  Cc: mysql

RE: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-09 Thread Claudio Nanni
Disagree all the way, numbers are numbers, and better than words, always.
Claudio
On May 9, 2012 7:22 PM, Rick James rja...@yahoo-inc.com wrote:

 Numbers can be misleading – one benchmark will show no difference; another
 will show 10x difference.

 Recommend you benchmark _*your*_ case.

 ** **

 *From:* Claudio Nanni [mailto:claudio.na...@gmail.com]
 *Sent:* Wednesday, May 09, 2012 8:34 AM
 *To:* Rick James
 *Cc:* Zhangzhigang; mysql@lists.mysql.com
 *Subject:* Re: Re: Why is creating indexes faster after inserting massive
 data rows?

 ** **

 This thread is going on and on and on and on,

 does anyone have time to actually measure I/O?

 Let's make numbers talk.

 ** **

 Claudio

 ** **

 2012/5/9 Rick James rja...@yahoo-inc.com

 A BTree that is small enough to be cached in RAM can be quickly
 maintained.  Even the “block splits” are not too costly without the I/O.

 A big file that needs sorting – bigger than can be cached in RAM – is more
 efficiently done with a dedicated “sort merge” program.  A “big” INDEX on a
 table may be big enough to fall into this category.

 I/O is the most costly part of any of these operations.  My rule of thumb
 for MySQL SQL statements is:  If everything is cached, the query will run
 ten times as fast as it would if things have to be fetched from disk.

 Sortmerge works this way:

 1.   Sort as much of the file as you can in RAM.  Write that sorted
 piece to disk.

 2.   Repeat for the next chunk of the file.  Repeat until the input
 file is broken into sorted chunks.

 3.   Now, “merge” those chunks together – take the first row from
  each, decide which is the “smallest”, send it to the output

 4.   Repeat until finished with all the pieces.
 For a really big task, there may have to be more than one “merge” pass.
 Note how sort merge reads the input sequentially once, writes the output
 sequentially once, and has sequential I/O for each merge chunk.
 “Sequential” I/O is faster than “random” I/O – no arm motion on
 traditional disks.  (SSDs are a different matter; I won’t go into that.)

 The “output” from the sortmerge is fed into code that builds the BTree for
 the table.  This building of the BTree is sequential – fill the first
 block, move on to the next block, and never have to go back.

 BTrees (when built randomly), if they need to spill to disk, will involve
 random I/O.  (And we are talking about an INDEX that is so big that it
 needs to spill to disk.)

 When a block “splits”, one full block becomes two half-full blocks.
  Randomly filling a BTree leads to, on average, the index being 69% full.
  This is not a big factor in the overall issue, but perhaps worth noting.

 How bad can it get?  Here’s an example.

 · You have an INDEX on some random value, such as a GUID or MD5.

 · The INDEX will be 5 times as big as you can fit in RAM.

 · MySQL is adding to the BTree one row at a time (the
 non-sortmerge way)
 When it is nearly finished, only 1 of 5 updates to the BTree can be done
 immediately in RAM; 4 out of 5 updates to the BTree will have to hit disk.
  If you are using normal disks, that is on the order of 125 rows per second
 that you can insert – Terrible!  Sortmerge is likely to average over 10,000.



 From: Zhangzhigang [mailto:zzgang_2...@yahoo.com.cn]
 Sent: Tuesday, May 08, 2012 9:13 PM
 To: Rick James
 Cc: mysql@lists.mysql.com
 Subject: Re: Why is creating indexes faster after inserting massive data
 rows?


 James...
 * By doing all the indexes after building the table (or at least all the
 non-UNIQUE indexes), sort merge can be used.  This technique had been
 highly optimized over the past half-century, and is more efficient.

 I have a question about sort merge:

 Why does it do the full sort merge?

 In my opinion, it could just maintain the B-tree and insert each key into a B-tree
 node that still has room for sorted keys, which would perform well.

 If it only does the sort merge, the B-tree data structure has to be
 created separately afterwards, which wastes some performance.

 Doesn't it?


 
 From: Rick James rja...@yahoo-inc.com
 To: Johan De Meersman vegiv...@tuxera.be; Zhangzhigang zzgang_2...@yahoo.com.cn
 Cc: mysql@lists.mysql.com

 Date: Tuesday, May 8, 2012, 12:35 AM
 Subject: RE: Why is creating indexes faster after inserting massive data rows?

 * Batch INSERTs run faster than one-row-at-a-time, but this is unrelated
 to INDEX updating speed.

 * The cache size is quite important to dealing with indexing during
 INSERT; see http://mysql.rjweb.org/doc.php/memory 

 * Note that mysqldump sets up for an efficient creation of indexes after
 loading the data.  This is not practical (or necessarily efficient) when
 incremental

Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-09 Thread Zhangzhigang
The “output” from the sortmerge is fed into code that builds the BTree for 
the table.  This building of the BTree is sequential – fill the first 
block, move on to the next block, and never have to go back.

 James...


Thanks for your answer; it is very clear.

Firstly:

Before accepting this answer, I thought that the block splits involved in 
building the BTree had to be done with random I/O.

Now I know that MySQL optimizes BTree building with a sort merge to avoid 
block splits, so it does not do extra random I/O.


Secondly:

It also bypasses BTree traversals, which involve disk hit(s) for each row 
inserted when the index is too big to be cached.


Thank you very much.


Sincerely yours
Zhigang Zhang



 From: Rick James rja...@yahoo-inc.com
To: Zhangzhigang zzgang_2...@yahoo.com.cn 
Cc: mysql@lists.mysql.com 
Date: Wednesday, May 9, 2012, 11:21 PM
Subject: RE: Re: Why is creating indexes faster after inserting massive data rows?
 

A BTree that is small enough to be cached in RAM can be quickly maintained.  
Even the “block splits” are not too costly without the I/O.
 
A big file that needs sorting – bigger than can be cached in RAM – is more 
efficiently done with a dedicated “sort merge” program.  A “big” INDEX on a 
table may be big enough to fall into this category.
 
I/O is the most costly part of any of these operations.  My rule of thumb for 
MySQL SQL statements is:  If everything is cached, the query will run ten times 
as fast as it would if things have to be fetched from disk.
 
Sortmerge works this way:
1.   Sort as much of the file as you can in RAM.  Write that sorted piece 
to disk.
2.   Repeat for the next chunk of the file.  Repeat until the input file is 
broken into sorted chunks.
3.   Now, “merge” those chunks together – take the first row from  each, 
decide which is the “smallest”, send it to the output
4.   Repeat until finished with all the pieces.
For a really big task, there may have to be more than one “merge” pass.
Note how sort merge reads the input sequentially once, writes the output 
sequentially once, and has sequential I/O for each merge chunk.
“Sequential” I/O is faster than “random” I/O – no arm motion on traditional 
disks.  (SSDs are a different matter; I won’t go into that.)
 
The “output” from the sortmerge is fed into code that builds the BTree for the 
table.  This building of the BTree is sequential – fill the first block, move 
on to the next block, and never have to go back.
 
BTrees (when built randomly), if they need to spill to disk, will involve 
random I/O.  (And we are talking about an INDEX that is so big that it needs to 
spill to disk.)
 
When a block “splits”, one full block becomes two half-full blocks.  Randomly 
filling a BTree leads to, on average, the index being 69% full.  This is not a 
big factor in the overall issue, but perhaps worth noting.
 
How bad can it get?  Here’s an example.
· You have an INDEX on some random value, such as a GUID or MD5.
· The INDEX will be 5 times as big as you can fit in RAM.
· MySQL is adding to the BTree one row at a time (the non-sortmerge way)
When it is nearly finished, only 1 of 5 updates to the BTree can be done 
immediately in RAM; 4 out of 5 updates to the BTree will have to hit disk.  If 
you are using normal disks, that is on the order of 125 rows per second that 
you can insert – Terrible!  Sortmerge is likely to average over 10,000.
 
 
 
From: Zhangzhigang [mailto:zzgang_2...@yahoo.com.cn] 
Sent: Tuesday, May 08, 2012 9:13 PM
To: Rick James
Cc: mysql@lists.mysql.com
Subject: Re: Why is creating indexes faster after inserting massive data rows?
 
James...
* By doing all the indexes after building the table (or at least all the 
non-UNIQUE indexes), sort merge can be used.  This technique had been highly 
optimized over the past half-century, and is more efficient.
 
I have a question about sort merge:
 
Why does it do the full sort merge? 
 
In my opinion, it could just maintain the B-tree and insert each key into a B-tree 
node that still has room for sorted keys, which would perform well.
 
If it only does the sort merge, the B-tree data structure has to be 
created separately afterwards, which wastes some performance.
 
Doesn't it?
 
 



From: Rick James rja...@yahoo-inc.com
To: Johan De Meersman vegiv...@tuxera.be; Zhangzhigang zzgang_2...@yahoo.com.cn 
Cc: mysql@lists.mysql.com 
Date: Tuesday, May 8, 2012, 12:35 AM
Subject: RE: Why is creating indexes faster after inserting massive data rows?

* Batch INSERTs run faster than one-row-at-a-time, but this is unrelated to 
INDEX updating speed.
* The cache size is quite important to dealing with indexing during INSERT; see 
http://mysql.rjweb.org/doc.php/memory 
* Note that mysqldump sets up for an efficient creation of indexes after 
loading the data.  This is not practical (or necessarily efficient) when 
incremental INSERTing into a table

Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-08 Thread Zhangzhigang
Ok, OS cache.
 There isn't really a million of block writes.   The record gets 
added to the block, but that gets modified in OS cache if we assume 
MyISAM tables and in the Innodb buffer if we assume InnoDB tables.

As far as I know, MySQL writes the data to disk directly and does not use the OS 
cache while the table is being updated.

If it wrote to the OS cache, that would lead to a massive number of system calls 
when a lot of rows are inserted into the table one by one. 





 From: Karen Abgarian a...@apple.com
To: mysql@lists.mysql.com 
Date: Tuesday, May 8, 2012, 11:37 AM
Subject: Re: Re: Re: Why is creating indexes faster after inserting massive data 
rows?
 
Honestly, I did not understand that.   I did not say anything about being 
complicated.  What does mysql not use, caching??

Judging by experience, creating a unique index on say, a 200G table could be a 
bitter one.  


On 07.05.2012, at 19:26, Zhangzhigang wrote:

 Karen...
 
 MySQL does not use the approach you described, which is complicated.
 
 I agree with Johan De Meersman.
 
 
 
 From: Karen Abgarian a...@apple.com
 To: mysql@lists.mysql.com 
 Date: Tuesday, May 8, 2012, 1:30 AM
 Subject: Re: Re: Why is creating indexes faster after inserting massive data rows?
 
 Hi, 
 
 A couple cents to this. 
 
 There isn't really a million of block writes.   The record gets added to the 
 block, but that gets modified in OS cache if we assume MyISAM tables and in 
 the Innodb buffer if we assume InnoDB tables.   In both cases, the actual 
 writing does not take place and does not slow down the process.    What does 
 however happen for each operation, is processing the statement, locating the 
 entries to update in the index, index block splits and , for good reason, 
 committing.  
 
 When it comes to creating an index, what needs to happen, is to read the 
 whole table and to sort all rows by the index key.   The latter process will 
 be the most determining factor in answering the original question, because 
 for the large tables the sort will have to do a lot of disk I/O.    The point 
 I am trying to make is there will be situations when creating indexes and 
 then inserting the rows will be faster than creating an index afterwards.   
 If we try to determine such situations, we could notice that the likelihood 
 of the sort going to disk increases with the amount of distinct values to be 
 sorted.   For this reason, my choice would be to create things like 
 primary/unique keys beforehand unless I am certain that everything will fit 
 in the available memory. 
 
 Peace
 Karen
 
 
 
 On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:
 
 - Original Message -
 
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
 Ok, Creating the index *after* the inserts, the index gets created in
 a single operation.
 But the indexes have to be updated row by row after the data rows have
 all been inserted. Does it work that way?
 No, when you create an index on an existing table (like after a mass 
 insert), what happens is that the engine does a single full tablescan and 
 builds the index in a single pass, which is a lot more performant than 
 updating a single disk block for every record, for the simple reason that a 
 single disk block can contain dozens of index entries. 
 
 Imagine that you insert one million rows, and you have 100 index entries in 
 a disk block (random numbers, to make a point. Real numbers will depend on 
 storage, file system, index, et cetera). Obviously there's no way to write 
 less than a single block to disk - that's how it works. 
 
 You can update your index for each record in turn. That means you will need 
 to do 1 million index - and thus block - writes; plus additional reads for 
 those blocks you don't have in memory - that's the index cache. 
 
 Now, if you create a new index on an existing table, you are first of all 
 bypassing any index read operations - there *is* no index to read, yet. Then 
 the system is going to do a full tablescan - considered slow, but you need 
 all the data, so there's no better way anyway. The index will be built - 
 in-memory as much as possible - and the system will automatically prefer to 
 write only complete blocks - 10.000 of them. That's the exact same number of 
 index blocks, but you only write each block once, so that's only 10.000 
 writes instead of 1.000.000. 
 
 Now there's a lot more at play, things like B-tree balancing and whatnot, 
 but that's the basic picture. 
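Johan's 1,000,000-row example works out as follows (the 100 entries per block is his illustrative number, not a real block size):

```python
# The 1,000,000-row example above, in numbers.
rows = 1_000_000
entries_per_block = 100            # illustrative figure from the text
index_blocks = rows // entries_per_block

row_by_row_writes = rows           # worst case: one block write per inserted row
bulk_build_writes = index_blocks   # build after load: each block written once
print(row_by_row_writes, bulk_build_writes)
```

That is 10,000 block writes for the bulk build versus up to 1,000,000 for row-by-row maintenance: a 100x reduction, before even counting the saved index reads.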
 
 -- 
 
 Bier met grenadyn 
 Is als mosterd by den wyn 
 Sy die't drinkt, is eene kwezel 
 Hy die't drinkt, is ras een ezel 
 
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:    http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-08 Thread Johan De Meersman
- Original Message -
 From: Zhangzhigang zzgang_2...@yahoo.com.cn

 MySQL does not use the approach you described, which is
 complicated.

 I agree with Johan De Meersman.

Umm... It's not a matter of who you agree with :-) Karen's technical detail is 
quite correct; I merely presented a simplified picture for easier understanding 
of the basics.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-08 Thread Johan De Meersman
- Original Message -
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
 As I understand it, MySQL writes data directly to disk and does not
 use the OS cache when a table is being updated.

If it were to use the OS cache for reading but not writing, then the OS cache 
would be inconsistent with the underlying filesystem as soon as you wrote a 
block, and you'd need some complicated logic to figure out which of the two was 
correct.

No, the MyISAM engine will simply yield to whatever the kernel/VFS wants to do 
with the blocks; whereas InnoDB can open its files with O_DIRECT (via 
innodb_flush_method) to bypass the OS cache, because it manages its own buffer pool.

 If it writes through the OS cache, that causes a large number of system
 calls when many rows are inserted into the table one by one.

From the code's point of view, you simply request a read or a write. Whether or 
not the OS cache gets in between is entirely a matter for the kernel to 
decide, assuming you specified no special options at file open time.


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-08 Thread Zhangzhigang
Ok, thanks for your help.




 From: Johan De Meersman vegiv...@tuxera.be
To: Zhangzhigang zzgang_2...@yahoo.com.cn 
Cc: mysql@lists.mysql.com; Karen Abgarian a...@apple.com 
Date: Tuesday, May 8, 2012, 6:07 PM
Subject: Re: Re: Re: Re: Why is creating indexes faster after inserting massive data 
rows?
 
- Original Message -
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
 As I understand it, MySQL writes data directly to disk and does not
 use the OS cache when a table is being updated.

If it were to use the OS cache for reading but not writing, then the OS cache 
would be inconsistent with the underlying filesystem as soon as you wrote a 
block, and you'd need some complicated logic to figure out which of the two was 
correct.

No, the MyISAM engine will simply yield to whatever the kernel/VFS wants to do 
with the blocks; whereas InnoDB can open its files with O_DIRECT (via 
innodb_flush_method) to bypass the OS cache, because it manages its own buffer pool.

 If it writes through the OS cache, that causes a large number of system
 calls when many rows are inserted into the table one by one.

From the code's point of view, you simply request a read or a write. Whether or 
not the OS cache gets in between is entirely a matter for the kernel to decide, 
assuming you specified no special options at file open time.


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Re: Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-08 Thread Karen Abgarian
Hi, 

If MyISAM tables were written directly to disk, they would be so slow that 
nobody would ever use them.  That's the cornerstone of their performance: the 
writes do not wait for the physical I/O to complete!



On May 8, 2012, at 3:07 AM, Johan De Meersman wrote:

 - Original Message -
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
  As I understand it, MySQL writes data directly to disk and does not
  use the OS cache when a table is being updated.
 
 If it were to use the OS cache for reading but not writing, then the OS cache 
 would be inconsistent with the underlying filesystem as soon as you wrote a 
 block, and you'd need some complicated logic to figure out which of the two 
 was correct.
 
  No, the MyISAM engine will simply yield to whatever the kernel/VFS wants to 
  do with the blocks; whereas InnoDB can open its files with O_DIRECT (via 
  innodb_flush_method) to bypass the OS cache, because it manages its own buffer pool.
 
  If it writes through the OS cache, that causes a large number of system
  calls when many rows are inserted into the table one by one.
 
  From the code's point of view, you simply request a read or a write. Whether 
  or not the OS cache gets in between is entirely a matter for the kernel to 
  decide, assuming you specified no special options at file open time.
 
 
 -- 
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql
 





Re: Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-08 Thread Zhangzhigang
 Oh... I thought MyISAM used its own buffer cache, the same as InnoDB. My 
mistake on that, thanks!




 From: Karen Abgarian a...@apple.com
To: mysql@lists.mysql.com 
Date: Wednesday, May 9, 2012, 2:51 AM
Subject: Re: Re: Re: Re: Why is creating indexes faster after inserting massive data 
rows?
 
Hi, 

If MyISAM tables were written directly to disk, they would be so slow that 
nobody would ever use them.  That's the cornerstone of their performance: the 
writes do not wait for the physical I/O to complete!



On May 8, 2012, at 3:07 AM, Johan De Meersman wrote:

 - Original Message -
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
  As I understand it, MySQL writes data directly to disk and does not
  use the OS cache when a table is being updated.
 
 If it were to use the OS cache for reading but not writing, then the OS cache 
 would be inconsistent with the underlying filesystem as soon as you wrote a 
 block, and you'd need some complicated logic to figure out which of the two 
 was correct.
 
  No, the MyISAM engine will simply yield to whatever the kernel/VFS wants to 
  do with the blocks; whereas InnoDB can open its files with O_DIRECT (via 
  innodb_flush_method) to bypass the OS cache, because it manages its own buffer pool.
 
  If it writes through the OS cache, that causes a large number of system
  calls when many rows are inserted into the table one by one.
 
  From the code's point of view, you simply request a read or a write. Whether 
  or not the OS cache gets in between is entirely a matter for the kernel to 
  decide, assuming you specified no special options at file open time.
 
 
 -- 
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel
 
 -- 
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:    http://lists.mysql.com/mysql
 



Re: Why is creating indexes faster after inserting massive data rows?

2012-05-08 Thread Zhangzhigang
James...
* By doing all the indexes after building the table (or at least all the 
non-UNIQUE indexes), sort merge can be used.  This technique has been highly 
optimized over the past half-century and is more efficient.


I have a question about the sort merge:

Why does it do a full sort merge at all?


In my opinion, it could just maintain the B-tree and insert each key into a 
B-tree node that still has room for sorted keys, which would also perform well.

If it only does the sort merge, the B-tree data structure still has to be 
created separately, which wastes some work.

Doesn't it?




 From: Rick James rja...@yahoo-inc.com
To: Johan De Meersman vegiv...@tuxera.be; Zhangzhigang 
zzgang_2...@yahoo.com.cn 
Cc: mysql@lists.mysql.com mysql@lists.mysql.com 
Date: Tuesday, May 8, 2012, 12:35 AM
Subject: RE: Why is creating indexes faster after inserting massive data rows?
 
* Batch INSERTs run faster than one-row-at-a-time, but this is unrelated to 
INDEX updating speed.
* The cache size is quite important to dealing with indexing during INSERT; see 
http://mysql.rjweb.org/doc.php/memory 
* Note that mysqldump sets up for an efficient creation of indexes after 
loading the data.  This is not practical (or necessarily efficient) when 
incremental INSERTing into a table.

As for the original question...
* Updating the index(es) for one row often involves random BTree traversals.  
When the index(es) are too big to be cached, this can involve disk hit(s) for 
each row inserted.
* By doing all the indexes after building the table (or at least all the 
non-UNIQUE indexes), sort merge can be used.  This technique has been highly 
optimized over the past half-century and is more efficient.


 -Original Message-
 From: Johan De Meersman [mailto:vegiv...@tuxera.be]
 Sent: Monday, May 07, 2012 1:29 AM
 To: Zhangzhigang
 Cc: mysql@lists.mysql.com
 Subject: Re: Why is creating indexes faster after inserting massive
 data rows?
 
 - Original Message -
  From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
  Creating indexes after inserting massive data rows is faster than
  before inserting data rows.
  Please tell me why.
 
 Plain and simple: the indices get updated after every insert statement,
 whereas if you only create the index *after* the inserts, the index
 gets created in a single operation, which is a lot more efficient.
 
 I seem to recall that inside of a transaction (thus, InnoDB or so) the
 difference is markedly less; I might be wrong, though.
 
 
 --
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql 
 To unsubscribe:    http://lists.mysql.com/mysql 

Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Ananda Kumar
Which version of MySQL are you using?

Is this a secondary index?



On Mon, May 7, 2012 at 12:07 PM, Zhangzhigang zzgang_2...@yahoo.com.cnwrote:

 hi all:

 I have a question:

 Creating indexes after inserting massive data rows is faster than before
 inserting data rows.
 Please tell me why.



Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Zhangzhigang
Version: MySQL 5.1 

Engine: MyISAM.

The indexes are ordinary ones, neither primary nor unique keys.

Let me describe the question more clearly.

When inserting massive numbers of rows into a table that needs indexes, I can 
create the indexes before inserting the rows, or I can insert all the rows 
first and then create the indexes. Normally, the total time (inserting rows 
plus creating indexes) of the first approach is longer than that of the 
second.

Please tell me why.








 
From: Ananda Kumar anan...@gmail.com
To: Zhangzhigang zzgang_2...@yahoo.com.cn 
Cc: mysql@lists.mysql.com mysql@lists.mysql.com 
Date: Monday, May 7, 2012, 3:31 PM
Subject: Re: Why is creating indexes faster after inserting massive data rows?
 

Which version of MySQL are you using?

Is this a secondary index?




On Mon, May 7, 2012 at 12:07 PM, Zhangzhigang zzgang_2...@yahoo.com.cn wrote:

hi all:

I have a question:

Creating indexes after inserting massive data rows is faster than before 
inserting data rows.
Please tell me why.


Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Johan De Meersman
- Original Message -
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
 Creating indexes after inserting massive data rows is faster than
 before inserting data rows.
 Please tell me why.

Plain and simple: the indices get updated after every insert statement, whereas 
if you only create the index *after* the inserts, the index gets created in a 
single operation, which is a lot more efficient.

I seem to recall that inside of a transaction (thus, InnoDB or so) the 
difference is markedly less; I might be wrong, though.


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Zhangzhigang
Johan, 
Plain and simple: the indices get updated after every insert statement, 
whereas if you only create the index *after* the inserts, the index gets 
created in a single operation, which is a lot more efficient.


OK, so when the index is created *after* the inserts, it gets created in a single 
operation.
But doesn't the index still have to be updated key by key after all the data 
rows have been inserted? Does it work that way?
If so, I cannot see the difference in overhead between the two approaches.






 From: Johan De Meersman vegiv...@tuxera.be
To: Zhangzhigang zzgang_2...@yahoo.com.cn 
Cc: mysql@lists.mysql.com 
Date: Monday, May 7, 2012, 4:28 PM
Subject: Re: Why is creating indexes faster after inserting massive data rows?
 
- Original Message -
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
 Creating indexes after inserting massive data rows is faster than
 before inserting data rows.
 Please tell me why.

Plain and simple: the indices get updated after every insert statement, whereas 
if you only create the index *after* the inserts, the index gets created in a 
single operation, which is a lot more efficient.

I seem to recall that inside of a transaction (thus, InnoDB or so) the 
difference is markedly less; I might be wrong, though.


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Alex Schaft

On 2012/05/07 10:53, Zhangzhigang wrote:

Johan, 

Plain and simple: the indices get updated after every insert statement,

whereas if you only create the index *after* the inserts, the index gets 
created in a single operation, which is a lot more efficient.


OK, so when the index is created *after* the inserts, it gets created in a single 
operation.
But doesn't the index still have to be updated key by key after all the data 
rows have been inserted? Does it work that way?
If so, I cannot see the difference in overhead between the two approaches.
My simplified 2c. When inserting rows with active indexes one by one 
(insert), mysql has to


1) lock the space for the data to be added,
2) write the data,
3) lock the index,
4) write the index key(s),
5) unlock the index,
6) unlock the data

This happens for each row

When first doing all data without index, only 1, 2, and 6 happen. When 
you then create an index, it can lock the index, read all the data and 
write all index keys in one go and then unlock the index.


If you make an omelet, do you fetch your eggs from the fridge one by 
one, or all at the same time? :)
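Alex's six steps can be turned into a toy operation counter (purely illustrative bookkeeping, not MySQL internals): per-row inserts pay the index lock/write/unlock for every single row, while a deferred index build pays the lock and unlock only once.

```python
# Toy cost model for the six steps above. Counts abstract "operations";
# real costs differ wildly, this only shows the shape of the difference.
def insert_with_index(rows):
    ops = 0
    for _ in range(rows):
        ops += 2        # 1) lock data space, 2) write data
        ops += 3        # 3) lock index, 4) write key(s), 5) unlock index
        ops += 1        # 6) unlock data
    return ops

def insert_then_build_index(rows):
    ops = 3 * rows      # steps 1, 2 and 6 still happen per row
    ops += 1            # lock the index once
    ops += rows         # write all index keys in one pass
    ops += 1            # unlock the index once
    return ops

print(insert_with_index(1000))        # 6000
print(insert_then_build_index(1000))  # 4002
```

The per-row index locking overhead vanishes into two bookend operations, and the key writes themselves become one sequential pass.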


HTH,
Alex


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Claudio Nanni
Creating the index in one go is a single macro-sort operation;
updating the index at every row repeats that work over and over again.
If you do not understand the difference, I recommend reading up on the basics
of sorting algorithms;
it is an interesting read anyway.

Claudio
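The difference Claudio points at can be seen with plain Python lists (a hedged stand-in for index pages, not MySQL code): keeping a list sorted after every insert shifts elements over and over, while sorting once at the end does the work in a single pass.

```python
import bisect
import random

random.seed(42)
data = [random.randrange(10**6) for _ in range(5000)]

# Incremental: keep the "index" sorted after every single insert.
# Each insort costs O(n) element shifts -> O(n^2) total movement.
incremental = []
for key in data:
    bisect.insort(incremental, key)

# One-shot: load everything first, then a single O(n log n) sort,
# analogous to building the index after the bulk load.
bulk = sorted(data)

print(incremental == bulk)  # True: same final index, very different cost
```

Both routes end at the identical sorted structure; only the amount of repeated work differs.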

2012/5/7 Zhangzhigang zzgang_2...@yahoo.com.cn

 Johan, 
 Plain and simple: the indices get updated after every insert statement,
 whereas if you only create the index *after* the inserts, the index gets
 created in a single operation, which is a lot more efficient.


 OK, so when the index is created *after* the inserts, it gets created in a
 single operation.
 But doesn't the index still have to be updated key by key after all the data
 rows have been inserted? Does it work that way?
 If so, I cannot see the difference in overhead between the two approaches.





 
  From: Johan De Meersman vegiv...@tuxera.be
 To: Zhangzhigang zzgang_2...@yahoo.com.cn
 Cc: mysql@lists.mysql.com
 Date: Monday, May 7, 2012, 4:28 PM
 Subject: Re: Why is creating indexes faster after inserting massive data rows?

 - Original Message -
  From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
  Creating indexes after inserting massive data rows is faster than
  before inserting data rows.
  Please tell me why.

 Plain and simple: the indices get updated after every insert statement,
 whereas if you only create the index *after* the inserts, the index gets
 created in a single operation, which is a lot more efficient.

 I seem to recall that inside of a transaction (thus, InnoDB or so) the
 difference is markedly less; I might be wrong, though.


 --
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel




-- 
Claudio


Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Zhangzhigang
OK, but in my opinion the sorting algorithm is not what makes the difference; 
both approaches ultimately perform B+-tree inserts.



 From: Claudio Nanni claudio.na...@gmail.com
To: Zhangzhigang zzgang_2...@yahoo.com.cn 
Cc: Johan De Meersman vegiv...@tuxera.be; mysql@lists.mysql.com 
mysql@lists.mysql.com 
Date: Monday, May 7, 2012, 5:01 PM
Subject: Re: Re: Why is creating indexes faster after inserting massive data rows?
 

Creating the index in one go is a single macro-sort operation;
updating the index at every row repeats that work over and over again.
If you do not understand the difference, I recommend reading up on the basics 
of sorting algorithms;
it is an interesting read anyway.

Claudio 


2012/5/7 Zhangzhigang zzgang_2...@yahoo.com.cn

Johan, 

Plain and simple: the indices get updated after every insert statement,
whereas if you only create the index *after* the inserts, the index gets 
created in a single operation, which is a lot more efficient.


OK, so when the index is created *after* the inserts, it gets created in a single 
operation.
But doesn't the index still have to be updated key by key after all the data 
rows have been inserted? Does it work that way?
If so, I cannot see the difference in overhead between the two approaches.






 From: Johan De Meersman vegiv...@tuxera.be

To: Zhangzhigang zzgang_2...@yahoo.com.cn
Cc: mysql@lists.mysql.com
Date: Monday, May 7, 2012, 4:28 PM

Subject: Re: Why is creating indexes faster after inserting massive data rows?


- Original Message -
 From: Zhangzhigang zzgang_2...@yahoo.com.cn

 Creating indexes after inserting massive data rows is faster than
 before inserting data rows.
 Please tell me why.

Plain and simple: the indices get updated after every insert statement, 
whereas if you only create the index *after* the inserts, the index gets 
created in a single operation, which is a lot more efficient.

I seem to recall that inside of a transaction (thus, InnoDB or so) the 
difference is markedly less; I might be wrong, though.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel


-- 
Claudio

Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Zhangzhigang
Thanks, I had thought of this answer before, and I appreciate your reply.




 From: Alex Schaft al...@quicksoftware.co.za
To: mysql@lists.mysql.com 
Date: Monday, May 7, 2012, 4:59 PM
Subject: Re: Re: Why is creating indexes faster after inserting massive data rows?
 
On 2012/05/07 10:53, Zhangzhigang wrote:
 Johan, 
 Plain and simple: the indices get updated after every insert statement,
 whereas if you only create the index *after* the inserts, the index gets 
 created in a single operation, which is a lot more efficient.


 OK, so when the index is created *after* the inserts, it gets created in a 
 single operation.
 But doesn't the index still have to be updated key by key after all the data 
 rows have been inserted? Does it work that way?
 If so, I cannot see the difference in overhead between the two approaches.
My simplified 2c. When inserting rows with active indexes one by one (insert), 
mysql has to

1) lock the space for the data to be added,
2) write the data,
3) lock the index,
4) write the index key(s),
5) unlock the index,
6) unlock the data

This happens for each row

When first doing all data without index, only 1, 2, and 6 happen. When you then 
create an index, it can lock the index, read all the data and write all index 
keys in one go and then unlock the index.

If you make an omelet, do you fetch your eggs from the fridge one by one, or 
all at the same time? :)

HTH,
Alex


-- MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Claudio Nanni
too nice not to share it!

http://www.youtube.com/watch?v=INHF_5RIxTE



2012/5/7 Zhangzhigang zzgang_2...@yahoo.com.cn

 Thanks, i thought about this answer in the past, and i appreciate your
 reply.



 
  From: Alex Schaft al...@quicksoftware.co.za
 To: mysql@lists.mysql.com
 Date: Monday, May 7, 2012, 4:59 PM
 Subject: Re: Re: Why is creating indexes faster after inserting massive data
 rows?

 On 2012/05/07 10:53, Zhangzhigang wrote:
  Johan, 
  Plain and simple: the indices get updated after every insert statement,
  whereas if you only create the index *after* the inserts, the index gets
 created in a single operation, which is a lot more efficient.


  OK, so when the index is created *after* the inserts, it gets created in a
 single operation.
  But doesn't the index still have to be updated key by key after all the
 data rows have been inserted? Does it work that way?
  If so, I cannot see the difference in overhead between the two approaches.
 My simplified 2c. When inserting rows with active indexes one by one
 (insert), mysql has to

 1) lock the space for the data to be added,
 2) write the data,
 3) lock the index,
 4) write the index key(s),
 5) unlock the index,
 6) unlock the data

 This happens for each row

 When first doing all data without index, only 1, 2, and 6 happen. When you
 then create an index, it can lock the index, read all the data and write
 all index keys in one go and then unlock the index.

 If you make an omelet, do you fetch your eggs from the fridge one by one,
 or all at the same time? :)

 HTH,
 Alex


 -- MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql




-- 
Claudio


Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Johan De Meersman
- Original Message -

 From: Zhangzhigang zzgang_2...@yahoo.com.cn

 OK, so when the index is created *after* the inserts, it gets created in
 a single operation.
 But doesn't the index still have to be updated key by key after all the data
 rows have been inserted? Does it work that way?
No, when you create an index on an existing table (like after a mass insert), 
what happens is that the engine does a single full tablescan and builds the 
index in a single pass, which is a lot more performant than updating a single 
disk block for every record, for the simple reason that a single disk block can 
contain dozens of index entries. 

Imagine that you insert one million rows, and you have 100 index entries in a 
disk block (random numbers, to make a point. Real numbers will depend on 
storage, file system, index, et cetera). Obviously there's no way to write less 
than a single block to disk - that's how it works. 

You can update your index for each record in turn. That means you will need to 
do 1 million index - and thus block - writes; plus additional reads for those 
blocks you don't have in memory - that's the index cache. 

Now, if you create a new index on an existing table, you are first of all 
bypassing any index read operations - there *is* no index to read, yet. Then 
the system is going to do a full tablescan - considered slow, but you need all 
the data, so there's no better way anyway. The index will be built - in-memory 
as much as possible - and the system will automatically prefer to write only 
complete blocks - 10,000 of them. That's the exact same number of index blocks, 
but you only write each block once, so that's only 10,000 writes instead of 
1,000,000. 

Now there's a lot more at play, things like B-tree balancing and whatnot, but 
that's the basic picture. 

-- 

Bier met grenadyn 
Is als mosterd by den wyn 
Sy die't drinkt, is eene kwezel 
Hy die't drinkt, is ras een ezel 


RE: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Rick James
* Batch INSERTs run faster than one-row-at-a-time, but this is unrelated to 
INDEX updating speed.
* The cache size is quite important to dealing with indexing during INSERT; see 
http://mysql.rjweb.org/doc.php/memory
* Note that mysqldump sets up for an efficient creation of indexes after 
loading the data.  This is not practical (or necessarily efficient) when 
incremental INSERTing into a table.

As for the original question...
* Updating the index(es) for one row often involves random BTree traversals.  
When the index(es) are too big to be cached, this can involve disk hit(s) for 
each row inserted.
* By doing all the indexes after building the table (or at least all the 
non-UNIQUE indexes), sort merge can be used.  This technique has been highly 
optimized over the past half-century and is more efficient.
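Rick's first point (batch INSERTs beat one-row-at-a-time) is easy to demonstrate with any SQL engine. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so the absolute timings say nothing about InnoDB or MyISAM; only the shape of the effect carries over.

```python
import sqlite3
import time

# Two identical tables; same rows inserted two different ways.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER, val TEXT)")
rows = [(i, f"value-{i}") for i in range(50_000)]

# One statement per row: per-statement overhead dominates.
start = time.perf_counter()
for r in rows:
    conn.execute("INSERT INTO t1 VALUES (?, ?)", r)
one_by_one = time.perf_counter() - start

# Batched: a single executemany amortises that overhead.
start = time.perf_counter()
conn.executemany("INSERT INTO t2 VALUES (?, ?)", rows)
batched = time.perf_counter() - start

print(f"one-by-one {one_by_one:.3f}s vs batched {batched:.3f}s")
```

With MySQL the equivalent is a multi-row `INSERT ... VALUES (...), (...), ...` or `LOAD DATA INFILE`, which additionally saves network round-trips.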


 -Original Message-
 From: Johan De Meersman [mailto:vegiv...@tuxera.be]
 Sent: Monday, May 07, 2012 1:29 AM
 To: Zhangzhigang
 Cc: mysql@lists.mysql.com
 Subject: Re: Why is creating indexes faster after inserting massive
 data rows?
 
 - Original Message -
  From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
  Creating indexes after inserting massive data rows is faster than
  before inserting data rows.
  Please tell me why.
 
 Plain and simple: the indices get updated after every insert statement,
 whereas if you only create the index *after* the inserts, the index
 gets created in a single operation, which is a lot more efficient.
 
 I seem to recall that inside of a transaction (thus, InnoDB or so) the
 difference is markedly less; I might be wrong, though.
 
 
 --
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel
 
 --
 MySQL General Mailing List
 For list archives: http://lists.mysql.com/mysql
 To unsubscribe:http://lists.mysql.com/mysql



RE: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Rick James
As a side note, TokuDB uses what it calls fractal technology to somewhat 
improve the performance of incremental INDEXing.  They delay some of the BTree 
work so that they can better batch stuff.  While waiting for that to finish, 
queries are smart enough to look in more than one place for the index info.

InnoDB does something similar, but it is limited to the size of the buffer_pool.
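The delayed batching Rick describes (TokuDB's buffered trees, InnoDB's change buffer) can be caricatured as follows. This is a toy model under my own naming, not either engine's actual data structure: pending index entries accumulate in a small buffer, and lookups consult both the buffer and the main index until a merge happens.

```python
import bisect

class BufferedIndex:
    """Toy change-buffer: inserts go to a small unsorted buffer and are
    merged into the sorted main index only when the buffer fills up."""
    def __init__(self, buffer_limit=4):
        self.main = []                  # sorted "on-disk" index
        self.buffer = []                # pending, cheap-to-append inserts
        self.buffer_limit = buffer_limit
        self.merges = 0

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_limit:
            self._merge()

    def _merge(self):
        # One batched sort-merge instead of a tree descent per key.
        self.main = sorted(self.main + self.buffer)
        self.buffer = []
        self.merges += 1

    def contains(self, key):
        # Queries must look in both places until the merge happens.
        if key in self.buffer:
            return True
        i = bisect.bisect_left(self.main, key)
        return i < len(self.main) and self.main[i] == key

idx = BufferedIndex()
for k in [9, 3, 7, 1, 5]:
    idx.insert(k)
print(idx.merges)       # 1: one batched merge covered the first 4 keys
print(idx.contains(5))  # True, even though 5 is still only buffered
```

That is the sense in which "queries are smart enough to look in more than one place": correctness is preserved while the expensive index maintenance is deferred and batched.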

 -Original Message-
 From: Johan De Meersman [mailto:vegiv...@tuxera.be]
 Sent: Monday, May 07, 2012 8:06 AM
 To: Zhangzhigang
 Cc: mysql@lists.mysql.com
 Subject: Re: Re: Why is creating indexes faster after inserting
 massive data rows?
 
 - Original Message -
 
  From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
  OK, so when the index is created *after* the inserts, it gets created in
  a single operation.
  But doesn't the index still have to be updated key by key after all the
  data rows have been inserted? Does it work that way?
 No, when you create an index on an existing table (like after a mass
 insert), what happens is that the engine does a single full tablescan
 and builds the index in a single pass, which is a lot more performant
 than updating a single disk block for every record, for the simple
 reason that a single disk block can contain dozens of index entries.
 
 Imagine that you insert one million rows, and you have 100 index
 entries in a disk block (random numbers, to make a point. Real numbers
 will depend on storage, file system, index, et cetera). Obviously
 there's no way to write less than a single block to disk - that's how
 it works.
 
 You can update your index for each record in turn. That means you will
 need to do 1 million index - and thus block - writes; plus additional
 reads for those blocks you don't have in memory - that's the index
 cache.
 
 Now, if you create a new index on an existing table, you are first of
 all bypassing any index read operations - there *is* no index to read,
 yet. Then the system is going to do a full tablescan - considered slow,
 but you need all the data, so there's no better way anyway. The index
 will be built - in-memory as much as possible - and the system will
 automatically prefer to write only complete blocks - 10,000 of them.
 That's the exact same number of index blocks, but you only write each
 block once, so that's only 10,000 writes instead of 1,000,000.
 
 Now there's a lot more at play, things like B-tree balancing and
 whatnot, but that's the basic picture.
 
 --
 
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel


Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Karen Abgarian
Hi, 

A couple cents to this. 

There aren't really a million block writes.   The record gets added to the 
block, but the block gets modified in the OS cache if we assume MyISAM tables, 
and in the InnoDB buffer pool if we assume InnoDB tables.   In both cases, the 
actual write does not take place immediately and does not slow down the 
process.   What does happen for each operation, however, is processing the 
statement, locating the entries to update in the index, index block splits 
and, for good reason, committing.   

When it comes to creating an index, what needs to happen, is to read the whole 
table and to sort all rows by the index key.   The latter process will be the 
most determining factor in answering the original question, because for the 
large tables the sort will have to do a lot of disk I/O.The point I am 
trying to make is there will be situations when creating indexes and then 
inserting the rows will be faster than creating an index afterwards.   If we 
try to determine such situations, we could notice that the likelihood of the 
sort going to disk increases with the number of distinct values to be sorted.   
For this reason, my choice would be to create things like primary/unique keys 
beforehand unless I am certain that everything will fit in the available 
memory. 
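Karen's point about the sort spilling to disk is the classic external merge sort. The sketch below (a hypothetical illustration using temporary files, not MySQL's filesort implementation) sorts data in fixed-size in-memory chunks, spills each sorted run, and then k-way merges the runs with a heap.

```python
import heapq
import os
import random
import tempfile

def external_sort(values, chunk_size, tmpdir):
    """Sort `values` while keeping at most `chunk_size` items in memory:
    spill each sorted run to disk, then k-way merge the runs."""
    run_files = []
    for i in range(0, len(values), chunk_size):
        run = sorted(values[i:i + chunk_size])  # in-memory sort of one run
        path = os.path.join(tmpdir, f"run{len(run_files)}.txt")
        with open(path, "w") as f:
            f.write("\n".join(map(str, run)))
        run_files.append(path)

    def read_run(path):
        with open(path) as f:
            for line in f:
                yield int(line)

    # heapq.merge lazily merges the already-sorted runs.
    return list(heapq.merge(*(read_run(p) for p in run_files)))

random.seed(1)
data = [random.randrange(1000) for _ in range(257)]
with tempfile.TemporaryDirectory() as d:
    result = external_sort(data, chunk_size=64, tmpdir=d)
print(result == sorted(data))  # True
```

The disk I/O for writing and re-reading the runs is exactly the cost Karen is weighing against per-row index maintenance.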

Peace
Karen



On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:

 - Original Message -
 
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
 OK, so when the index is created *after* the inserts, it gets created in
 a single operation.
 But doesn't the index still have to be updated key by key after all the data
 rows have been inserted? Does it work that way?
 No, when you create an index on an existing table (like after a mass insert), 
 what happens is that the engine does a single full tablescan and builds the 
 index in a single pass, which is a lot more performant than updating a single 
 disk block for every record, for the simple reason that a single disk block 
 can contain dozens of index entries. 
 
 Imagine that you insert one million rows, and you have 100 index entries in a 
 disk block (random numbers, to make a point. Real numbers will depend on 
 storage, file system, index, et cetera). Obviously there's no way to write 
 less than a single block to disk - that's how it works. 
 
 You can update your index for each record in turn. That means you will need 
 to do 1 million index - and thus block - writes; plus additional reads for 
 those blocks you don't have in memory - that's the index cache. 
 
 Now, if you create a new index on an existing table, you are first of all 
 bypassing any index read operations - there *is* no index to read, yet. Then 
 the system is going to do a full tablescan - considered slow, but you need 
 all the data, so there's no better way anyway. The index will be built - 
 in-memory as much as possible - and the system will automatically prefer to 
 write only complete blocks - 10.000 of them. That's the exact same number of 
 index blocks, but you only write each block once, so that's only 10.000 
 writes instead of 1.000.000. 
 
 Now there's a lot more at play, things like B-tree balancing and whatnot, but 
 that's the basic picture. 
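 
 A minimal sketch of the load-then-index pattern described above (names are 
 hypothetical; MyISAM also offers DISABLE KEYS for bulk inserts into an 
 existing table):
 
 ```sql
 -- Load with only the primary key in place:
 CREATE TABLE t (id INT UNSIGNED NOT NULL PRIMARY KEY, val INT NOT NULL);
 LOAD DATA INFILE '/tmp/rows.csv' INTO TABLE t;

 -- Build the secondary index in one pass over the table:
 CREATE INDEX idx_val ON t (val);

 -- MyISAM equivalent when appending to an existing table:
 -- ALTER TABLE t DISABLE KEYS;  ... bulk insert ...  ALTER TABLE t ENABLE KEYS;
 ```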
 
 -- 
 
 Bier met grenadyn 
 Is als mosterd by den wyn 
 Sy die't drinkt, is eene kwezel 
 Hy die't drinkt, is ras een ezel 


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Karen Abgarian
Good point about the key buffer.   I was only thinking about the table updates for 
MyISAM, not indexes.   Being stuck waiting for a buffer flush could also 
happen.  However, for the table blocks this would be the same issue as with a 
load followed by an index rebuild, and for the indexes, it will have to be 
compared, performance-wise, with the expense of sorting an equally sized index.  
 

On May 7, 2012, at 10:40 AM, Rick James wrote:

 (Correction to Karen's comments)
 * MyISAM does all its index operations in the key_buffer, similar to InnoDB 
 and its buffer_pool.
 * Yes, writes are delayed (in both engines), but not forever.  If the table 
 is huge, you will eventually be stuck waiting for blocks to be flushed from 
 cache.
 * If the table is small enough, all the I/O can be delayed, and done only 
 once.  So yes, the in-memory cache may be faster. 
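 
 Rick's key_buffer point can be inspected directly; a sketch (the 256 MB 
 figure is an arbitrary example, not a recommendation):
 
 ```sql
 -- MyISAM caches index blocks in the key buffer:
 SHOW VARIABLES LIKE 'key_buffer_size';
 SET GLOBAL key_buffer_size = 268435456;  -- 256 MB, example value only

 -- Key_blocks_unused and Key_write_requests vs Key_writes indicate
 -- how much of the buffer is in use and how often writes hit disk:
 SHOW GLOBAL STATUS LIKE 'Key%';
 ```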
 
 Based on this discussion, you should note that random indexes, such as 
 GUIDs, MD5s, etc, tend to 
 
 
 -Original Message-
 From: Karen Abgarian [mailto:a...@apple.com]
 Sent: Monday, May 07, 2012 10:31 AM
 To: mysql@lists.mysql.com
  Subject: Re: Re: Why is creating indexes faster after inserting
 massive data rows?
 
 Hi,
 
 A couple cents to this.
 
 There isn't really a million of block writes.   The record gets added
 to the block, but that gets modified in OS cache if we assume MyISAM
 tables and in the Innodb buffer if we assume InnoDB tables.   In both
 cases, the actual writing does not take place and does not slow down
 the process.What does however happen for each operation, is
 processing the statement, locating the entries to update in the index,
 index block splits and , for good reason, committing.
 
 When it comes to creating an index, what needs to happen, is to read
 the whole table and to sort all rows by the index key.   The latter
 process will be the most determining factor in answering the original
 question, because for the large tables the sort will have to do a lot
 of disk I/O.The point I am trying to make is there will be
 situations when creating indexes and then inserting the rows will be
 faster than creating an index afterwards.   If we try to determine such
 situations, we could notice that the likelihood of the sort going to
 disk increases with the amount of distinct values to be sorted.   For
 this reason, my choice would be to create things like primary/unique
 keys beforehand unless I am certain that everything will fit in the
 available memory.
 
 Peace
 Karen
 
 
 
 On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:
 
 - Original Message -
 
 From: Zhangzhigang zzgang_2...@yahoo.com.cn
 
 Ok, Creating the index *after* the inserts, the index gets created
 in
 a single operation.
 But the indexes has to be updating row by row after the data rows
 has
 all been inserted. Does it work in this way?
 No, when you create an index on an existing table (like after a mass
 insert), what happens is that the engine does a single full tablescan
 and builds the index in a single pass, which is a lot more performant
 than updating a single disk block for every record, for the simple
 reason that a single disk block can contain dozens of index entries.
 
 Imagine that you insert one million rows, and you have 100 index
 entries in a disk block (random numbers, to make a point. Real numbers
 will depend on storage, file system, index, et cetera). Obviously
 there's no way to write less than a single block to disk - that's how
 it works.
 
 You can update your index for each record in turn. That means you
 will need to do 1 million index - and thus block - writes; plus
 additional reads for those blocks you don't have in memory - that's the
 index cache.
 
 Now, if you create a new index on an existing table, you are first of
 all bypassing any index read operations - there *is* no index to read,
 yet. Then the system is going to do a full tablescan - considered slow,
 but you need all the data, so there's no better way anyway. The index
 will be built - in-memory as much as possible - and the system will
 automatically prefer to write only complete blocks - 10.000 of them.
 That's the exact same number of index blocks, but you only write each
 block once, so that's only 10.000 writes instead of 1.000.000.
 
 Now there's a lot more at play, things like B-tree balancing and
 whatnot, but that's the basic picture.
 
 --
 
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel
 
 



Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Zhangzhigang
Karen...

MySQL does not use the complicated approach you describe.

I agree with Johan De Meersman.




Re: Re: Re: Why is creating indexes faster after inserting massive data rows?

2012-05-07 Thread Karen Abgarian
Honestly, I did not understand that.   I did not say anything about it being 
complicated.  What is it that MySQL does not use - caching?

Judging by experience, creating a unique index on, say, a 200G table can be a 
bitter one.   


On 07.05.2012, at 19:26, Zhangzhigang wrote:

 Karen...
 
 The mysql does not use this approach what you said which is complicated.
 
 I  agree with ohan De Meersman.
 
 
 



Re: MySQL Indexes

2011-10-07 Thread Tompkins Neil
Is it normal practice for heavily queried MySQL tables to have an index
file bigger than the data file?

On Fri, Oct 7, 2011 at 12:22 AM, Michael Dykman mdyk...@gmail.com wrote:

 Only one index at a time can be used per query, so neither strategy is
 optimal.  You need to look at the queries you intend to run against the
 system and construct indexes which support them.

  - md

 On Thu, Oct 6, 2011 at 2:35 PM, Neil Tompkins 
 neil.tompk...@googlemail.com wrote:

 Maybe that was a bad example.  If the query was name = 'Red', what index
 should I create?

 Should I create an index on all columns used in each query, or have an
 index on each individual column?


 On 6 Oct 2011, at 17:28, Michael Dykman mdyk...@gmail.com wrote:

 For the first query, the obvious index on score will give you optimal
 results.

 The second query is founded on this phrase: Like '%Red%', and no index
 will help you there.  This is an anti-pattern, I am afraid.  The only way
 your database can satisfy that expression is to test each and every record
 in that database (the test itself being expensive, as infix matching is
 iterative).  Perhaps you should consider this approach instead:
 http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
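
 A hedged sketch of the full-text alternative Michael links to (in MySQL 5.5 
 a FULLTEXT index requires a MyISAM table; names follow the example query):

 ```sql
 CREATE TABLE test_table (
     auto_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
     name    VARCHAR(100) NOT NULL,
     score   INT NOT NULL,
     FULLTEXT KEY ft_name (name)
 ) ENGINE=MyISAM;

 -- Word-based search that can use the index, unlike LIKE '%Red%':
 SELECT auto_id, name, score
 FROM test_table
 WHERE MATCH(name) AGAINST('Red')
 ORDER BY score DESC;
 ```

 Note that MATCH ... AGAINST finds whole words, so it is not an exact 
 substitute for an infix LIKE.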

 On Thu, Oct 6, 2011 at 10:59 AM, Tompkins Neil 
 neil.tompk...@googlemail.com wrote:

 Hi,

 Can anyone help and offer some advice with regard to MySQL indexes?
 Basically we have a number of different tables, all of which have the
 obvious primary keys.  We then have some queries using JOIN statements
 that run more slowly than we wanted.  How many indexes are recommended
 per table?  For example, should I have an index on all fields that will
 be used in a WHERE statement?  Should the indexes be created with
 multiple fields?  An example of two basic queries:

 SELECT auto_id, name, score
 FROM test_table
 WHERE score > 10
 ORDER BY score DESC


 SELECT auto_id, name, score
 FROM test_table
 WHERE score > 10
 AND name Like '%Red%'
 ORDER BY score DESC

 How many indexes should be created for these two queries?

 Thanks,
 Neil




 --
  - michael dykman
   - mdyk...@gmail.com

  May the Source be with you.




 --
  - michael dykman
  - mdyk...@gmail.com

  May the Source be with you.



Re: MySQL Indexes

2011-10-07 Thread Brandon Phelps

This thread has sparked my interest. What is the difference between an index on 
(field_a, field_b) and an index on (field_b, field_a)?

On 10/06/2011 07:43 PM, Nuno Tavares wrote:

Neil, whenever you see multiple fields you'd like to index, you should
consider, at least:

* The frequency of each query;
* The occurrences of the same field in multiple queries;
* The cardinality of each field;

There is a tool, Index Analyzer, that may give you some hints, and I
think it's maatkit that has a tool to run over a query log to find good
candidates - I've seen it somewhere, I believe.

Just remember that idx_a(field_a,field_b) is not the same as, and is not
considered for use the same way as, idx_b(field_b,field_a).

-NT
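
Nuno's cardinality criterion can be estimated directly; a sketch with 
placeholder names:

```sql
-- A higher share of distinct values generally makes a better index candidate:
SELECT COUNT(DISTINCT field_a) / COUNT(*) AS selectivity_a,
       COUNT(DISTINCT field_b) / COUNT(*) AS selectivity_b
FROM t;

-- SHOW INDEX FROM t; reports the optimizer's per-index cardinality estimates.
```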





Re: MySQL Indexes

2011-10-07 Thread Michael Dykman
When a query selects on field_a and field_b, that index can be used.  If
querying on field_a alone, the index again is useful.  Query on field_b
alone, however, and that first index is of no use to you.
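
The leftmost-prefix behaviour Michael describes can be checked with EXPLAIN 
(hypothetical table):

```sql
CREATE TABLE t (
    field_a INT NOT NULL,
    field_b INT NOT NULL,
    KEY idx_ab (field_a, field_b)
);

-- Leftmost column present: idx_ab is usable.
EXPLAIN SELECT * FROM t WHERE field_a = 1 AND field_b = 2;
EXPLAIN SELECT * FROM t WHERE field_a = 1;

-- field_a absent from the predicate: idx_ab cannot be used for the lookup.
EXPLAIN SELECT * FROM t WHERE field_b = 2;
```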

On Fri, Oct 7, 2011 at 10:49 AM, Brandon Phelps bphe...@gls.com wrote:

 This thread has sparked my interest. What is the difference between an
 index on (field_a, field_b) and an index on (field_b, field_a)?




-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Re: MySQL Indexes

2011-10-07 Thread Michael Dykman
How heavily a given table is queried does not directly affect the index
size; only the number and depth of the indexes do.

No, it is not that unusual to have the index file bigger.  Just make sure
that every index you have is justified by the queries you are making against
the table.

 - md
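
Comparing data and index size is straightforward via information_schema 
(the schema name is a placeholder):

```sql
SELECT table_name,
       data_length  / 1024 / 1024 AS data_mb,
       index_length / 1024 / 1024 AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'mydb'
ORDER BY index_length DESC;
```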


On Fri, Oct 7, 2011 at 4:26 AM, Tompkins Neil
neil.tompk...@googlemail.comwrote:

 Is it normal practice for heavily queried MySQL tables to have an index
 file bigger than the data file?







-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Re: MySQL Indexes

2011-10-07 Thread Reindl Harald
but could this not be called a bug?

Am 07.10.2011 18:08, schrieb Michael Dykman:
 When a query selects on field_a and field_b, that index can be used.  If
 querying on field_a alone, the index again is useful.  Query on field_b
 alone however, that first index is of no use to you.
 
 On Fri, Oct 7, 2011 at 10:49 AM, Brandon Phelps bphe...@gls.com wrote:
 
 This thread has sparked my interest. What is the difference between an
 index on (field_a, field_b) and an index on (field_b, field_a)?




 
 

-- 

Mit besten Grüßen, Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / software-development / cms-solutions
p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
icq: 154546673, http://www.thelounge.net/

http://www.thelounge.net/signature.asc.what.htm





Re: MySQL Indexes

2011-10-07 Thread Michael Dykman
No, I don't think it can be called a bug.  It is a direct consequence of the
relational paradigm: any implementation of an RDBMS has the same
characteristic.

 - md

On Fri, Oct 7, 2011 at 12:20 PM, Reindl Harald h.rei...@thelounge.netwrote:

 but could this not be called a bug?

 Am 07.10.2011 18:08, schrieb Michael Dykman:
  When a query selects on field_a and field_b, that index can be used.  If
  querying on field_a alone, the index again is useful.  Query on field_b
  alone however, that first index is of no use to you.
 
  On Fri, Oct 7, 2011 at 10:49 AM, Brandon Phelps bphe...@gls.com wrote:
 
  This thread has sparked my interest. What is the difference between an
  index on (field_a, field_b) and an index on (field_b, field_a)?
 
 
  On 10/06/2011 07:43 PM, Nuno Tavares wrote:
 
  Neil, whenever you see multiple fields you'd like to index, you should
  consider, at least:
 
  * The frequency of each query;
  * The occurrences of the same field in multiple queries;
  * The cardinality of each field;
 
  There is a tool Index Analyzer that may give you some hints, and I
  think it's maatkit that has a tool to run a query log to find good
  candidates - I've seen it somewhere, I believe
 
  Just remember that idx_a(field_a,field_b) is not the same, and is not
  considered for use, the same way as idx_b(field_b,field_a).
 
  -NT
 
 
  Em 07-10-2011 00:22, Michael Dykman escreveu:
 
  Only one index at a time can be used per query, so neither strategy is
  optimal.  You need at look at the queries you intend to run against
 the
  system and construct indexes which support them.
 
   - md
 
  On Thu, Oct 6, 2011 at 2:35 PM, Neil Tompkins
  neil.tompk...@googlemail.com**wrote:
 
   Maybe that was a bad example.  If the query was name = 'Red' what
 index
  should I create ?
 
  Should I create a index of all columns used in each query or have a
  index
  on individual column ?
 
 
  On 6 Oct 2011, at 17:28, Michael Dykmanmdyk...@gmail.com  wrote:
 
  For the first query, the obvious index on score will give you optimal
  results.
 
  The second query is founded on this phrase: LIKE '%Red%', and no index
  will help you there.  This is an anti-pattern, I am afraid.  The only way
  your database can satisfy that expression is to test each and every record
  in that database (the test itself being expensive, as infix matching is
  iterative).  Perhaps you should consider this approach instead:
  http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
 
   On Thu, Oct 6, 2011 at 10:59 AM, Tompkins Neil
   neil.tompk...@googlemail.com wrote:
 
   Hi,
 
   Can anyone help and offer some advice with regards to MySQL indexes?
   Basically we have a number of different tables, all of which have the
   obvious primary keys.  We then have some queries using JOIN statements
   that run more slowly than we wanted.  How many indexes are recommended
   per table ?  For example, should I have an index on all fields that
   will be used in a WHERE statement ?  Should the indexes be created
   with multiple fields ?  An example of two basic queries:
  
   SELECT auto_id, name, score
   FROM test_table
   WHERE score > 10
   ORDER BY score DESC
  
  
   SELECT auto_id, name, score
   FROM test_table
   WHERE score > 10
   AND name LIKE '%Red%'
   ORDER BY score DESC
  
   How many indexes should be created for these two queries ?
 
  Thanks,
  Neil
 
 
 
 
  --
   - michael dykman
    - mdyk...@gmail.com
 
   May the Source be with you.
 
 
 
 
 
 
 
  --
  MySQL General Mailing List
  For list archives: http://lists.mysql.com/mysql
  To unsubscribe:
 http://lists.mysql.com/mysql?unsub=mdyk...@gmail.com
 
 
 
 

 --

 Mit besten Grüßen, Reindl Harald
 the lounge interactive design GmbH
 A-1060 Vienna, Hofmühlgasse 17
 CTO / software-development / cms-solutions
 p: +43 (1) 595 3999 33, m: +43 (676) 40 221 40
 icq: 154546673, http://www.thelounge.net/

 http://www.thelounge.net/signature.asc.what.htm




-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Re: MySQL Indexes

2011-10-07 Thread Neil Tompkins
Can you give more information as to why the second index would be of no use ?  

On 7 Oct 2011, at 18:24, Michael Dykman mdyk...@gmail.com wrote:

  No, I don't think it can be called a bug.  It is a direct consequence of the
 relational paradigm.  Any implementation of an RDBMS has the same
 characteristic.
 
 - md
 




Re: MySQL Indexes

2011-10-07 Thread Neil Tompkins
Do you have any good documentation with regards to creating indexes? Also, 
information on the EXPLAIN statement, and what the desired result of the 
EXPLAIN statement would be?

On 7 Oct 2011, at 17:10, Michael Dykman mdyk...@gmail.com wrote:

 How heavily a given table is queried does not directly affect the index size, 
 only the number and depth of the indexes.
 
 No, it is not that unusual to have the index file bigger.  Just make sure 
 that every index you have is justified by the queries you are making against 
 the table.
 
  - md
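
[Editorial sketch, not part of the original reply: one way to check whether the index file has grown out of proportion to the data is the information_schema catalog, which MySQL exposes on 5.0 and later. The schema name 'your_db' is a placeholder.]

```sql
-- Compare row-data size to total index size for every table in a schema.
SELECT table_name,
       data_length,                                     -- bytes of row data
       index_length,                                    -- bytes of all indexes combined
       ROUND(index_length / NULLIF(data_length, 0), 2) AS index_to_data_ratio
FROM information_schema.TABLES
WHERE table_schema = 'your_db'
ORDER BY index_length DESC;
```

A ratio well above 1 is not wrong by itself, but any table near the top of this list is a good place to apply the "is every index justified by a query?" test above.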
 
 
 On Fri, Oct 7, 2011 at 4:26 AM, Tompkins Neil neil.tompk...@googlemail.com 
 wrote:
  Is it normal practice for heavily queried MySQL tables to have an index file 
  bigger than the data file ?
 
 


Re: MySQL Indexes

2011-10-07 Thread Michael Dykman
The second index you specified '(field_b, field_a)' would be usable when
querying on field_b alone, or both fields in conjunction.  This particular
index is of no value should you be querying 'field_a' alone.  Then that
first index '(field_a, field_b)' would apply.
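
[Editorial sketch: the leftmost-prefix rule described above can be observed with EXPLAIN. Table and index names here are hypothetical; in the last case the optimizer cannot use idx_ab and falls back to a full table scan.]

```sql
CREATE TABLE t (
  field_a INT,
  field_b INT,
  INDEX idx_ab (field_a, field_b)   -- leftmost column is field_a
);

EXPLAIN SELECT * FROM t WHERE field_a = 1 AND field_b = 2;  -- idx_ab usable
EXPLAIN SELECT * FROM t WHERE field_a = 1;                  -- idx_ab usable (leftmost prefix)
EXPLAIN SELECT * FROM t WHERE field_b = 2;                  -- idx_ab not usable
```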

 - md

On Fri, Oct 7, 2011 at 2:55 PM, Neil Tompkins
neil.tompk...@googlemail.com wrote:

 Can you give more information as to why the second index would be of no use
 ?

 On 7 Oct 2011, at 18:24, Michael Dykman mdyk...@gmail.com wrote:

   No, I don't think it can be called a bug.  It is a direct consequence of the
  relational paradigm.  Any implementation of an RDBMS has the same
  characteristic.
 
  - md
 




-- 
 - michael dykman

FW: MySQL Indexes

2011-10-07 Thread Jerry Schwartz
-Original Message-
From: Reindl Harald [mailto:h.rei...@thelounge.net]
Sent: Friday, October 07, 2011 12:21 PM
To: mysql@lists.mysql.com
Subject: Re: MySQL Indexes

but could this not be called a bug?

[JS] No.

Think of two telephone books: one is sorted by first name, last name and the 
other is sorted by last name, first name. (Those are like your two keys, f1/f2 
and f2/f1.)

If you want to find someone by their first name, you use the first book. If 
you want to find somebody by their last name, you use the second book.

If you want to find someone by their last name, the first book (key f1/f2) is 
useless. If you want to find someone by their first name, the second book 
(f2/f1) is useless.

Does that help explain it?

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.giiresearch.com










Re: FW: MySQL Indexes

2011-10-07 Thread Brandon Phelps

That cleared it up for me.  Thanks!

On 10/07/2011 03:06 PM, Jerry Schwartz wrote:

-Original Message-
From: Reindl Harald [mailto:h.rei...@thelounge.net]
Sent: Friday, October 07, 2011 12:21 PM
To: mysql@lists.mysql.com
Subject: Re: MySQL Indexes

but could this not be called a bug?


[JS] No.

Think of two telephone books: one is sorted by first name, last name and the
other is sorted by last name, first name. (Those are like your two keys, f1/f2
and f2/f1.)

If you want to find someone by their first name, you use the first book. If
you want to find somebody by their last name, you use the second book.

If you want to find someone by their last name, the first book (key f1/f2) is
useless. If you want to find someone by their first name, the second book
(f2/f1) is useless.

Does that help explain it?










Re: MySQL Indexes

2011-10-07 Thread mos

At 01:58 PM 10/7/2011, you wrote:
Do you have any good documentation with regards creating indexes. 
Also information for explain statement and what would be the desired 
result of the explain statement?



This might help:
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

http://www.sitepoint.com/optimizing-mysql-application/

http://hackmysql.com/case2


There is one more advantage to compound indexes like an index on 
(field_a, field_b): if you are retrieving just field_a and field_b 
in a select statement,


select field_a, field_b from table1 where field_a = 'abc'

then even though you only reference field_a in the Where clause, MySQL of 
course uses that index to find rows with field_a = 'abc', but it also 
retrieves field_b from the SAME index, so it doesn't have to go to 
the data file to get field_b. This can dramatically reduce disk I/O 
in heavily used queries, and occasionally you may want to create a 
compound index even though the second field won't be used in a Where clause.
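
[Editorial sketch: this covering-index effect shows up in EXPLAIN as "Using index" in the Extra column. Table, column, and index names below are illustrative.]

```sql
CREATE TABLE table1 (
  id      INT PRIMARY KEY,
  field_a VARCHAR(20),
  field_b VARCHAR(20),
  payload TEXT,
  INDEX idx_ab (field_a, field_b)
);

-- Both selected columns live in idx_ab, so the data file is never read;
-- EXPLAIN reports "Using index" in the Extra column for this query.
EXPLAIN SELECT field_a, field_b FROM table1 WHERE field_a = 'abc';
```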


There is a yin and yang approach to creating indexes. Newbies will 
try to index all possible columns that are used in a Where clause, 
which results in a huge index file and very slow table updates.  The 
more indexes you have on a table, the longer it takes to add or 
update a row. You really only want to index the columns of the most 
frequent queries.


As to which fields to index, on a test database I would remove all 
indexes from the table except for the primary keys and have the slow 
query log turned on. Run your queries for an hour and examine the 
slow query log to see which queries are slow.  Copy and paste the 
slow Select query to a MySQL administrator like SqlYog Community 
Edition v9.2 (http://code.google.com/p/sqlyog/downloads/list) and do 
an explain on the query to see what indexes it is (not) using. Then 
alter the table and add an index to try and speed up the query.  You 
may have to repeat this several times to finally get the proper index 
defined. Remember to Reset Query Cache between tests.  Only by 
judiciously adding indexes one by one and testing the performance 
will you have the proper yin and yang so your tables are in harmony.
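
[Editorial sketch of the workflow above as a MySQL session; the threshold and index name are illustrative. RESET QUERY CACHE applies to MySQL versions of this era (the query cache was removed in 8.0).]

```sql
-- Log statements slower than one second to the slow query log.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- ...run the workload for a while, then EXPLAIN an offender from the log:
EXPLAIN SELECT auto_id, name, score
FROM test_table
WHERE score > 10
ORDER BY score DESC;

-- Add a candidate index, clear the query cache, and re-test.
ALTER TABLE test_table ADD INDEX idx_score (score);
RESET QUERY CACHE;
```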


Mike
(If you can't achieve harmony, then buy more hardware.)


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



MySQL Indexes

2011-10-06 Thread Tompkins Neil
Hi,

Can anyone help and offer some advice with regards to MySQL indexes?  Basically
we have a number of different tables, all of which have the obvious primary
keys.  We then have some queries using JOIN statements that run more slowly than
we wanted.  How many indexes are recommended per table ?  For example, should
I have an index on all fields that will be used in a WHERE statement ?
 Should the indexes be created with multiple fields ?  An example of two
basic queries:

SELECT auto_id, name, score
FROM test_table
WHERE score > 10
ORDER BY score DESC


SELECT auto_id, name, score
FROM test_table
WHERE score > 10
AND name LIKE '%Red%'
ORDER BY score DESC

How many indexes should be created for these two queries ?

Thanks,
Neil


Re: MySQL Indexes

2011-10-06 Thread Michael Dykman
For the first query, the obvious index on score will give you optimal
results.

The second query is founded on this phrase: LIKE '%Red%', and no index
will help you there.  This is an anti-pattern, I am afraid.  The only way
your database can satisfy that expression is to test each and every record
in that database (the test itself being expensive, as infix matching is
iterative).  Perhaps you should consider this approach instead:
http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
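
[Editorial sketch of the full-text approach suggested above, against the example table. The index name is illustrative; note that at the time of this thread FULLTEXT indexes required MyISAM (InnoDB gained them in MySQL 5.6).]

```sql
-- Requires a MyISAM table on MySQL 5.5 and earlier.
ALTER TABLE test_table ADD FULLTEXT INDEX ft_name (name);

SELECT auto_id, name, score
FROM test_table
WHERE MATCH(name) AGAINST('Red' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC;
```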

On Thu, Oct 6, 2011 at 10:59 AM, Tompkins Neil neil.tompk...@googlemail.com
 wrote:





-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Re: MySQL Indexes

2011-10-06 Thread Neil Tompkins
Maybe that was a bad example.  If the query was name = 'Red', what index should 
I create ?

Should I create an index of all columns used in each query, or have an index on 
each individual column ?

On 6 Oct 2011, at 17:28, Michael Dykman mdyk...@gmail.com wrote:



Re: MySQL Indexes

2011-10-06 Thread Michael Dykman
Only one index at a time can be used per query, so neither strategy is
optimal.  You need to look at the queries you intend to run against the
system and construct indexes which support them.
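
[Editorial sketch: for the two example queries in this thread, "construct indexes which support them" might look like the following. Index names are illustrative, not from the thread.]

```sql
-- Supports WHERE score > 10 ORDER BY score DESC (range scan, no filesort).
CREATE INDEX idx_score ON test_table (score);

-- For an exact-match variant of the second query (name = 'Red'),
-- a compound index can satisfy both the filter and the sort:
CREATE INDEX idx_name_score ON test_table (name, score);
```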

 - md

On Thu, Oct 6, 2011 at 2:35 PM, Neil Tompkins
neil.tompk...@googlemail.com wrote:

 Maybe that was a bad example.  If the query was name = 'Red' what index
 should I create ?

 Should I create a index of all columns used in each query or have a index
 on individual column ?



-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Re: MySQL Indexes

2011-10-06 Thread Nuno Tavares
Neil, whenever you see multiple fields you'd like to index, you should
consider, at least:

* The frequency of each query;
* The occurrences of the same field in multiple queries;
* The cardinality of each field;

There is a tool, "Index Analyzer", that may give you some hints, and I
think it's Maatkit that has a tool to run over a query log to find good
candidates - I've seen it somewhere, I believe.

Just remember that idx_a(field_a,field_b) is not the same as
idx_b(field_b,field_a), and the two are not considered for use in the same way.
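
Nuno's point about column order can be illustrated concretely (a hypothetical
sketch; the table and field names are illustrative only):

```sql
CREATE INDEX idx_a ON t (field_a, field_b);

-- idx_a can serve these predicates (the leftmost-prefix rule):
--   WHERE field_a = 1
--   WHERE field_a = 1 AND field_b = 2
-- but NOT this one, which would need idx_b (field_b, field_a):
--   WHERE field_b = 2
```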

-NT


On 07-10-2011 00:22, Michael Dykman wrote:
 Only one index at a time can be used per query, so neither strategy is
 optimal.  You need to look at the queries you intend to run against the
 system and construct indexes which support them.
 
  - md
 


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Dropping ALL indexes from a database / not just a table?

2010-08-11 Thread Nunzio Daveri
How does one do that?  How do you drop the auto-increment attribute, then drop the 
index, and then restart the auto-increment value where it was before you dropped 
it?  I did not know you could do that.

The reason I ask is because dbf_UID is a unique ID tag the coders use to 
identify a product by manufacturer, kind of like a UPC code, for their internal db.  
We can't have dups and don't want to have unused IDs in the db.

Any help, direction is much appreciated.

TIA...

Nunzio





From: Michael Dykman mdyk...@gmail.com
To: Nunzio Daveri nunziodav...@yahoo.com
Cc: Anirudh Sundar sundar.anir...@gmail.com; mysql@lists.mysql.com
Sent: Tue, August 10, 2010 5:03:44 PM
Subject: Re: Dropping ALL indexes from a database / not just a table?

auto_increment is only allowed on primary-keyed columns.  I expect it
is not allowing you to drop the primary key because that column has
the auto_increment attribute.  Drop that manually, and the primary key
should be able to let go.

- md


Re: Dropping ALL indexes from a database / not just a table?

2010-08-10 Thread Anirudh Sundar
Hello Nunzio,

Instead of dropping an index, you can disable the indexes, get the work
done, and re-enable them.

If you are OK with this, then run the below as a shell script:

MUSER="username"
MPASS="password"
DATABASE="dbname"

for db in $DATABASE
do
    echo "starting disabling indexes for database -- $db"
    echo "--"
    # -N skips the column-name header row so only table names are returned
    TABLES=`mysql -N -u $MUSER -p$MPASS $db -e "show tables"`
    for table in $TABLES
    do
        mysql -u $MUSER -p$MPASS $db -e "alter table $table disable keys"
    done
    echo "completed disabling indexes for database -- $db"
done

Cheers,
Anirudh Sundar



On Tue, Aug 10, 2010 at 1:33 AM, Nunzio Daveri nunziodav...@yahoo.comwrote:

 Hello Gurus, is there a way / script that will let me DROP ALL the indexes
 in a
 single database?  For example, let's say my database is called db_Animals, and
 inside db_Animals there are 97 tables, is there a SINGLE command or a perl
 script of some kind that can read all the MYI files, remove the .MYI from
 the
 file name then proceed to deleting whatever indexes it finds?  I am doing
 this
 to debug a server that seems to be slow and sluggish.  After I am done
 deleting
 I will review the slow query logs and then re-index to get the best
 performance?

 TIA...

 Nunzio






Re: Dropping ALL indexes from a database / not just a table?

2010-08-10 Thread mos

At 01:06 AM 8/10/2010, you wrote:

Hello Nunzio,

Instead of dropping an index, you can disable the indexes, get the work
done, and re-enable them.


Disabling keys will NOT disable Primary or Unique keys. They will still be 
active.


Mike
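
Mike's point can be sketched directly (a hypothetical example; the table name
is borrowed from elsewhere in this thread):

```sql
-- DISABLE KEYS is a MyISAM feature and suspends only non-unique indexes;
-- PRIMARY and UNIQUE indexes stay active so their constraints keep being
-- enforced during the bulk load.
ALTER TABLE dbt_Fruit DISABLE KEYS;
-- ... bulk insert here ...
ALTER TABLE dbt_Fruit ENABLE KEYS;  -- rebuilds the disabled non-unique indexes
```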









Re: Dropping ALL indexes from a database / not just a table?

2010-08-10 Thread Nunzio Daveri
Thanks for the feedback.  What I am trying to do is two things:

1. Remove all indexes to make the database smaller to copy and move to another 
prod box. Currently my indexes are in the double-digit GBs! Yikes ;-)

2. Remove all indexes so I can find out which ones are needed, then tell mysql 
to recreate them; apparently that lessens data fragmentation if it starts from 
scratch vs. turning them on and off.

I was hoping to just remove them all and then start from scratch, so I know the 
data is not fragmented on the drives.

Thanks again...

Nunzio





From: Anirudh Sundar sundar.anir...@gmail.com
To: Nunzio Daveri nunziodav...@yahoo.com
Cc: mysql@lists.mysql.com
Sent: Tue, August 10, 2010 1:06:41 AM
Subject: Re: Dropping ALL indexes from a database / not just a table?


Re: Dropping ALL indexes from a database / not just a table?

2010-08-10 Thread Michael Dykman
This should give you a good starting point (not tested):

 select distinct concat('ALTER TABLE ', TABLE_NAME, ' DROP INDEX ',
CONSTRAINT_NAME,';')
 from information_schema.key_column_usage where
TABLE_SCHEMA='mydatabase';
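
A variant of the same idea that reads from information_schema.statistics,
backtick-quotes identifiers, and skips primary keys (which, as this thread
shows, need ALTER TABLE ... DROP PRIMARY KEY instead of DROP INDEX) might
look like this. A sketch, untested like the original:

```sql
SELECT DISTINCT
       CONCAT('ALTER TABLE `', TABLE_NAME, '` DROP INDEX `', INDEX_NAME, '`;')
FROM information_schema.statistics
WHERE TABLE_SCHEMA = 'mydatabase'
  AND INDEX_NAME <> 'PRIMARY';
```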

 - md


-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.




Re: Dropping ALL indexes from a database / not just a table?

2010-08-10 Thread Nunzio Daveri
Hello Michael, thanks for the one-liner.  I ran it BUT I started to get errors 
after I ran it the first time. This is what I got the 2nd time I ran it (the first 
time I ran it I had 63 rows in the query; the 2nd time I have 9).  I ran it 
twice to make sure it got rid of the indexes.  I verified the index size dropped 
from 850 MB to 65 MB.

+-+
| concat('ALTER TABLE ', TABLE_NAME, ' DROP INDEX ', CONSTRAINT_NAME,';') |
+-+
| ALTER TABLE dbt_Fruits DROP INDEX PRIMARY; |
| ALTER TABLE dbt_Veggies DROP INDEX PRIMARY;  |
.
.
.
| ALTER TABLE dbt_Logs DROP INDEX PRIMARY; |
+-+
9 rows in set (0.01 sec)

mysql> ALTER TABLE dbt_Fruits DROP INDEX PRIMARY;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that 
corresponds to your MySQL server version for the right syntax to use near 
'PRIMARY' at line 1
mysql> ALTER TABLE dbt_Logs DROP INDEX PRIMARY;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that 
corresponds to your MySQL server version for the right syntax to use near 
'PRIMARY' at line 1
mysql>


Thanks again...

Nunzio



From: Michael Dykman mdyk...@gmail.com
To: Nunzio Daveri nunziodav...@yahoo.com
Cc: Anirudh Sundar sundar.anir...@gmail.com; mysql@lists.mysql.com
Sent: Tue, August 10, 2010 3:17:48 PM
Subject: Re: Dropping ALL indexes from a database / not just a table?

This should give you a good starting point (not tested):

select distinct concat('ALTER TABLE ', TABLE_NAME, ' DROP INDEX ',
CONSTRAINT_NAME,';')
from information_schema.key_column_usage where
TABLE_SCHEMA='mydatabase';

- md


Re: Dropping ALL indexes from a database / not just a table?

2010-08-10 Thread Michael Dykman
It's not a complete solution and will need some tweaking.  You
might have to run the PRIMARY KEYs distinctly from the rest.

 - michael dykman



-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.




Re: Dropping ALL indexes from a database / not just a table?

2010-08-10 Thread Nunzio Daveri
Hi Michael and all, OK, so I did some digging around and I still can't find why 
I can't drop the last few indexes.

mysql> SELECT COUNT(1) FROM INFORMATION_SCHEMA.STATISTICS WHERE table_schema = 
'db_Market' AND table_name = 'dbt_Fruit' AND index_name = 'PRIMARY';
+--+
| COUNT(1) |
+--+
|1 |
+--+
1 row in set (0.00 sec)

mysql> DESCRIBE dbt_Fruit;
+--+--+--+-+-++
| Field| Type | Null | Key | Default | Extra  |
+--+--+--+-+-++
| dbf_UID  | int(10) unsigned | NO   | PRI | NULL| auto_increment |
| dbf_Vendor   | varchar(30)  | NO   | | ||
| dbf_Code | varchar(30)  | NO   | | ||
| dbf_Notes| text | YES  | | NULL||
+--+--+--+-+-++

mysql> ALTER TABLE dbt_Fruit DROP FOREIGN KEY dbf_UID;
Query OK, 2947 rows affected (0.05 sec)
Records: 2947  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE dbt_Fruit DROP PRIMARY KEY;
ERROR 1075 (42000): Incorrect table definition; there can be only one auto 
column and it must be defined as a key

mysql> ALTER TABLE dbt_Fruit DROP PRIMARY;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that 
corresponds to your MySQL server version for the right syntax to use near '' at 
line 1


Any ideas???  I am wondering if it has something to do with the fact that 
dbf_UID is a primary key AND auto_increment?

TIA...

Nunzio







From: Michael Dykman mdyk...@gmail.com
To: Nunzio Daveri nunziodav...@yahoo.com
Cc: Anirudh Sundar sundar.anir...@gmail.com; mysql@lists.mysql.com
Sent: Tue, August 10, 2010 4:10:37 PM
Subject: Re: Dropping ALL indexes from a database / not just a table?

It's not a completely solution and will need some tweaking..  You
might have to run the PRIMARY KEYS distinctly from the rest.

- michael dykman


On Tue, Aug 10, 2010 at 4:43 PM, Nunzio Daveri nunziodav...@yahoo.com wrote:
 Hello Michael, thanks for the one liner.  I ran it BUT I started to get
 errors after I ran it the first time, this is what I got the 2nd time I ran
 it (first time I ran it I had 63 rows in the query, the 2nd time I have 9).
 I ran it twice to make sure it got rid of the indexed.  I verified the index
 size dropped from 850 mb to 65 mb.

 +-+
 | concat('ALTER TABLE ', TABLE_NAME, ' DROP INDEX ', CONSTRAINT_NAME,';') |
 +-+
 | ALTER TABLE dbt_Fruits DROP INDEX PRIMARY; |
 | ALTER TABLE dbt_Veggies DROP INDEX PRIMARY;  |
 .
 .
 .
 | ALTER TABLE dbt_Logs DROP INDEX
 PRIMARY; |
 +-+
 9 rows in set (0.01 sec)

 mysql ALTER TABLE dbt_Fruits DROP INDEX PRIMARY;
 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
 that corresponds to your MySQL server version for the right syntax to use
 near 'PRIMARY' at line 1
 mysql ALTER TABLE dbt_Logs DROP INDEX PRIMARY;
 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
 that corresponds to your MySQL server version for the right syntax to use
 near 'PRIMARY' at line 1
 mysql

 Thanks again...

 Nunzio
 
 From: Michael Dykman mdyk...@gmail.com
 To: Nunzio Daveri nunziodav...@yahoo.com
 Cc: Anirudh Sundar sundar.anir...@gmail.com; mysql@lists.mysql.com
 Sent: Tue, August 10, 2010 3:17:48 PM
 Subject: Re: Dropping ALL indexes from a database / not just a table?

 This should give you a good starting point (not tested):

 select distinct concat('ALTER TABLE ', TABLE_NAME, ' DROP INDEX ',
 CONSTRAINT_NAME,';')
 from information_schema.key_column_usage where
 TABLE_SCHEMA='mydatabase';

 - md

 On Tue, Aug 10, 2010 at 10:43 AM, Nunzio Daveri nunziodav...@yahoo.com
 wrote:
 Thanks for the feedback.  What I am trying to do is two things:

 1. Remove all indexes and make the database smaller to copy and move to
 another
 prod box. Currently my indexes are in the double digit GB! Yikes ;-)

 2. Remove all indexes so I can find out which ones are needed then tell
 mysql to
 recreate them and apparently it lessen data fragmentation if it starts
 from
 scratch vs. turning on and off.

 I was hoping to just remove all and then start from scratch so I know the
 data
 is not fragmented on the drives.

 Thanks again...

 Nunzio




 
 From: Anirudh Sundar sundar.anir...@gmail.com
 To: Nunzio Daveri nunziodav...@yahoo.com
 Cc: mysql@lists.mysql.com
 Sent: Tue, August 10, 2010 1:06:41 AM
 Subject: Re: Dropping ALL indexes from a database

Re: Dropping ALL indexes from a database / not just a table?

2010-08-10 Thread Michael Dykman
auto_increment is only allowed on primary-keyed columns.  I expect it
is not allowing you to drop the primary key because that column has
the auto_increment attribute.  Drop that manually, and the primary key
should be able to let go.

 - md
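
Michael's suggestion - remove the auto_increment attribute first, then drop the
primary key - would look something like this. A sketch against the dbt_Fruit
table from Nunzio's mail, not a tested procedure:

```sql
-- 1. Redefine the column without auto_increment (the data is kept):
ALTER TABLE dbt_Fruit MODIFY dbf_UID INT(10) UNSIGNED NOT NULL;

-- 2. Now the primary key can be dropped:
ALTER TABLE dbt_Fruit DROP PRIMARY KEY;

-- To restore later, re-add the key and the attribute; AUTO_INCREMENT resumes
-- above the current maximum value automatically:
ALTER TABLE dbt_Fruit ADD PRIMARY KEY (dbf_UID),
                      MODIFY dbf_UID INT(10) UNSIGNED NOT NULL AUTO_INCREMENT;
```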

On Tue, Aug 10, 2010 at 5:58 PM, Nunzio Daveri nunziodav...@yahoo.com wrote:
 Hi Micheal and all, ok so I did some digging around and I still can't find
 why I cant drop the last few indexes.

 mysql SELECT COUNT(1) FROM INFORMATION_SCHEMA.STATISTICS WHERE table_schema
 = 'db_Market' AND table_name = 'dbt_Fruit' and index_name = 'PRIMARY';
 +--+
 | COUNT(1) |
 +--+
 |    1 |
 +--+
 1 row in set (0.00 sec)

 mysql DESCRIBE dbt_Fruit;
 +--+--+--+-+-++
 | Field    | Type | Null | Key | Default |
 Extra  |
 +--+--+--+-+-++
 | dbf_UID  | int(10) unsigned | NO   | PRI | NULL    |
 auto_increment |
 | dbf_Vendor   | varchar(30)  | NO   | |
 |    |
 | dbf_Code | varchar(30)  | NO   | |
 |    |
 | dbf_Notes    | text | YES  | | NULL
 |    |
 +--+--+--+-+-++

 mysql ALTER TABLE dbt_Fruit DROP FOREIGN KEY dbf_UID;
 Query OK, 2947 rows affected (0.05 sec)
 Records: 2947  Duplicates: 0  Warnings: 0

 mysql ALTER TABLE dbt_Fruit DROP PRIMARY KEY;
 ERROR 1075 (42000): Incorrect table definition; there can be only one auto
 column and it must be defined as a key

 mysql ALTER TABLE dbt_Fruit DROP PRIMARY;
 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
 that corresponds to your MySQL server version for the right syntax to use
 near '' at line 1


 Any ideas???  I am wondering if it has something to do with the fact that
 dbf_UID is a primary AND auto_increment?

 TIA...

 Nunzio



 
 From: Michael Dykman mdyk...@gmail.com
 To: Nunzio Daveri nunziodav...@yahoo.com
 Cc: Anirudh Sundar sundar.anir...@gmail.com; mysql@lists.mysql.com
 Sent: Tue, August 10, 2010 4:10:37 PM
 Subject: Re: Dropping ALL indexes from a database / not just a table?

 It's not a completely solution and will need some tweaking..  You
 might have to run the PRIMARY KEYS distinctly from the rest.

 - michael dykman


 On Tue, Aug 10, 2010 at 4:43 PM, Nunzio Daveri nunziodav...@yahoo.com
 wrote:
 Hello Michael, thanks for the one liner.  I ran it BUT I started to get
 errors after I ran it the first time, this is what I got the 2nd time I
 ran
 it (first time I ran it I had 63 rows in the query, the 2nd time I have
 9).
 I ran it twice to make sure it got rid of the indexed.  I verified the
 index
 size dropped from 850 mb to 65 mb.


 +-+
 | concat('ALTER TABLE ', TABLE_NAME, ' DROP INDEX ', CONSTRAINT_NAME,';')
 |

 +-+
 | ALTER TABLE dbt_Fruits DROP INDEX PRIMARY; |
 | ALTER TABLE dbt_Veggies DROP INDEX PRIMARY;
 |
 .
 .
 .
 | ALTER TABLE dbt_Logs DROP INDEX
 PRIMARY; |

 +-+
 9 rows in set (0.01 sec)

 mysql ALTER TABLE dbt_Fruits DROP INDEX PRIMARY;
 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
 that corresponds to your MySQL server version for the right syntax to use
 near 'PRIMARY' at line 1
 mysql> ALTER TABLE dbt_Logs DROP INDEX PRIMARY;
 ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
 that corresponds to your MySQL server version for the right syntax to use
 near 'PRIMARY' at line 1
 mysql>

 Thanks again...

 Nunzio
 
 From: Michael Dykman mdyk...@gmail.com
 To: Nunzio Daveri nunziodav...@yahoo.com
 Cc: Anirudh Sundar sundar.anir...@gmail.com; mysql@lists.mysql.com
 Sent: Tue, August 10, 2010 3:17:48 PM
 Subject: Re: Dropping ALL indexes from a database / not just a table?

 This should give you a good starting point (not tested):

 select distinct concat('ALTER TABLE ', TABLE_NAME, ' DROP INDEX ',
 CONSTRAINT_NAME,';')
 from information_schema.key_column_usage where
 TABLE_SCHEMA='mydatabase';
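
A possible refinement of that one-liner (an untested sketch): since `DROP INDEX PRIMARY` is a syntax error, emit `DROP PRIMARY KEY` for the PRIMARY constraint and `DROP INDEX` for everything else:

```sql
-- Generate one ALTER statement per constraint in the schema.
-- 'mydatabase' is a placeholder, as in the original query.
SELECT DISTINCT
       CONCAT('ALTER TABLE ', TABLE_NAME, ' ',
              IF(CONSTRAINT_NAME = 'PRIMARY',
                 'DROP PRIMARY KEY',
                 CONCAT('DROP INDEX ', CONSTRAINT_NAME)), ';')
FROM information_schema.key_column_usage
WHERE TABLE_SCHEMA = 'mydatabase';
```

Note that key_column_usage only lists constraint-backed keys (primary, unique, foreign); plain secondary indexes will not appear there.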

 - md

 On Tue, Aug 10, 2010 at 10:43 AM, Nunzio Daveri nunziodav...@yahoo.com
 wrote:
 Thanks for the feedback.  What I am trying to do is two things:
 
 1. Remove all indexes to make the database smaller to copy and move to
 another prod box. Currently my indexes are in the double digit GB! Yikes ;-)
 
 2. Remove all indexes so I can find out which ones are needed, then tell
 mysql to recreate them; apparently it lessens data fragmentation if it
 starts from scratch vs. turning them on and off.

 I was hoping to just remove all and then start from scratch so I

Dropping ALL indexes from a database / not just a table?

2010-08-09 Thread Nunzio Daveri
Hello Gurus, is there a way / script that will let me DROP ALL the indexes in a 
single database?  For example, let's say my database is called db_Animals, and 
inside db_Animals there are 97 tables. Is there a SINGLE command, or a perl 
script of some kind that can read all the .MYI files, strip the .MYI from the 
file name, then proceed to delete whatever indexes it finds?  I am doing this 
to debug a server that seems to be slow and sluggish.  After I am done deleting, 
I will review the slow query logs and then re-index to get the best performance.

TIA...

Nunzio
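
For what it's worth, a server-side alternative to parsing .MYI files would be to generate the DROP statements from information_schema (available in MySQL 5.0+); a sketch using the db_Animals name from the question, keeping the primary keys intact:

```sql
-- One DROP INDEX statement per secondary index in the database.
-- information_schema.statistics lists every index, including plain
-- (non-constraint) ones, one row per indexed column, hence DISTINCT.
SELECT DISTINCT
       CONCAT('ALTER TABLE `', TABLE_NAME, '` DROP INDEX `', INDEX_NAME, '`;')
FROM information_schema.statistics
WHERE TABLE_SCHEMA = 'db_Animals'
  AND INDEX_NAME <> 'PRIMARY';
```

The output could be saved to a file (e.g. with `mysql -N -e "..." > drops.sql`) and replayed with the mysql client; test on a copy of the database first.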



  

Re: Indexes larger than RAM (was: Do you know who can answer this question I posted yesterday please?)

2010-07-30 Thread Joerg Bruehe
Hi!


I am no InnoDB and tuning expert, so I had intended to stay away from
this question. Ok, I'll give some general remarks:


Nunzio Daveri schrieb:
 [[...]]
 
 All, I was running slamdb against one of our QA boxes and noticed that the 
 innodb database is 190Gb in size BUT the worrying issue is that the indexes 
 are 
 30GB in size!!!  When I hit this server hard, it tanks on memory but still 
 performs, slower of course ;-)

Having indexes which are larger than RAM is (in itself) not critical.
IMO, it becomes bad only when accesses to these indexes are spread so
wide that even the index pages become subject to frequent IO.

 Any suggestions on what I should do?  I am 
 thinking of doing one of these:

Whether any action is needed, and which, depends on the problem you
experience:

- If the system as a whole (both CPU and disk) has a significant idle
  percentage, it isn't yet maxed out, and I don't expect that adding
  resources would improve performance significantly.

- If your CPUs have significant waiting for IO percentage, then data
  accesses need speedup. This could be done by faster disks, but I would
  expect more results from adding RAM for larger caches.
  This holds especially if your disk throughput is close to the possible
  maximum.
  (Assuming your bulk work is read/select. If it is insert/update, then
  *removing* indexes might reduce the workload, as there are fewer
  indexes to maintain.)

- If your CPUs are busy, then I don't expect any increase of caching
  would help.
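
One rough way to tell whether index and data pages are being read from disk rather than served from cache is to compare InnoDB's buffer pool counters; a sketch using standard status variables:

```sql
-- Innodb_buffer_pool_reads counts logical reads that had to go to disk;
-- Innodb_buffer_pool_read_requests counts all logical reads. A disk/logical
-- ratio that keeps climbing suggests the buffer pool is too small.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- Compare against the configured cache size:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```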

 
 1. Remove all queries, run for a few days, look at the slow query logs and
 then find those queries that really need them and index those specifically
 for performance.

Makes sense (only) if you have indexes which aren't really helpful for
accesses, so they just add maintenance load. If you do few
inserts/updates, an unused index should be paged out and not do much harm.
Comes with the cost of reduced performance during that test time, and
the need to rebuild the essential indexes afterwards. Has the benefit of
getting rid of unused indexes (which just cause maintenance load).

 2. Split the single server into two servers both with 16 gb and 2 quad core 
 cpu's. One master the other a slave.

Makes sense if your CPUs are busy, *and* you can distribute the read
accesses to the two servers (= most accesses are select). If most load
is insert/update, I don't expect a real improvement.
Biggest cost in hardware and admin effort, so I would do this only after
a decent analysis. OTOH, it gives you some (hardware) fault tolerance,
this could be an important argument depending on your requirements.

 3. Just add another 16gb (32GB total) and that should take care of the 
 indexing 
 issue.

Makes sense if the disks are the bottleneck (CPUs are in waiting for
IO), so that larger caches will avoid disk accesses.
Assumes your machine supports that amount of RAM (many mainboards have a
limit at 16 GB, AIUI).

 
 Anyone had this problem before???
 
  Oh this is a single box, 100% mysql only and it talks to 3 front end
  iPlanet web servers that hit it with a few hundred queries per second.

For a specific answer, the distribution of accesses between read and
write is needed, as well as information which resource is close to the
limit.


HTH,
Jörg

-- 
Joerg Bruehe,  MySQL Build Team,  joerg.bru...@oracle.com
   (+49 30) 417 01 487
ORACLE Deutschland B.V. & Co. KG,   Komturstrasse 18a,   D-12099 Berlin
Geschaeftsfuehrer: Juergen Kunz, Marcel v.d. Molen, Alexander v.d. Ven
Amtsgericht Muenchen: HRA 95603


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org



Re: Indexes larger than RAM (was: Do you know who can answer this question I posted yesterday please?)

2010-07-30 Thread Nunzio Daveri
Thanks again :-)

Nunzio





From: Joerg Bruehe joerg.bru...@oracle.com
To: Nunzio Daveri nunziodav...@yahoo.com; mysQL General List 
mysql@lists.mysql.com
Sent: Fri, July 30, 2010 1:31:54 PM
Subject: Re: Indexes larger than RAM (was: Do you know who can answer this 
question I posted yesterday please?)


Re: Indexes larger than RAM (was: Do you know who can answer this question I posted yesterday please?)

2010-07-30 Thread mos

Nunzio Daveri,
 Joerg Bruehe gave you a lot of good tips to try and speed things up. 
A few hundred queries per second seems to be a relatively small number to 
cause the server to crawl. I don't have the rest of your thread, but can 
you publish some of the slow queries (see Slow Query Log) and the table 
structure?


Mike


At 01:31 PM 7/30/2010, you wrote:






Re: combined or single indexes?

2010-07-22 Thread Shawn Green (MySQL)

On 7/21/2010 1:02 PM, Tompkins Neil wrote:

Hi

So Just running a basic query I get returned the following :

table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,ALL,9884,where used,

Therefore, I assume *ALL* is the worst possible type and should look at
adding an index to this particular field?



Why assume when the manual is right there to remove all doubt?

http://dev.mysql.com/doc/refman/5.1/en/explain-output.html

ALL

A full table scan is done for each combination of rows from the previous 
tables. This is normally not good if the table is the first table not 
marked const, and usually very bad in all other cases. Normally, you can 
avoid ALL by adding indexes that enable row retrieval from the table 
based on constant values or column values from earlier tables.
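
As an illustration of the before/after (the table and column names here are hypothetical, not from this thread):

```sql
-- Products.Category is not indexed, so EXPLAIN reports type = ALL
-- (a full table scan) for this query:
EXPLAIN SELECT * FROM Products WHERE Category = 'Fruit';

-- After adding an index on the column, the same EXPLAIN should report
-- type = ref and a much smaller rows estimate:
ALTER TABLE Products ADD INDEX idx_Category (Category);
EXPLAIN SELECT * FROM Products WHERE Category = 'Fruit';
```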






--
Shawn Green
MySQL Principle Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN




Re: combined or single indexes?

2010-07-22 Thread Tompkins Neil
Thanks for your reply, and sorry for not verifying in the manual.  Another
couple of questions I have :

If I run an EXPLAIN on a SELECT against a primary key, selecting fields
which are not indexed, I assume the EXPLAIN output below means I don't need
to index the additional fields, providing the PRIMARY KEY is included in
the SELECT statement?

table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,const,PRIMARY,PRIMARY,8,const,1,,

Also, if I want to add an index to an existing table containing 9000 records,
how long should I expect this to take?  Is it instant?

Cheers
Neil


On Thu, Jul 22, 2010 at 5:20 PM, Shawn Green (MySQL) 
shawn.l.gr...@oracle.com wrote:

 On 7/21/2010 1:02 PM, Tompkins Neil wrote:

 Hi

 So Just running a basic query I get returned the following :

 table,type,possible_keys,key,key_len,ref,rows,Extra,
 Products,ALL,9884,where used,

 Therefore, I assume *ALL* is the worst possible type and should look at
 adding a an index to this particular field ?


 Why assume when the manual is right there to remove all doubt?

 http://dev.mysql.com/doc/refman/5.1/en/explain-output.html
 
 ALL

 A full table scan is done for each combination of rows from the previous
 tables. This is normally not good if the table is the first table not marked
 const, and usually very bad in all other cases. Normally, you can avoid ALL
 by adding indexes that enable row retrieval from the table based on constant
 values or column values from earlier tables.
 




 --
 Shawn Green
 MySQL Principle Technical Support Engineer
 Oracle USA, Inc.
 Office: Blountville, TN



RE: combined or single indexes?

2010-07-22 Thread Jerry Schwartz
-Original Message-
From: Tompkins Neil [mailto:neil.tompk...@googlemail.com]
Sent: Thursday, July 22, 2010 3:39 PM
To: Shawn Green (MySQL)
Cc: mysql@lists.mysql.com
Subject: Re: combined or single indexes?

Thanks for your reply, and sorry for not verifying in the manual.  Another
couple of questions I have :

If I run a EXPLAIN query and SELECT against a primary key and SELECT fields
which are not indexed, I assume that returned EXPLAIN statement as below,
means I don't need to index additional fields providing the PRIMARY KEY is
included in the SELECT statement ?

table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,const,PRIMARY,PRIMARY,8,const,1,,

[JS] Your posts will be more legible if you use "\G" instead of ";" at the end 
of an EXPLAIN.

As for the indexing, if you only need one key then you only need one key. Just 
remember that when you test things with sample data, MySQL might make 
surprising decisions based upon the amount of data. You'll only really know 
what will happen if you have a substantial data set.

Also, if I want to add a index to an existing table containing 9000 records,
how long should I expect this to take ?  Is it instant ?

[JS] Faster than you can type, I should think.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.the-infoshop.com



Cheers
Neil


On Thu, Jul 22, 2010 at 5:20 PM, Shawn Green (MySQL) 
shawn.l.gr...@oracle.com wrote:

 On 7/21/2010 1:02 PM, Tompkins Neil wrote:

 Hi

 So Just running a basic query I get returned the following :

 table,type,possible_keys,key,key_len,ref,rows,Extra,
 Products,ALL,9884,where used,

 Therefore, I assume *ALL* is the worst possible type and should look at
 adding a an index to this particular field ?


 Why assume when the manual is right there to remove all doubt?

 http://dev.mysql.com/doc/refman/5.1/en/explain-output.html
 
 ALL

 A full table scan is done for each combination of rows from the previous
 tables. This is normally not good if the table is the first table not 
 marked
 const, and usually very bad in all other cases. Normally, you can avoid ALL
 by adding indexes that enable row retrieval from the table based on 
 constant
 values or column values from earlier tables.
 




 --
 Shawn Green
 MySQL Principle Technical Support Engineer
 Oracle USA, Inc.
 Office: Blountville, TN








Re: combined or single indexes?

2010-07-22 Thread Tompkins Neil
Thanks for the information Jerry.  Just to confirm, you mentioned *if you
only need one key then you only need one key*.  My question was that this
particular query was using SELECT against a primary key and other fields
which are NOT indexed.  The EXPLAIN result was

table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,const,PRIMARY,PRIMARY,8,const,1,,

So from this do I assume that if I'm always searching the PRIMARY KEY, that
I don't need to index the other fields ?

Cheers
Neil

On Thu, Jul 22, 2010 at 9:27 PM, Jerry Schwartz je...@gii.co.jp wrote:

 -Original Message-
 From: Tompkins Neil [mailto:neil.tompk...@googlemail.com]
 Sent: Thursday, July 22, 2010 3:39 PM
 To: Shawn Green (MySQL)
 Cc: mysql@lists.mysql.com
 Subject: Re: combined or single indexes?
 
 Thanks for your reply, and sorry for not verifying in the manual.  Another
 couple of questions I have :
 
 If I run a EXPLAIN query and SELECT against a primary key and SELECT
 fields
 which are not indexed, I assume that returned EXPLAIN statement as below,
 means I don't need to index additional fields providing the PRIMARY KEY is
 included in the SELECT statement ?
 
 table,type,possible_keys,key,key_len,ref,rows,Extra,
 Products,const,PRIMARY,PRIMARY,8,const,1,,
 
 [JS] Your posts will be more legible if you use \G instead of ; at the
 end
 of an EXPLAIN.

 As for the indexing, if you only need one key then you only need one key.
 Just
 remember that when you test things with sample data, MySQL might make
 surprising decisions based upon the amount of data. You'll only really know
 what will happen if you have a substantial data set.

 Also, if I want to add a index to an existing table containing 9000
 records,
 how long should I expect this to take ?  Is it instant ?
 
 [JS] Faster than you can type, I should think.

 Regards,

 Jerry Schwartz
 Global Information Incorporated
 195 Farmington Ave.
 Farmington, CT 06032

 860.674.8796 / FAX: 860.674.8341
 E-mail: je...@gii.co.jp
 Web site: www.the-infoshop.com



 Cheers
 Neil
 
 
 On Thu, Jul 22, 2010 at 5:20 PM, Shawn Green (MySQL) 
 shawn.l.gr...@oracle.com wrote:
 
  On 7/21/2010 1:02 PM, Tompkins Neil wrote:
 
  Hi
 
  So Just running a basic query I get returned the following :
 
  table,type,possible_keys,key,key_len,ref,rows,Extra,
  Products,ALL,9884,where used,
 
  Therefore, I assume *ALL* is the worst possible type and should look
 at
  adding a an index to this particular field ?
 
 
  Why assume when the manual is right there to remove all doubt?
 
  http://dev.mysql.com/doc/refman/5.1/en/explain-output.html
  
  ALL
 
  A full table scan is done for each combination of rows from the previous
  tables. This is normally not good if the table is the first table not
  marked
  const, and usually very bad in all other cases. Normally, you can avoid
 ALL
  by adding indexes that enable row retrieval from the table based on
  constant
  values or column values from earlier tables.
  
 
 
 
 
  --
  Shawn Green
  MySQL Principle Technical Support Engineer
  Oracle USA, Inc.
  Office: Blountville, TN
 






RE: combined or single indexes?

2010-07-22 Thread Jerry Schwartz
From: Tompkins Neil [mailto:neil.tompk...@googlemail.com] 
Sent: Thursday, July 22, 2010 4:50 PM
To: Jerry Schwartz
Cc: Shawn Green (MySQL); mysql@lists.mysql.com
Subject: Re: combined or single indexes?

 

Thanks for the information Jerry.  Just to confirm, you mentioned if you only 
need one key then you only need one key.  My question was that this particular 
query was using SELECT against a primary key and other fields which are NOT 
indexed.  The EXPLAIN result was

 

table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,const,PRIMARY,PRIMARY,8,const,1,,

 

So from this do I assume that if I'm always searching the PRIMARY KEY, that I 
don't need to index the other fields ?

 

[JS] I think I must have missed the start of this thread, because I don’t 
remember seeing the original query. The answer lies in your WHERE clause, and 
in the number of records that would potentially qualify. MySQL will ignore keys 
and do a full table scan if it decides that none of the keys would eliminate a 
big portion of the records. (This is why I warned about small sample datasets.) 
If your query looks like

 

… WHERE `account_num` = 17 …

 

and account numbers are unique, then an index on `account_num` should be 
enough. If you are always and ONLY searching on the primary key, then the 
primary key is all you need. That’s usually not the case, though. You’re 
probably going to want to search on other things, sooner or later.
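
Continuing Jerry's hypothetical, the index and the resulting constant lookup might look like this (the `accounts` table name is invented for illustration):

```sql
-- If account numbers are unique, a UNIQUE index both documents that fact
-- and lets the optimizer resolve WHERE account_num = 17 as a const lookup:
ALTER TABLE accounts ADD UNIQUE INDEX idx_account_num (account_num);

-- EXPLAIN should now show type = const with rows = 1:
EXPLAIN SELECT * FROM accounts WHERE account_num = 17;
```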

 

I’m not an expert on optimizing queries in MySQL, and there are probably 
differences between the storage engines, but I hope this helps.

 

Regards,

 

Jerry Schwartz

Global Information Incorporated

195 Farmington Ave.

Farmington, CT 06032

 

860.674.8796 / FAX: 860.674.8341

E-mail: je...@gii.co.jp 

Web site: www.the-infoshop.com http://www.the-infoshop.com/ 

 

 

 

Cheers

Neil

On Thu, Jul 22, 2010 at 9:27 PM, Jerry Schwartz je...@gii.co.jp wrote:

-Original Message-
From: Tompkins Neil [mailto:neil.tompk...@googlemail.com]
Sent: Thursday, July 22, 2010 3:39 PM
To: Shawn Green (MySQL)
Cc: mysql@lists.mysql.com
Subject: Re: combined or single indexes?

Thanks for your reply, and sorry for not verifying in the manual.  Another
couple of questions I have :

If I run a EXPLAIN query and SELECT against a primary key and SELECT fields
which are not indexed, I assume that returned EXPLAIN statement as below,
means I don't need to index additional fields providing the PRIMARY KEY is
included in the SELECT statement ?

table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,const,PRIMARY,PRIMARY,8,const,1,,


[JS] Your posts will be more legible if you use \G instead of ; at the end
of an EXPLAIN.

As for the indexing, if you only need one key then you only need one key. Just
remember that when you test things with sample data, MySQL might make
surprising decisions based upon the amount of data. You'll only really know
what will happen if you have a substantial data set.


Also, if I want to add a index to an existing table containing 9000 records,
how long should I expect this to take ?  Is it instant ?


[JS] Faster than you can type, I should think.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.the-infoshop.com




Cheers
Neil


On Thu, Jul 22, 2010 at 5:20 PM, Shawn Green (MySQL) 
shawn.l.gr...@oracle.com wrote:

 On 7/21/2010 1:02 PM, Tompkins Neil wrote:

 Hi

 So Just running a basic query I get returned the following :

 table,type,possible_keys,key,key_len,ref,rows,Extra,
 Products,ALL,9884,where used,

 Therefore, I assume *ALL* is the worst possible type and should look at
 adding a an index to this particular field ?


 Why assume when the manual is right there to remove all doubt?

 http://dev.mysql.com/doc/refman/5.1/en/explain-output.html
 
 ALL

 A full table scan is done for each combination of rows from the previous
 tables. This is normally not good if the table is the first table not
 marked
 const, and usually very bad in all other cases. Normally, you can avoid ALL
 by adding indexes that enable row retrieval from the table based on
 constant
 values or column values from earlier tables.
 




 --
 Shawn Green
 MySQL Principle Technical Support Engineer
 Oracle USA, Inc.
 Office: Blountville, TN





 



Re: combined or single indexes?

2010-07-22 Thread Neil Tompkins
Thanks for the useful information. This is the answer I was looking for.


Neil

On 22 Jul 2010, at 22:25, Jerry Schwartz je...@gii.co.jp wrote:


From: Tompkins Neil [mailto:neil.tompk...@googlemail.com]
Sent: Thursday, July 22, 2010 4:50 PM
To: Jerry Schwartz
Cc: Shawn Green (MySQL); mysql@lists.mysql.com
Subject: Re: combined or single indexes?



Thanks for the information Jerry.  Just to confirm, you mentioned  
if you only need one key then you only need one key.  My question  
was that this particular query was using SELECT against a primary  
key and other fields which are NOT indexed.  The EXPLAIN result was




table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,const,PRIMARY,PRIMARY,8,const,1,,



So from this do I assume that if I'm always searching the PRIMARY  
KEY, that I don't need to index the other fields ?




[JS] I think I must have missed the start of this thread, because I  
don't remember seeing the original query. The answer lies in your  
WHERE clause, and in the number of records that would potentially  
qualify. MySQL will ignore keys and do a full table scan if it decides  
that none of the keys would eliminate a big portion of the records.  
(This is why I warned about small sample datasets.) If your query  
looks like




… WHERE `account_num` = 17 …



and account numbers are unique, then an index on `account_num`  
should be enough. If you are always and ONLY searching on the  
primary key, then the primary key is all you need. That's usually  
not the case, though. You're probably going to want to search on  
other things, sooner or later.




I'm not an expert on optimizing queries in MySQL, and there are  
probably differences between the storage engines, but I hope this helps.




Regards,



Jerry Schwartz

Global Information Incorporated

195 Farmington Ave.

Farmington, CT 06032



860.674.8796 / FAX: 860.674.8341

E-mail: je...@gii.co.jp

Web site: www.the-infoshop.com







Cheers

Neil

On Thu, Jul 22, 2010 at 9:27 PM, Jerry Schwartz je...@gii.co.jp  
wrote:


-Original Message-
From: Tompkins Neil [mailto:neil.tompk...@googlemail.com]
Sent: Thursday, July 22, 2010 3:39 PM
To: Shawn Green (MySQL)
Cc: mysql@lists.mysql.com
Subject: Re: combined or single indexes?

Thanks for your reply, and sorry for not verifying in the manual.   
Another

couple of questions I have :

If I run a EXPLAIN query and SELECT against a primary key and  
SELECT fields
which are not indexed, I assume that returned EXPLAIN statement as  
below,
means I don't need to index additional fields providing the PRIMARY  
KEY is

included in the SELECT statement ?

table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,const,PRIMARY,PRIMARY,8,const,1,,


[JS] Your posts will be more legible if you use \G instead of ;  
at the end

of an EXPLAIN.

As for the indexing, if you only need one key then you only need one  
key. Just

remember that when you test things with sample data, MySQL might make
surprising decisions based upon the amount of data. You'll only  
really know

what will happen if you have a substantial data set.


Also, if I want to add a index to an existing table containing 9000  
records,

how long should I expect this to take ?  Is it instant ?


[JS] Faster than you can type, I should think.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.the-infoshop.com




Cheers
Neil


On Thu, Jul 22, 2010 at 5:20 PM, Shawn Green (MySQL) 
shawn.l.gr...@oracle.com wrote:

 On 7/21/2010 1:02 PM, Tompkins Neil wrote:

 Hi

 So Just running a basic query I get returned the following :

 table,type,possible_keys,key,key_len,ref,rows,Extra,
 Products,ALL,9884,where used,

 Therefore, I assume *ALL* is the worst possible type and  
should look at

 adding a an index to this particular field ?


 Why assume when the manual is right there to remove all doubt?

 http://dev.mysql.com/doc/refman/5.1/en/explain-output.html
 
 ALL

 A full table scan is done for each combination of rows from the  
previous
 tables. This is normally not good if the table is the first table  
not

 marked
 const, and usually very bad in all other cases. Normally, you can  
avoid ALL

 by adding indexes that enable row retrieval from the table based on
 constant
 values or column values from earlier tables.
 




 --
 Shawn Green
 MySQL Principle Technical Support Engineer
 Oracle USA, Inc.
 Office: Blountville, TN







Re: combined or single indexes?

2010-07-21 Thread Tompkins Neil
Hi

So Just running a basic query I get returned the following :

table,type,possible_keys,key,key_len,ref,rows,Extra,
Products,ALL,9884,where used,

Therefore, I assume *ALL* is the worst possible type and should look at
adding an index to this particular field?

Cheers
Neil


On Fri, Jul 9, 2010 at 3:31 PM, mos mo...@fastmail.fm wrote:

 At 12:43 AM 7/9/2010, you wrote:

 On Fri, Jul 9, 2010 at 7:30 AM, Neil Tompkins
 neil.tompk...@googlemail.comwrote:

 
  Should we be looking to create an index for every field that we might
  perform a select condition on in a where clause?


 Neil,
 Keep in mind that the more indexes you add to a table, the slower your
 inserts will become because it needs to maintain the indexes. I would only
 consider adding indexes to the slow queries. You can use the Explain on your
 Select statement to see which indexes are being used. See
 http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htmfor
  an explanation on how to use EXPLAIN.

 BTW, if you are indexing text columns, you may want to look at FullText
 indexing.

 Mike




  It's a bit of trial and error: you have to weigh the number of times you
 select based on a potential index against the impact that index might have
 for the amount of updates you do on the table.

 Generally speaking, though, it's indeed a good idea to find the
 most-frequently used where-clauses and index them.


 --
 Bier met grenadyn
 Is als mosterd by den wyn
 Sy die't drinkt, is eene kwezel
 Hy die't drinkt, is ras een ezel







Re: combined or single indexes?

2010-07-09 Thread mos

At 12:43 AM 7/9/2010, you wrote:

On Fri, Jul 9, 2010 at 7:30 AM, Neil Tompkins
neil.tompk...@googlemail.comwrote:


 Should we be looking to create an index for every field that we might
 perform a select condition on in a where clause?


Neil,
 Keep in mind that the more indexes you add to a table, the slower 
your inserts will become because it needs to maintain the indexes. I would 
only consider adding indexes to the slow queries. You can use the Explain 
on your Select statement to see which indexes are being used. See 
http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm 
for an explanation on how to use EXPLAIN.


BTW, if you are indexing text columns, you may want to look at FullText 
indexing.


Mike




It's a bit of trial and error: you have to weigh the number of times you
select based on a potential index against the impact that index might have
for the amount of updates you do on the table.

Generally speaking, though, it's indeed a good idea to find the
most-frequently used where-clauses and index them.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel






Re: combined or single indexes?

2010-07-08 Thread Neil Tompkins

How many indexes are recommended per table ??



On 7 Jul 2010, at 06:06, Octavian Rasnita octavian.rasn...@ssifbroker.ro 
 wrote:



Hi,

MySQL can use a single index in a query as you've seen in the result  
of explain.
Of course it is better to have an index made of 2 or more columns  
because it will match better the query.


But if I remember well, the in() function  can't use an index.
And I think it also can't use an index if you use OR operators like:

select foo from table where a=1 or a=2;

So for your query the single-column index for the second column is  
enough.


I've seen some tricks for using a faster method by using union and 2- 
column index, something like:


select foo from table where a=1 and b<1234
union
select foo from table where a=2 and b<1234
union
select foo from table where a=3 and b<1234

This might be faster in some cases because the query would be able  
to use the 2-column index, and especially if the content of those  
columns is made only of numbers, because in that case the query will  
use only the index, without getting data from the table.


--
Octavian

- Original Message - From: Bryan Cantwell bcantw...@firescope.com 


To: mysql@lists.mysql.com
Sent: Tuesday, July 06, 2010 6:41 PM
Subject: combined or single indexes?


Is there a benefit to a combined index on a table? Or is multiple  
single

column indexes better?

If I have table 'foo' with columns a, b, and c. I will have a query
like:
select c from foo where a in (1,2,3) and b  12345;

Is index on a,b better in any way than an a index and a b index?
An explain with one index sees it but doesn't use it (only the where)
and having 2 indexes sees both and uses the one on b.

Am I right to think that 2 indexes are better than one combined one?

thx,
Bryancan















Re: combined or single indexes?

2010-07-08 Thread Johan De Meersman
As many as you need, but no more :-)

The right indexes give you a boost in select performance, but every index
also needs to be updated when your data changes.



On Thu, Jul 8, 2010 at 11:25 PM, Neil Tompkins neil.tompk...@googlemail.com
 wrote:

 How many indexes are recommended per table ??




 On 7 Jul 2010, at 06:06, Octavian Rasnita 
 octavian.rasn...@ssifbroker.ro wrote:

  Hi,

 MySQL can use a single index in a query as you've seen in the result of
 explain.
 Of course it is better to have an index made of 2 or more columns because
 it will match better the query.

 But if I remember well, the in() function  can't use an index.
 And I think it also can't use an index if you use OR operators like:

 select foo from table where a=1 or a=2;

 So for your query the single-column index for the second column is enough.

 I've seen some tricks for using a faster method by using union and
 2-column index, something like:

  select foo from table where a=1 and b<1234
  union
  select foo from table where a=2 and b<1234
  union
  select foo from table where a=3 and b<1234

 This might be faster in some cases because the query would be able to use
 the 2-column index, and especially if the content of those columns is made
 only of numbers, because in that case the query will use only the index,
 without getting data from the table.

 --
 Octavian

 - Original Message - From: Bryan Cantwell 
 bcantw...@firescope.com
 To: mysql@lists.mysql.com
 Sent: Tuesday, July 06, 2010 6:41 PM
 Subject: combined or single indexes?


  Is there a benefit to a combined index on a table? Or is multiple single
 column indexes better?

 If I have table 'foo' with columns a, b, and c. I will have a query
 like:
  select c from foo where a in (1,2,3) and b < 12345;

 Is index on a,b better in any way than an a index and a b index?
 An explain with one index sees it but doesn't use it (only the where)
 and having 2 indexes sees both and uses the one on b.

 Am I right to think that 2 indexes are better than one combined one?

 thx,
 Bryancan















-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel


Re: combined or single indexes?

2010-07-08 Thread Neil Tompkins


Should we be looking to create an index for every field that we might
perform a select condition on in a where clause?


On 9 Jul 2010, at 05:59, Johan De Meersman vegiv...@tuxera.be wrote:


As many as you need, but no more :-)

The right indexes give you a boost in select performance, but every  
index also needs to be updated when your data changes.




On Thu, Jul 8, 2010 at 11:25 PM, Neil Tompkins neil.tompk...@googlemail.com 
 wrote:

How many indexes are recommended per table ??




On 7 Jul 2010, at 06:06, Octavian Rasnita octavian.rasn...@ssifbroker.ro 
 wrote:


Hi,

MySQL can use a single index in a query as you've seen in the result  
of explain.
Of course it is better to have an index made of 2 or more columns  
because it will match better the query.


But if I remember well, the in() function  can't use an index.
And I think it also can't use an index if you use OR operators like:

select foo from table where a=1 or a=2;

So for your query the single-column index for the second column is  
enough.


I've seen some tricks for using a faster method by using union and 2- 
column index, something like:


select foo from table where a=1 and b<1234
union
select foo from table where a=2 and b<1234
union
select foo from table where a=3 and b<1234

This might be faster in some cases because the query would be able  
to use the 2-column index, and especially if the content of those  
columns is made only of numbers, because in that case the query will  
use only the index, without getting data from the table.


--
Octavian

- Original Message - From: Bryan Cantwell bcantw...@firescope.com 


To: mysql@lists.mysql.com
Sent: Tuesday, July 06, 2010 6:41 PM
Subject: combined or single indexes?


Is there a benefit to a combined index on a table? Or is multiple  
single

column indexes better?

If I have table 'foo' with columns a, b, and c. I will have a query
like:
select c from foo where a in (1,2,3) and b < 12345;

Is index on a,b better in any way than an a index and a b index?
An explain with one index sees it but doesn't use it (only the where)
and having 2 indexes sees both and uses the one on b.

Am I right to think that 2 indexes are better than one combined one?

thx,
Bryancan















--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel


Re: combined or single indexes?

2010-07-08 Thread Johan De Meersman
On Fri, Jul 9, 2010 at 7:30 AM, Neil Tompkins
neil.tompk...@googlemail.comwrote:


  Should we be looking to create an index for every field that we might
  perform a select condition on in a where clause?


It's a bit of trial and error: you have to weigh the number of times you
select based on a potential index against the impact that index might have
for the amount of updates you do on the table.

Generally speaking, though, it's indeed a good idea to find the
most-frequently used where-clauses and index them.


-- 
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel


combined or single indexes?

2010-07-06 Thread Bryan Cantwell
Is there a benefit to a combined index on a table? Or is multiple single
column indexes better?

If I have table 'foo' with columns a, b, and c. I will have a query
like:
select c from foo where a in (1,2,3) and b < 12345;

Is index on a,b better in any way than an a index and a b index?
An explain with one index sees it but doesn't use it (only the where) 
and having 2 indexes sees both and uses the one on b. 

Am I right to think that 2 indexes are better than one combined one?

thx,
Bryancan






Re: combined or single indexes?

2010-07-06 Thread Joerg Bruehe
Hi Bryan, all!


Bryan Cantwell wrote:
 Is there a benefit to a combined index on a table? Or is multiple single
 column indexes better?

This is a FAQ, but I'm not aware of a place to point you for the answer.

 
 If I have table 'foo' with columns a, b, and c. I will have a query
 like:
  select c from foo where a in (1,2,3) and b < 12345;
 
 Is index on a,b better in any way than an a index and a b index?

Any multi-column index can only be used when the values for the leading
column(s) is/are known (in your example, they are).

My standard example is a phone book:
It is sorted by last name, first name; you cannot use this order
when the last name is unknown (you have to sequentially scan it).


 An explain with one index sees it but doesn't use it (only the where) 
 and having 2 indexes sees both and uses the one on b.

Testing select strategies requires that you have a meaningful amount of
data, and a close-to-real distribution of values:

If your tables hold too few rows, the system will notice that it is
wasteful to access them via the index, a scan is faster.
And if your value distribution differs too much from later real data,
the strategy selected will also differ.

 
 Am I right to think that 2 indexes are better than one combined one?

It depends:
AFAIK, MySQL will not yet combine several indexes, but evaluates only one
per table access.
If you have a usable multi-column index, it will provide better
selectivity than a single-column index, so it is better if all the
leading values are given.

I cannot specifically comment on conditions using in and <.


HTH,
Jörg

-- 
Joerg Bruehe,  MySQL Build Team,  joerg.bru...@sun.com
ORACLE Deutschland B.V.  Co. KG,   Komturstrasse 18a,   D-12099 Berlin
Geschaeftsfuehrer: Juergen Kunz, Marcel v.d. Molen, Alexander v.d. Ven
Amtsgericht Muenchen: HRA 95603


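
Joerg's phone-book analogy can be sketched against Bryan's example. The
composite index name is invented for illustration:

```sql
-- A two-column index is sorted like a phone book: by a, then by b.
CREATE INDEX idx_a_b ON foo (a, b);

-- Usable: a is fixed, so the b < 12345 range is one contiguous
-- slice of the index.
SELECT c FROM foo WHERE a = 1 AND b < 12345;

-- Not usable for a seek: a is unknown, like searching a phone book
-- by first name only, so expect a scan (or a separate index on b).
SELECT c FROM foo WHERE b < 12345;
```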



Re: combined or single indexes?

2010-07-06 Thread Octavian Rasnita

Hi,

MySQL can use a single index in a query as you've seen in the result of 
explain.
Of course it is better to have an index made of 2 or more columns because it 
will match better the query.


But if I remember well, the in() function  can't use an index.
And I think it also can't use an index if you use OR operators like:

select foo from table where a=1 or a=2;

So for your query the single-column index for the second column is enough.

I've seen some tricks for using a faster method by using union and 2-column 
index, something like:


select foo from table where a=1 and b<1234
union
select foo from table where a=2 and b<1234
union
select foo from table where a=3 and b<1234

This might be faster in some cases because the query would be able to use 
the 2-column index, and especially if the content of those columns is made 
only of numbers, because in that case the query will use only the index, 
without getting data from the table.


--
Octavian

- Original Message - 
From: Bryan Cantwell bcantw...@firescope.com

To: mysql@lists.mysql.com
Sent: Tuesday, July 06, 2010 6:41 PM
Subject: combined or single indexes?



Is there a benefit to a combined index on a table? Or is multiple single
column indexes better?

If I have table 'foo' with columns a, b, and c. I will have a query
like:
select c from foo where a in (1,2,3) and b < 12345;

Is index on a,b better in any way than an a index and a b index?
An explain with one index sees it but doesn't use it (only the where)
and having 2 indexes sees both and uses the one on b.

Am I right to think that 2 indexes are better than one combined one?

thx,
Bryancan













Re: The query doesn't use the specified indexes

2010-06-29 Thread Ashish Mukherjee
If cardinality is high (i.e. a large number of rows returned in the set for
your query), then MySQL may need to resort to a filesort.

- Ashish

2010/6/21 Octavian Rasnita octavian.rasn...@ssifbroker.ro

 Hi,

 I have made an InnoDB table and I am trying to search using some keys, but
 they are not used, and the query takes a very long time.

 Here is a test table:

 CREATE TABLE `test` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `symbol` varchar(20) NOT NULL,
 `market` varchar(20) NOT NULL,
 `id_symbol` int(10) unsigned NOT NULL,
 `id_market` int(10) unsigned NOT NULL,
 PRIMARY KEY (`id`),
 KEY `symbol` (`symbol`),
 KEY `market` (`market`),
 KEY `id_symbol` (`id_symbol`),
 KEY `id_market` (`id_market`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8

 The search query is:

 mysql> explain select * from test where symbol='etc' order by market limit
 20\G
 *** 1. row ***
 id: 1
 select_type: SIMPLE
 table: test
 type: ref
 possible_keys: symbol
 key: symbol
 key_len: 62
 ref: const
 rows: 1
 Extra: Using where; Using filesort


 The bad part is Using filesort, and I thought that this is because it
 doesn't like varchar or char columns for indexes, so I tried to use columns
 that contain integers:

 mysql> explain select * from test where id_symbol=2 order by id_market
 limit 20\G
 *** 1. row ***
 id: 1
 select_type: SIMPLE
 table: test
 type: ref
 possible_keys: id_symbol
 key: id_symbol
 key_len: 4
 ref: const
 rows: 1
 Extra: Using where; Using filesort

 It still uses Using filesort and it doesn't use the index id_market in
 the query.

 So I tried to force using the indexes:

 mysql> explain select * from test force index(symbol, market) where
 symbol='etc'
 order by market limit 20\G
 *** 1. row ***
 id: 1
 select_type: SIMPLE
 table: test
 type: ref
 possible_keys: symbol
 key: symbol
 key_len: 62
 ref: const
 rows: 1
 Extra: Using where; Using filesort


  So, no matter what I do, the query doesn't want to use the specified index.
  Please tell me what I am doing wrong. Or is it a MySQL/InnoDB bug?

 The current table I am testing has no records. I have also tried this on a
 table that has more than 10 million records, with exactly the same results.

 Please tell me what can I do.

 Thanks.

 --
 Octavian







The query doesn't use the specified indexes

2010-06-21 Thread Octavian Rasnita
Hi,

I have made an InnoDB table and I am trying to search using some keys, but they 
are not used, and the query takes a very long time.

Here is a test table:

CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`symbol` varchar(20) NOT NULL,
`market` varchar(20) NOT NULL,
`id_symbol` int(10) unsigned NOT NULL,
`id_market` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `symbol` (`symbol`),
KEY `market` (`market`),
KEY `id_symbol` (`id_symbol`),
KEY `id_market` (`id_market`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

The search query is:

mysql> explain select * from test where symbol='etc' order by market limit 20\G
*** 1. row ***
id: 1
select_type: SIMPLE
table: test
type: ref
possible_keys: symbol
key: symbol
key_len: 62
ref: const
rows: 1
Extra: Using where; Using filesort


The bad part is Using filesort, and I thought that this is because it doesn't 
like varchar or char columns for indexes, so I tried to use columns that 
contain integers:

mysql> explain select * from test where id_symbol=2 order by id_market limit
20\G
*** 1. row ***
id: 1
select_type: SIMPLE
table: test
type: ref
possible_keys: id_symbol
key: id_symbol
key_len: 4
ref: const
rows: 1
Extra: Using where; Using filesort 

It still uses Using filesort and it doesn't use the index id_market in the 
query.

So I tried to force using the indexes:

mysql> explain select * from test force index(symbol, market) where symbol='etc'
order by market limit 20\G
*** 1. row ***
id: 1
select_type: SIMPLE
table: test
type: ref
possible_keys: symbol
key: symbol
key_len: 62
ref: const
rows: 1
Extra: Using where; Using filesort 


So, no matter what I do, the query doesn't want to use the specified index.
Please tell me what I am doing wrong. Or is it a MySQL/InnoDB bug?

The current table I am testing has no records. I have also tried this on a 
table that has more than 10 million records, with exactly the same results.

Please tell me what can I do.

Thanks.

--
Octavian
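
A sketch of the usual fix for this pattern (filter on one column, ORDER BY
another), assuming a combined index is acceptable; the index name is invented:

```sql
-- The table has only single-column indexes, so the optimizer can filter
-- on symbol or sort on market, but not both at once.
-- A two-column index covers the filter and the sort order together:
ALTER TABLE test ADD INDEX idx_symbol_market (symbol, market);

-- With realistic data in the table, EXPLAIN should pick the new index and
-- the "Using filesort" note should disappear: rows with symbol='etc' are
-- already stored in market order inside the index.
EXPLAIN SELECT * FROM test WHERE symbol='etc' ORDER BY market LIMIT 20\G
```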






Re: The query doesn't use the specified indexes

2010-06-21 Thread Joerg Bruehe
Hi Octavian, all!


Octavian Rasnita wrote:
 Hi,
 
 I have made an InnoDB table and I am trying to search using some keys, but 
 they are not used, and the query takes a very long time.
 
 Here is a test table:
 
 CREATE TABLE `test` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `symbol` varchar(20) NOT NULL,
 `market` varchar(20) NOT NULL,
 `id_symbol` int(10) unsigned NOT NULL,
 `id_market` int(10) unsigned NOT NULL,
 PRIMARY KEY (`id`),
 KEY `symbol` (`symbol`),
 KEY `market` (`market`),
 KEY `id_symbol` (`id_symbol`),
 KEY `id_market` (`id_market`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8

So you have a table with 5 columns, one being the primary key, and
separate single-column indexes on the other 4 columns.

 
 The search query is:
 
 mysql> explain select * from test where symbol='etc' order by market limit
 20\G
 *** 1. row ***
 id: 1
 select_type: SIMPLE
 table: test
 type: ref
 possible_keys: symbol
 key: symbol
 key_len: 62
 ref: const
 rows: 1
 Extra: Using where; Using filesort
 
 
 The bad part is Using filesort, 

No, it works as designed: What I take from the output is that it will
use the index (key) on column symbol (to find all rows that contain
the constant value 'etc' in that column), and then it will sort those
rows (order by market) to return the first 20.


  and I thought that this is because it doesn't like varchar or char columns 
 for indexes, so I tried to use columns that contain integers:
 
 mysql> explain select * from test where id_symbol=2 order by id_market limit
 20\G
 *** 1. row ***
 id: 1
 select_type: SIMPLE
 table: test
 type: ref
 possible_keys: id_symbol
 key: id_symbol
 key_len: 4
 ref: const
 rows: 1
 Extra: Using where; Using filesort 
 
 It still uses Using filesort and it doesn't use the index id_market in the 
 query.

This query cannot use the index on id_market because using that index
 would ignore the condition id_symbol=2.

 
 So I tried to force using the indexes:
 
 mysql> explain select * from test force index(symbol, market) where
 symbol='etc'
 order by market limit 20\G


Unless you changed your table definition, there is no index combining
these two fields - you didn't create any.

 [[...]]
 
  So, no matter what I do, the query doesn't want to use the specified index.
  Please tell me what I am doing wrong. Or is it a MySQL/InnoDB bug?

See above. If you expect the system to use an index on two columns, you
should first create it.

 
 The current table I am testing has no records. I have also tried this on a 
 table that has more than 10 million records, with exactly the same results.

You cannot test execution strategies on empty tables - it doesn't make
any sense.
The moment the optimizer uses statistical information (cost estimates,
aka cost-based optimizer), it will detect that the table is empty, so
there is no use in going through an index because that will not reduce
the number of rows (to check) any further.

If you want to test execution strategies, you should first make sure
that your test tables contain data which are roughly realistic, with a
distribution of values that is roughly realistic, and that your indexes
will provide a decent selectivity (I'd guess, at the very least you need
20 different values per column).


It is a separate question whether that sorting is critical:
You mention 10 million records, but you don't tell us the distribution
of values. If there are 10,000 different values of symbol, on average
such a query would have to sort 1000 records, which shouldn't be too bad.

 
 Please tell me what can I do.

Apart from the hints above:
Make your mail client break long lines.


HTH,
Jörg

-- 
Joerg Bruehe,  MySQL Build Team,  joerg.bru...@sun.com
Sun Microsystems GmbH,   Komturstrasse 18a,   D-12099 Berlin
Geschaeftsfuehrer: Juergen Kunz
Amtsgericht Muenchen: HRB161028
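
Joerg's point about realistic test data could be sketched like this; the row
counts and value ranges below are invented for illustration:

```sql
-- Double the table repeatedly to reach a meaningful row count ...
INSERT INTO test (symbol, market, id_symbol, id_market)
  SELECT symbol, market, id_symbol, id_market FROM test;

-- ... then spread values out so each index has some selectivity
-- (at the very least a few dozen distinct values per indexed column).
UPDATE test
  SET id_symbol = FLOOR(1 + RAND() * 1000),
      id_market = FLOOR(1 + RAND() * 50);

ANALYZE TABLE test;  -- refresh index statistics before re-checking EXPLAIN
```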





Re: Any faster building primary/unique indexes after Load Data Infile?

2010-02-25 Thread Baron Schwartz
Hi,

On Sun, Feb 21, 2010 at 1:42 PM, mos mo...@fastmail.fm wrote:
 I am loading 35 million rows of data into an empty MyISAM table. This table
 has 1 primary key (AutoInc) and 1 unique index and 2 non-unique indexes.

 Is it going to be any faster if I remove the indexes from the table before
 loading the data, load the data, then do an Alter Table .. add index 
 for all of the indexes?
 Or is it faster to just leave the indexes in place prior to loading the
 data.

 I know if the table is empty and optimized, the non-unique indexes will be
 built AFTER the data is loaded using Load Data Infile, but the unique and
 primary indexes will be built as the data is being loaded and this is going
 to slow down the import.

 There is no point doing a Disable Indexes on the table because this only
 affects non-unique indexes and that is already taken care of since the table
 is already empty and optimized.

 But if I remove the indexes from the empty table then load the data, then
 execute the Alter Table Add Index ... for all 4 indexes at one time, isn't
 the Alter Table going to create a copy of the table so it is just going to
 reload the data all over again?

Yes.  It is going to create a new table, copy the rows into it, and
then delete the old one.

 Is there any way to add a primary or unique index without copying the data
 all over again? Create Index ... can't be used to create a primary index.

Dirty hacks with .frm files and REPAIR TABLE have sometimes been known
to help in cases like this.  But it's not for the faint of heart.
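
The trade-off Baron confirms can be sketched as follows; the table, column,
and file names below are invented for illustration:

```sql
-- Path A: keep the keys. Into an empty, optimized MyISAM table, LOAD DATA
-- builds the non-unique indexes after the load anyway; only the PRIMARY
-- and UNIQUE keys are maintained row by row during the import.
LOAD DATA INFILE '/tmp/rows.csv' INTO TABLE t
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- Path B: drop every index first, load, then add them all back in a
-- single ALTER. This rebuilds the table copy only once for all four
-- indexes, but as Baron notes, the copy itself re-writes all 35M rows.
ALTER TABLE t
  ADD PRIMARY KEY (id),
  ADD UNIQUE KEY uk_code (code),
  ADD INDEX idx_x (x),
  ADD INDEX idx_y (y);
```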



