Re: Why is creating indexes faster after inserting massive data rows?
Which version of MySQL are you using? Is this a secondary index? On Mon, May 7, 2012 at 12:07 PM, Zhangzhigang zzgang_2...@yahoo.com.cn wrote: Hi all: I have a question: creating indexes after inserting massive data rows is faster than creating them before inserting the data rows. Please tell me why.
Re: Myisam won't support replication in an MySQL Cluster environment
- Original Message - From: Charles Brown cbr...@bmi.com

> Interestingly, over the years, I've been reading your postings and threads - without a doubt you're a major contributor. You've been very resourceful and helpful to your peers. We may never know what caused you to violently snap this time. However, I would encourage

Interestingly, if you *had* been following so closely over the years, you'd know both that this isn't violently snapping at all - I've seen him explode much worse - *and* that you should provide relevant data as well as actually bother to read people's answers if you expect help.

> you to continue to be nice and respectful to others -- particularly others you don't know. While this forum provides an excellent opportunity for us to exchange and share our experiences in MySQL, we expect everyone to conduct themselves politely and refrain from ideological overtures. The thought that an intelligent individual like you would bring himself this low flies in the face of all rational behavior.

That's a whole lot of quite good management-speak - and just as meaningless. I've seen your threads over the past couple of weeks, and have come to the same conclusion as Harald: you keep reposting the same inane question, all the while blatantly ignoring any and all relevant and informative replies you get from knowledgeable and experienced DBAs, highly specialised consultants, and people from the actual MySQL support alike. The main difference between his reaction and mine to the kind of behaviour you are showing is that he gets annoyed, whereas I simply ignore the thread. The answer to your question has been posted repeatedly. There is also perfectly good documentation available, yet you choose to ignore both and keep reposting the same thing over and over again, giving no indication of any form of comprehension whatsoever.
Go read the documentation, go search the internet, and if you *still* have issues, come back here and ask intelligent questions instead of things you can find in the first few pages of any introductory paper. This list consists of volunteers who freely provide their expertise, spending their time and asking nothing in return. It is NOT your personal helpdesk. -- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql
Re: Why is creating indexes faster after inserting massive data rows?
Version: MySQL 5.1. Engine: MyISAM. The indexes are ordinary ones, neither primary keys nor unique keys. Let me describe the question clearly: when inserting massive data rows into a table that needs indexes, I can either create the indexes before inserting the data rows, or insert all the data rows first and then create the indexes. Normally, the total time (inserting data rows plus creating indexes) of the first way is longer than that of the second way. Please tell me why.

From: Ananda Kumar anan...@gmail.com To: Zhangzhigang zzgang_2...@yahoo.com.cn Cc: mysql@lists.mysql.com Date: Monday, May 7, 2012, 3:31 PM Subject: Re: Why is creating indexes faster after inserting massive data rows?

> Which version of MySQL are you using? Is this a secondary index? On Mon, May 7, 2012 at 12:07 PM, Zhangzhigang zzgang_2...@yahoo.com.cn wrote: Hi all: I have a question: creating indexes after inserting massive data rows is faster than creating them before inserting the data rows. Please tell me why.
Re: Why is creating indexes faster after inserting massive data rows?
- Original Message - From: Zhangzhigang zzgang_2...@yahoo.com.cn

> Creating indexes after inserting massive data rows is faster than before inserting data rows. Please tell me why.

Plain and simple: the indices get updated after every insert statement, whereas if you only create the index *after* the inserts, the index gets created in a single operation, which is a lot more efficient. I seem to recall that inside of a transaction (thus, InnoDB or so) the difference is markedly less; I might be wrong, though. -- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
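Johan's point holds for any engine that maintains B-tree indexes on insert. A minimal sketch, using Python's bundled SQLite in place of MySQL (an assumption made purely so the example is self-contained; the principle is the same): loading first and indexing afterwards replaces per-row index maintenance with one bulk build.

```python
import sqlite3
import time

def load(index_first: bool, n_rows: int = 100_000):
    """Time a bulk load with the index created before vs. after the INSERTs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, val TEXT)")
    if index_first:
        con.execute("CREATE INDEX i_val ON t (val)")
    t0 = time.perf_counter()
    con.executemany(
        "INSERT INTO t VALUES (?, ?)",
        # spread the key values around so index inserts are non-sequential
        ((i, "v%07d" % (i * 2654435761 % 9_999_991)) for i in range(n_rows)),
    )
    if not index_first:
        con.execute("CREATE INDEX i_val ON t (val)")
    con.commit()
    elapsed = time.perf_counter() - t0
    (count,) = con.execute("SELECT COUNT(*) FROM t").fetchone()
    con.close()
    return elapsed, count

before, n1 = load(index_first=True)
after, n2 = load(index_first=False)
print("index first: %.3fs, index after: %.3fs" % (before, after))
```

On most machines the index-after run finishes faster, and the gap grows with the row count; exact timings will of course vary.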
Re: Why is creating indexes faster after inserting massive data rows?
Johan:

> Plain and simple: the indices get updated after every insert statement, whereas if you only create the index *after* the inserts, the index gets created in a single operation, which is a lot more efficient.

OK, creating the index *after* the inserts, the index gets created in a single operation. But doesn't the index still have to be updated row by row after the data rows have all been inserted? Does it work that way? So I cannot see the difference in overhead between the two ways.

From: Johan De Meersman vegiv...@tuxera.be To: Zhangzhigang zzgang_2...@yahoo.com.cn Cc: mysql@lists.mysql.com Date: Monday, May 7, 2012, 4:28 PM Subject: Re: Why is creating indexes faster after inserting massive data rows? [quoted message snipped]
Re: Re: Why is creating indexes faster after inserting massive data rows?
On 2012/05/07 10:53, Zhangzhigang wrote:

> OK, creating the index *after* the inserts, the index gets created in a single operation. But doesn't the index still have to be updated row by row after the data rows have all been inserted? Does it work that way? So I cannot see the difference in overhead between the two ways.

My simplified 2c. When inserting rows one by one with active indexes (INSERT), MySQL has to: 1) lock the space for the data to be added, 2) write the data, 3) lock the index, 4) write the index key(s), 5) unlock the index, 6) unlock the data. This happens for each row. When first loading all the data without an index, only steps 1, 2 and 6 happen. When you then create an index, it can lock the index, read all the data, write all the index keys in one go, and then unlock the index. If you make an omelet, do you fetch your eggs from the fridge one by one, or all at the same time? :) HTH, Alex
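Alex's six-step accounting can be turned into a toy operation count. This is a sketch of his model only, not of MySQL internals, and it assumes every step costs the same:

```python
def steps_with_index(n_rows: int) -> int:
    # per row: lock data, write data, lock index, write key, unlock index, unlock data
    return 6 * n_rows

def steps_load_then_index(n_rows: int) -> int:
    # per row, while loading: lock data, write data, unlock data
    load_steps = 3 * n_rows
    # then one pass: lock the index once, write every key, unlock once
    index_pass = 1 + n_rows + 1
    return load_steps + index_pass

n = 1_000_000
print(steps_with_index(n))        # 6,000,000 steps
print(steps_load_then_index(n))   # 4,000,002 steps
```

Even in this crude model the load-then-index path does a third less work, and the real saving is larger because the single index pass is sequential.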
Re: Re: Why is creating indexes faster after inserting massive data rows?
Creating the index in one go is a single macro-sort operation; updating the index at every row repeats that operation over and over. If you do not understand the difference, I recommend you read some basics about sorting algorithms - a very interesting read anyway. Claudio

2012/5/7 Zhangzhigang zzgang_2...@yahoo.com.cn:

> OK, creating the index *after* the inserts, the index gets created in a single operation. But doesn't the index still have to be updated row by row after the data rows have all been inserted? Does it work that way? So I cannot see the difference in overhead between the two ways.

[earlier quoted messages snipped] -- Claudio
Re: Re: Why is creating indexes faster after inserting massive data rows?
OK, but in my opinion the sorting algorithm is not what makes the difference; both ways end up doing B+ tree inserts.

From: Claudio Nanni claudio.na...@gmail.com To: Zhangzhigang zzgang_2...@yahoo.com.cn Cc: Johan De Meersman vegiv...@tuxera.be; mysql@lists.mysql.com Date: Monday, May 7, 2012, 5:01 PM Subject: Re: Re: Why is creating indexes faster after inserting massive data rows?

> Creating the index in one go is a single macro-sort operation; updating the index at every row repeats that operation over and over. If you do not understand the difference, I recommend you read some basics about sorting algorithms - a very interesting read anyway. Claudio

[earlier quoted messages snipped]
Re: Re: Why is creating indexes faster after inserting massive data rows?
Thanks, I had thought about this answer before, and I appreciate your reply.

From: Alex Schaft al...@quicksoftware.co.za To: mysql@lists.mysql.com Date: Monday, May 7, 2012, 4:59 PM Subject: Re: Re: Why is creating indexes faster after inserting massive data rows? [quoted message snipped]
Re: Re: Re: Why is creating indexes faster after inserting massive data rows?
Too nice not to share it! http://www.youtube.com/watch?v=INHF_5RIxTE

2012/5/7 Zhangzhigang zzgang_2...@yahoo.com.cn:

> Thanks, I had thought about this answer before, and I appreciate your reply.

[quoted messages snipped] -- Claudio
MySQL Community Server 5.1.63 has been released
Dear MySQL users, MySQL Server 5.1.63, a new version of the popular Open Source Database Management System, has been released. MySQL 5.1.63 is recommended for use on production systems. For an overview of what's new in MySQL 5.1, please see http://dev.mysql.com/doc/refman/5.1/en/mysql-nutshell.html For information on installing MySQL 5.1.63 on new servers or upgrading to MySQL 5.1.63 from previous MySQL releases, please see http://dev.mysql.com/doc/refman/5.1/en/installing.html MySQL Server is available in source and binary form for a number of platforms from our download pages at http://dev.mysql.com/downloads/ Not all mirror sites may be up to date at this point in time, so if you can't find this version on some mirror, please try again later or choose another download site. We welcome and appreciate your feedback, bug reports, bug fixes, patches, etc: http://forge.mysql.com/wiki/Contributing For information on open issues in MySQL 5.1, please see the errata list at http://dev.mysql.com/doc/refman/5.1/en/bugs.html The following section lists the changes in the MySQL source code since the previous released version of MySQL 5.1. It may also be viewed online at http://dev.mysql.com/doc/refman/5.1/en/news-5-1-63.html Enjoy! === D.1.1. Changes in MySQL 5.1.63 (7th May, 2012) Bugs Fixed * Security Fix: Bug #64884 was fixed. * Security Fix: Bug #59387 was fixed. * InnoDB: Deleting a huge amount of data from InnoDB tables within a short time could cause the purge operation that flushes data from the buffer pool to stall. If this issue occurs, restart the server to work around it. This issue is only likely to occur on 32-bit platforms. 
(Bug #13847885) * InnoDB: If the server crashed during a TRUNCATE TABLE or CREATE INDEX statement for an InnoDB table, or a DROP DATABASE statement for a database containing InnoDB tables, an index could be corrupted, causing an error message when accessing the table after restart: InnoDB: Error: trying to load index index_name for table table_name InnoDB: but the index tree has been freed! In MySQL 5.1, this fix applies to the InnoDB Plugin, but not the built-in InnoDB storage engine. (Bug #12861864, Bug #11766019) * InnoDB: When data was removed from an InnoDB table, newly inserted data might not reuse the freed disk blocks, leading to an unexpected size increase for the system tablespace or .ibd file (depending on the setting of innodb_file_per_table). OPTIMIZE TABLE could compact a .ibd file in some cases but not others. The freed disk blocks would eventually be reused as additional data was inserted. (Bug #11766634, Bug #59783) * Partitioning: After updating a row of a partitioned table and selecting that row within the same transaction with the query cache enabled, then performing a ROLLBACK, the same result was returned by an identical SELECT issued in a new transaction. (Bug #11761296, Bug #53775) * Replication: The --relay-log-space-limit option was sometimes ignored. More specifically, when the SQL thread went to sleep, it allowed the I/O thread to queue additional events in such a way that the relay log space limit was bypassed, and the number of events in the queue could grow well past the point where the relay logs needed to be rotated. Now in such cases, the SQL thread checks to see whether the I/O thread should rotate and provide the SQL thread a chance to purge the logs (thus freeing space). Note that, when the SQL thread is in the middle of a transaction, it cannot purge the logs; it can only ask for more events until the transaction is complete. Once the transaction is finished, the SQL thread can immediately instruct the I/O thread to rotate.
(Bug #12400313, Bug #64503) References: See also Bug #13806492. * Mishandling of NO_BACKSLASH_ESCAPES SQL mode within stored procedures on slave servers could cause replication failures. (Bug #12601974) * If the system time was adjusted backward during query execution, the apparent execution time could be negative. But in some cases these queries would be written to the slow query log, with the negative execution time written as a large unsigned number. Now statements with apparent negative execution time are not written to the slow query log. (Bug #63524, Bug #13454045) References: See also Bug #27208. * mysql_store_result() and mysql_use_result() are not for use with prepared statements and are not intended to be called following mysql_stmt_execute(), but failed to return an error when invoked that way in libmysqld. (Bug #62136,
Re: Re: Why is creating indexes faster after inserting massive data rows?
- Original Message - From: Zhangzhigang zzgang_2...@yahoo.com.cn Ok, Creating the index *after* the inserts, the index gets created in a single operation. But the indexes has to be updating row by row after the data rows has all been inserted. Does it work in this way? No, when you create an index on an existing table (like after a mass insert), what happens is that the engine does a single full tablescan and builds the index in a single pass, which is a lot more performant than updating a single disk block for every record, for the simple reason that a single disk block can contain dozens of index entries. Imagine that you insert one million rows, and you have 100 index entries in a disk block (random numbers, to make a point. Real numbers will depend on storage, file system, index, et cetera). Obviously there's no way to write less than a single block to disk - that's how it works. You can update your index for each record in turn. That means you will need to do 1 million index - and thus block - writes; plus additional reads for those blocks you don't have in memory - that's the index cache. Now, if you create a new index on an existing table, you are first of all bypassing any index read operations - there *is* no index to read, yet. Then the system is going to do a full tablescan - considered slow, but you need all the data, so there's no better way anyway. The index will be built - in-memory as much as possible - and the system will automatically prefer to write only complete blocks - 10.000 of them. That's the exact same number of index blocks, but you only write each block once, so that's only 10.000 writes instead of 1.000.000. Now there's a lot more at play, things like B-tree balancing and whatnot, but that's the basic picture. -- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel
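Johan's arithmetic, spelled out with his numbers (one million rows, an assumed 100 index entries per disk block):

```python
rows = 1_000_000
entries_per_block = 100  # illustrative figure, as in the message above

# Per-row maintenance: every insert dirties at least one index block,
# so in the worst case each row costs one block write.
writes_row_by_row = rows

# Bulk build: blocks are filled sequentially and each is written once.
index_blocks = rows // entries_per_block
writes_bulk = index_blocks

print(writes_row_by_row)  # 1000000
print(writes_bulk)        # 10000
```

The ratio between the two is exactly the number of entries per block: the bulk build amortizes one block write over every entry that fits in it.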
RE: Why is creating indexes faster after inserting massive data rows?
* Batch INSERTs run faster than one-row-at-a-time, but this is unrelated to INDEX updating speed. * The cache size is quite important to dealing with indexing during INSERT; see http://mysql.rjweb.org/doc.php/memory * Note that mysqldump sets up for an efficient creation of indexes after loading the data. This is not practical (or necessarily efficient) when incrementally INSERTing into a table. As for the original question... * Updating the index(es) for one row often involves random BTree traversals. When the index(es) are too big to be cached, this can involve disk hit(s) for each row inserted. * By doing all the indexes after building the table (or at least all the non-UNIQUE indexes), sort merge can be used. This technique has been highly optimized over the past half-century, and is more efficient. -----Original Message----- From: Johan De Meersman [mailto:vegiv...@tuxera.be] Sent: Monday, May 07, 2012 1:29 AM To: Zhangzhigang Cc: mysql@lists.mysql.com Subject: Re: Why is creating indexes faster after inserting massive data rows? [quoted message snipped]
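Rick's point about random B-tree traversals outgrowing the cache can be simulated. A sketch with made-up parameters (100,000 inserts into a 1,000-block index with room for only 100 blocks in an LRU cache), comparing random key order against the presorted order a sort-merge build produces:

```python
import random
from collections import OrderedDict

def disk_hits(block_accesses, cache_blocks: int) -> int:
    """Count accesses that miss a simple LRU block cache (i.e., go to disk)."""
    cache = OrderedDict()
    misses = 0
    for block in block_accesses:
        if block in cache:
            cache.move_to_end(block)  # refresh recency
        else:
            misses += 1
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return misses

random.seed(42)
n_rows, entries_per_block, cache_blocks = 100_000, 100, 100
n_blocks = n_rows // entries_per_block  # 1,000 index blocks, only 100 cached

# Random key order: each insert lands on an unpredictable leaf block.
random_order = disk_hits(
    (random.randrange(n_blocks) for _ in range(n_rows)), cache_blocks)

# Sort-merge build: keys arrive presorted, so blocks are touched in sequence.
sorted_order = disk_hits(
    (i // entries_per_block for i in range(n_rows)), cache_blocks)

print(sorted_order)  # 1000: each block faulted in exactly once
print(random_order)  # roughly nine in ten of the 100,000 accesses miss
```

This is also why random keys such as GUIDs and MD5s hurt insert performance: they force exactly the random access pattern simulated here.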
RE: Re: Why is creating indexes faster after inserting massive data rows?
As a side note, TokuDB uses what it calls fractal technology to somewhat improve the performance of incremental INDEXing. They delay some of the BTree work so that they can better batch it. While waiting for that to finish, queries are smart enough to look in more than one place for the index info. InnoDB does something similar, but it is limited by the size of the buffer_pool. -----Original Message----- From: Johan De Meersman [mailto:vegiv...@tuxera.be] Sent: Monday, May 07, 2012 8:06 AM To: Zhangzhigang Cc: mysql@lists.mysql.com Subject: Re: Re: Why is creating indexes faster after inserting massive data rows? [quoted message snipped]
Re: Re: Why is creating indexes faster after inserting massive data rows?
Hi, a couple of cents to this. There isn't really a million block writes. The record gets added to the block, but the block gets modified in the OS cache if we assume MyISAM tables, and in the InnoDB buffer if we assume InnoDB tables. In both cases, the actual writing does not take place immediately and does not slow down the process. What does happen for each operation, however, is processing the statement, locating the entries to update in the index, index block splits and, for good reason, committing. When it comes to creating an index, what needs to happen is to read the whole table and to sort all rows by the index key. The latter process will be the most determining factor in answering the original question, because for large tables the sort will have to do a lot of disk I/O. The point I am trying to make is that there will be situations when creating indexes and then inserting the rows will be faster than creating an index afterwards. If we try to determine such situations, we could notice that the likelihood of the sort going to disk increases with the number of distinct values to be sorted. For this reason, my choice would be to create things like primary/unique keys beforehand unless I am certain that everything will fit in the available memory. Peace, Karen

On May 7, 2012, at 8:05 AM, Johan De Meersman wrote: [quoted message snipped]
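Karen's caveat concerns the external sort that CREATE INDEX must run when the key data exceeds memory. A rough sketch of such a sort (run size and data volume are arbitrary assumptions): sorted runs are spilled to temporary files and then k-way merged, which is exactly the extra disk I/O she warns about.

```python
import heapq
import random
import tempfile

def external_sort(values, run_size):
    """Sort by spilling fixed-size sorted runs to temp files, then merging them."""
    runs = []
    for start in range(0, len(values), run_size):
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines("%d\n" % v for v in sorted(values[start:start + run_size]))
        run.seek(0)  # rewind so the merge phase can read the run back
        runs.append(run)
    # k-way merge of the on-disk runs, comparing lines numerically
    merged = [int(s) for s in heapq.merge(*(map(str.strip, r) for r in runs), key=int)]
    for r in runs:
        r.close()
    return merged, len(runs)

random.seed(7)
keys = [random.randrange(10_000_000) for _ in range(50_000)]
ordered, n_runs = external_sort(keys, run_size=8_192)
print(n_runs)  # 7 on-disk runs merged
```

The more key data relative to memory (i.e. the smaller `run_size` is relative to the input), the more runs spill to disk, which is the regime where Karen's index-first preference can win.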
Re: Re: Why is creating indexes faster after inserting massive data rows?
Good point about key buffer. I was only thinking about the table updates for MyISAM, not indexes. The being stuck waiting for buffer flush could also happen. However, for the table blocks this would be the same issue as with load followed by index rebuild, and for the indexes, it will have to be compared, performance-wise, with an expense of sorting an equally sized index. On May 7, 2012, at 10:40 AM, Rick James wrote: (Correction to Karen's comments) * MyISAM does all its index operations in the key_buffer, similar to InnoDB and its buffer_pool. * Yes, writes are delayed (in both engines), but not forever. If the table is huge, you will eventually be stuck waiting for blocks to be flushed from cache. * If the table is small enough, all the I/O can be delayed, and done only once. So yes, the in-memory cache may be faster. Based on this discussion, you should note that random indexes, such as GUIDs, MD5s, etc, tend to -Original Message- From: Karen Abgarian [mailto:a...@apple.com] Sent: Monday, May 07, 2012 10:31 AM To: mysql@lists.mysql.com Subject: Re: 回复: Why is creating indexes faster after inserting massive data rows? Hi, A couple cents to this. There isn't really a million of block writes. The record gets added to the block, but that gets modified in OS cache if we assume MyISAM tables and in the Innodb buffer if we assume InnoDB tables. In both cases, the actual writing does not take place and does not slow down the process.What does however happen for each operation, is processing the statement, locating the entries to update in the index, index block splits and , for good reason, committing. When it comes to creating an index, what needs to happen, is to read the whole table and to sort all rows by the index key. 
The latter process will be the most determining factor in answering the original question, because for the large tables the sort will have to do a lot of disk I/O.The point I am trying to make is there will be situations when creating indexes and then inserting the rows will be faster than creating an index afterwards. If we try to determine such situations, we could notice that the likelihood of the sort going to disk increases with the amount of distinct values to be sorted. For this reason, my choice would be to create things like primary/unique keys beforehand unless I am certain that everything will fit in the available memory. Peace Karen On May 7, 2012, at 8:05 AM, Johan De Meersman wrote: - Original Message - From: Zhangzhigang zzgang_2...@yahoo.com.cn Ok, Creating the index *after* the inserts, the index gets created in a single operation. But the indexes has to be updating row by row after the data rows has all been inserted. Does it work in this way? No, when you create an index on an existing table (like after a mass insert), what happens is that the engine does a single full tablescan and builds the index in a single pass, which is a lot more performant than updating a single disk block for every record, for the simple reason that a single disk block can contain dozens of index entries. Imagine that you insert one million rows, and you have 100 index entries in a disk block (random numbers, to make a point. Real numbers will depend on storage, file system, index, et cetera). Obviously there's no way to write less than a single block to disk - that's how it works. You can update your index for each record in turn. That means you will need to do 1 million index - and thus block - writes; plus additional reads for those blocks you don't have in memory - that's the index cache. Now, if you create a new index on an existing table, you are first of all bypassing any index read operations - there *is* no index to read, yet. 
Then the system is going to do a full table scan - considered slow, but you need all the data, so there's no better way anyway. The index will be built - in memory as much as possible - and the system will automatically prefer to write only complete blocks - 10,000 of them. That's the exact same number of index blocks, but you only write each block once, so that's only 10,000 writes instead of 1,000,000. Now there's a lot more at play, things like B-tree balancing and whatnot, but that's the basic picture. -- Bier met grenadyn Is als mosterd by den wyn Sy die't drinkt, is eene kwezel Hy die't drinkt, is ras een ezel -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql
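Johan's 1,000,000-versus-10,000 write arithmetic can be sketched numerically. The following is a toy Python model - the row count, block capacity, and cache size are invented for illustration and say nothing about real MyISAM or InnoDB internals - that counts index block writes for row-at-a-time maintenance through a small write-back cache, versus a single sorted build:

```python
# Toy model of Johan's arithmetic: counting index *block* writes for
# row-at-a-time index maintenance versus a single sorted build.
# All numbers (rows, entries per block, cache size) are made up for
# illustration; this is not how MyISAM's key_buffer actually works.

import random

ROWS = 100_000
ENTRIES_PER_BLOCK = 100   # "100 index entries in a disk block" from the post

def incremental_writes(keys, cache_blocks=50):
    """Count block flushes when index entries arrive one row at a time,
    buffered through a small LRU cache of dirty blocks."""
    cache = {}   # block_id -> dirty; dict insertion order tracks recency
    flushes = 0
    for k in keys:
        block = k // ENTRIES_PER_BLOCK
        if block in cache:
            cache.pop(block)              # hit: move to most-recently-used
        elif len(cache) >= cache_blocks:
            cache.pop(next(iter(cache)))  # evict least-recently-used...
            flushes += 1                  # ...which costs one block write
        cache[block] = True
    return flushes + len(cache)           # finally flush remaining blocks

bulk = ROWS // ENTRIES_PER_BLOCK  # sorted build: each block written once

sequential = list(range(ROWS))    # keys arriving in index order
random_order = sequential[:]
random.shuffle(random_order)      # keys arriving in random order (GUID-like)

print(bulk)                               # 1000
print(incremental_writes(sequential))     # 1000: perfect locality
print(incremental_writes(random_order))   # tens of thousands: poor locality
```

The random-order case also illustrates Rick's aside about random keys (GUIDs, MD5s): with random arrival order almost every insert touches a cold block, so delayed writes stop helping once the index outgrows the cache.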
Re: MySQL Community Server 5.1.63 has been released
Hi, D.1.1. Changes in MySQL 5.1.63 (7th May, 2012) Bugs Fixed * Security Fix: Bug #64884 was fixed. * Security Fix: Bug #59387 was fixed. Anyone want to elaborate on the nature or severity of the security problem? Both are private / inaccessible to me.
Re: Re: Why is creating indexes faster after inserting massive data rows?
Karen... MySQL does not use the complicated approach you describe. I agree with Johan De Meersman. From: Karen Abgarian a...@apple.com To: mysql@lists.mysql.com Date: Tuesday, May 8, 2012, 1:30 AM Subject: Re: Re: Why is creating indexes faster after inserting massive data rows?
Re: Re: Re: Why is creating indexes faster after inserting massive data rows?
Honestly, I did not understand that. I did not say anything about it being complicated. What does MySQL not use, caching?? Judging by experience, creating a unique index on, say, a 200G table could be a bitter one. On 07.05.2012, at 19:26, Zhangzhigang wrote: Karen... MySQL does not use the complicated approach you describe. I agree with Johan De Meersman. 
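Karen's recurring point is that index creation is essentially one big sort, and a sort that exceeds memory must spill sorted runs to disk and merge them - which is why a unique index on a 200G table can be painful. Below is a minimal Python sketch of that external merge-sort technique; the `mem_limit` knob and line-based spill format are invented here for illustration, and this is the general algorithm, not MySQL's actual filesort:

```python
# Toy external merge sort: when the keys to index don't fit in memory,
# sort memory-sized runs, spill each run to a temp file, then merge.
# This sketches the general technique only, not MySQL's filesort.

import heapq
import tempfile

def external_sort(values, mem_limit=4):
    """Sort integer values while holding at most mem_limit of them
    in memory at once; sorted runs are spilled to temporary files."""
    runs, buf = [], []

    def spill():
        if not buf:
            return
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in sorted(buf))  # one sorted run
        f.seek(0)
        runs.append(f)
        buf.clear()

    for v in values:
        buf.append(v)
        if len(buf) >= mem_limit:
            spill()          # memory "full": write a sorted run to disk
    spill()                  # spill the final partial run

    # k-way merge of the sorted runs, streaming them line by line
    merged = list(heapq.merge(*(map(int, f) for f in runs)))
    for f in runs:
        f.close()
    return merged

print(external_sort([9, 1, 7, 3, 8, 2, 6, 4, 5, 0]))  # [0, 1, 2, ..., 9]
```

The more distinct values there are, the more runs get spilled and merged, which matches Karen's advice to create primary/unique keys up front only when the sort is certain to fit in available memory.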