回复：回复：回复： Why is creating indexes faster after inserting massive data rows?

Zhangzhigang Mon, 07 May 2012 23:44:43 -0700

Ok, OS cache.
> There isn't really a million of block writes.   The record gets 
added to the block, but that gets modified in OS cache if we assume 
MyISAM tables and in the Innodb buffer if we assume InnoDB tables.


As i known, the mysql writes the data to disk directly but does not use the Os 
cache when the table is updating.

If it writes to the Os cache, which leads to massive system invoking, when the 
table is inserted a lot of rows one by one. 




________________________________
 发件人： Karen Abgarian <a...@apple.com>
收件人： mysql@lists.mysql.com 
发送日期： 2012年5月8日, 星期二, 上午 11:37
主题: Re: 回复： 回复： Why is creating indexes faster after inserting massive data 
rows?
 
Honestly, I did not understand that.   I did not say anything about being 
complicated.  What does mysql not use, caching??

Judging by experience, creating a unique index on say, a 200G table could be a 
bitter one.  


On 07.05.2012, at 19:26, Zhangzhigang wrote:

> Karen...
> 
> The mysql does not use this approach what you said which is complicated.
> 
> I  agree with ohan De Meersman.
> 
> 
> ________________________________
> 发件人： Karen Abgarian <a...@apple.com>
> 收件人： mysql@lists.mysql.com 
> 发送日期： 2012年5月8日, 星期二, 上午 1:30
> 主题: Re: 回复： Why is creating indexes faster after inserting massive data rows?
> 
> Hi, 
> 
> A couple cents to this. 
> 
> There isn't really a million of block writes.   The record gets added to the 
> block, but that gets modified in OS cache if we assume MyISAM tables and in 
> the Innodb buffer if we assume InnoDB tables.   In both cases, the actual 
> writing does not take place and does not slow down the process.    What does 
> however happen for each operation, is processing the statement, locating the 
> entries to update in the index, index block splits and , for good reason, 
> committing.  
> 
> When it comes to creating an index, what needs to happen, is to read the 
> whole table and to sort all rows by the index key.   The latter process will 
> be the most determining factor in answering the original question, because 
> for the large tables the sort will have to do a lot of disk I/O.    The point 
> I am trying to make is there will be situations when creating indexes and 
> then inserting the rows will be faster than creating an index afterwards.   
> If we try to determine such situations, we could notice that the likelihood 
> of the sort going to disk increases with the amount of distinct values to be 
> sorted.   For this reason, my choice would be to create things like 
> primary/unique keys beforehand unless I am certain that everything will fit 
> in the available memory. 
> 
> Peace
> Karen
> 
> 
> 
> On May 7, 2012, at 8:05 AM, Johan De Meersman wrote:
> 
>> ----- Original Message -----
>> 
>>> From: "Zhangzhigang" <zzgang_2...@yahoo.com.cn>
>> 
>>> Ok, Creating the index *after* the inserts, the index gets created in
>>> a single operation.
>>> But the indexes has to be updating row by row after the data rows has
>>> all been inserted. Does it work in this way?
>> No, when you create an index on an existing table (like after a mass 
>> insert), what happens is that the engine does a single full tablescan and 
>> builds the index in a single pass, which is a lot more performant than 
>> updating a single disk block for every record, for the simple reason that a 
>> single disk block can contain dozens of index entries. 
>> 
>> Imagine that you insert one million rows, and you have 100 index entries in 
>> a disk block (random numbers, to make a point. Real numbers will depend on 
>> storage, file system, index, et cetera). Obviously there's no way to write 
>> less than a single block to disk - that's how it works. 
>> 
>> You can update your index for each record in turn. That means you will need 
>> to do 1 million index - and thus block - writes; plus additional reads for 
>> those blocks you don't have in memory - that's the index cache. 
>> 
>> Now, if you create a new index on an existing table, you are first of all 
>> bypassing any index read operations - there *is* no index to read, yet. Then 
>> the system is going to do a full tablescan - considered slow, but you need 
>> all the data, so there's no better way anyway. The index will be built - 
>> in-memory as much as possible - and the system will automatically prefer to 
>> write only complete blocks - 10.000 of them. That's the exact same number of 
>> index blocks, but you only write each block once, so that's only 10.000 
>> writes instead of 1.000.000. 
>> 
>> Now there's a lot more at play, things like B-tree balancing and whatnot, 
>> but that's the basic picture. 
>> 
>> -- 
>> 
>> Bier met grenadyn 
>> Is als mosterd by den wyn 
>> Sy die't drinkt, is eene kwezel 
>> Hy die't drinkt, is ras een ezel 
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

回复： 回复： 回复： Why is creating indexes faster after inserting massive data rows?

Reply via email to

回复：回复：回复： Why is creating indexes faster after inserting massive data rows?