MaxNoOfOpenFiles Error
Hi All,

I keep on having this error on my cluster; in effect, ndbd on the nodes shuts down:

  Status: Permanent error, external action needed
  Message: Max number of open files exceeded, please increase MaxNoOfOpenFiles (Resource configuration error)
  Error: 2806
  Error data:
  Error object: Ndbfs::createAsyncFile
  Program: /usr/sbin/ndbd
  Pid: 13388
  Trace: /var/lib/mysql-cluster/ndb_4_trace.log.8
  Version: Version 5.0.51

I have tried increasing MaxNoOfOpenFiles in steps of 100; I started at 500, I'm now at 1200, and I'm still receiving the error.

Regards
Ron

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=arch...@jab.org
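For reference, MaxNoOfOpenFiles is a per-data-node parameter set in the cluster's config.ini on the management server (not in my.cnf), and the data nodes must be restarted for a change to take effect. A minimal sketch only: the host names and the value 3000 below are illustrative placeholders, not recommendations from this thread, and the OS file-descriptor limit (ulimit -n) for the ndbd process must allow whatever value you configure.

```ini
# config.ini read by ndb_mgmd -- section names are standard;
# host names, paths, and values below are illustrative placeholders.
[NDBD DEFAULT]
NoOfReplicas=2
DataDir=/var/lib/mysql-cluster
# Raise the open-file limit well past the observed failure point;
# small +100 increments may keep landing just under what Ndbfs needs.
MaxNoOfOpenFiles=3000

[NDBD]
HostName=ndb-data-1.example.com

[NDBD]
HostName=ndb-data-2.example.com
```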
Re: How to optimize a slow query?
Thanks for your reply, Mike. Yes, 13419851 rows were added to rmpdata1. However, 10 minutes seems too long: I ran the same join using the SQL procedure in a statistical package called SAS on a similar machine, and it takes only 1 minute and 3 seconds.

Yes, it is a 1:1 relationship between table RItime and MVtime. However, I don't get your suggestion, "I'd recommend joining the two tables into 1 table so you don't have to join them in the first place." Could you elaborate on that?

I was using a B-tree index. Switching to HASH does help; now it takes 4 min 50.17 sec to run the query. I also turned on profiling:

mysql> set profiling = 1;
Query OK, 0 rows affected (0.01 sec)

After the query finishes, I get:

mysql> show profile;
+----------------------+------------+
| Status               | Duration   |
+----------------------+------------+
| starting             | 0.000123   |
| checking permissions | 0.10       |
| Opening tables       | 0.44       |
| System lock          | 0.07       |
| Table lock           | 0.11       |
| init                 | 0.83       |
| creating table       | 0.003428   |
| After create         | 0.000124   |
| System lock          | 0.04       |
| Table lock           | 0.51       |
| optimizing           | 0.07       |
| statistics           | 0.33       |
| preparing            | 0.20       |
| executing            | 0.04       |
| Sending data         | 290.153530 |
| end                  | 0.08       |
| end                  | 0.04       |
| query end            | 0.03       |
| freeing items        | 0.10       |
| closing tables       | 0.25       |
| logging slow query   | 0.01       |
| logging slow query   | 0.013429   |
| cleaning up          | 0.04       |
+----------------------+------------+
23 rows in set (0.02 sec)

MySQL spends most of its time in "Sending data". According to http://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html, "Sending data" means that the thread is processing rows for a SELECT (http://dev.mysql.com/doc/refman/5.0/en/select.html) statement and also sending data to the client.

Is there more room to optimize this query? Thanks again.

Best, Jia

mos wrote:
How many rows were added to the rmpdata1 table? If it is 13.4 million rows, then it is going to take several minutes to join that many rows from the 2 tables. Is there a 1:1 relationship between the two tables, or 1:Many?
If there is a 1:1, then I'd recommend joining the two tables into 1 table so you don't have to join them in the first place. The only other thing I can suggest is to change the type of index on the tables being joined to see if that makes a speed difference. For example, if you are using BTREE, then switch to HASH, or vice versa. See http://dev.mysql.com/doc/refman/5.0/en/create-index.html for more info.

Mike

At 10:05 AM 9/5/2009, Jia Chen wrote:
Hi there, One simple query took more than 10 minutes. Here is how the relevant rows in the slow query log look:

# Time: 090905 10:49:57
# User@Host: root[root] @ localhost []
# Query_time: 649  Lock_time: 0  Rows_sent: 0  Rows_examined: 26758561
use world;
create table rmpdata1
select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode, coalesce(ri.ndate,mv.ndate) as cndate
from RItime as ri
left outer join MVtime as mv on (ri.code=mv.code and ri.ndate=mv.ndate);

When I explain only the select clause, I get:

+----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                          | rows     | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
|  1 | SIMPLE      | ri    | ALL    | NULL          | NULL    | NULL    | NULL                         | 13419851 |       |
|  1 | SIMPLE      | mv    | eq_ref | PRIMARY       | PRIMARY | 11      | world.ri.code,world.ri.ndate |        1 |       |
+----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
2 rows in set (0.00 sec)

I use "show table status from world;" to get information about the two tables, RItime and MVtime, in the join clause:

*************************** 1. row ***************************
           Name: RItime
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 13419851
 Avg_row_length: 31
    Data_length: 427721848
Max_data_length: 281474976710655
   Index_length: 347497472
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2009-09-03 10:17:57
    Update_time: 2009-09-03 12:04:02
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
*************************** 2. row ***************************
           Name: MVtime
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 13562373
 Avg_row_length: 31
    Data_length: 430220056
Max_data_length: 281474976710655
   Index_length: 350996480
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2009-09-03 13:31:33
    Update_time: 2009-09-03
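For reference, the index-type clause Mike is pointing at looks like the sketch below; the index name is mine, and note that which index types an engine actually supports varies (the MEMORY engine supports HASH, while MyISAM and InnoDB indexes are B-tree), so it is worth confirming the result with SHOW INDEX rather than assuming the requested type was used.

```sql
-- Syntax sketch for choosing an index type (index name is illustrative).
CREATE INDEX idx_code_ndate USING BTREE ON MVtime (code, ndate);

-- On an engine that supports hashing (e.g. MEMORY), the variant is:
--   CREATE INDEX idx_code_ndate USING HASH ON some_memory_table (code, ndate);

-- Verify what was actually built (see the Index_type column):
SHOW INDEX FROM MVtime;
```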
Re: Database design - help
Again, please forgive my total ignorance. My ERD shows that the web links (URL table) are connected, via the sub-categories (SubCat table), to the main categories (Categories table). Is this correct for what I am trying to achieve? Or should I also link the URL table to the Categories table? I have been trying to create the ER diagram with MySQL Workbench, and getting very frustrated. So far I have this: www.ProBowlUK.co.uk/files/ERD_00.pdf

cheers

----- Original Message -----
From: Martin Gainty
To: bobsh...@ntlworld.com ; mysql@lists.mysql.com
Cc: john.l.me...@gmail.com
Sent: Friday, September 04, 2009 6:09 PM
Subject: RE: Database design - help

Given the following table layouts:

URLs:
  URL_ID (primary key for URL)
  URL_TEXT

URL_CATEGORY:
  URL_ID (key which points to URL.URL_ID)
  CATEGORY_ID (key which points to CATEGORY.CATEGORY_ID)
  SUBCATEGORY_ID
  PK: (URL_ID, CATEGORY_ID)

CATEGORY:
  CATEGORY_ID (primary key for Category)
  CATEGORY_TEXT

SUBCAT:
  SUBCAT_ID (concatenated key for SubCat)
  CATEGORY_ID (concatenated key for SubCat)
  SUBCAT_TEXT

so the diagram would look something like:

(URL table)                URL_CATEGORY table              (CATEGORY table)
URL.URL_ID  1---1  URL_CATEGORY.URL_ID
                   URL_CATEGORY.CATEGORY_ID  1---1  CATEGORY.CATEGORY_ID
                                                           1
                                                           |
                                                           1
                                                    SUBCAT.CATEGORY_ID
                                                    SUBCAT.SUBCAT_TEXT

this is labour-intensive work that every DBA must perform to create a Database
Martin Gainty

__
From: bobsh...@ntlworld.com
To: mysql@lists.mysql.com
CC: john.l.me...@gmail.com
Subject: Re: Database design - help
Date: Fri, 4 Sep 2009 16:24:22 +0100

Hi, Thanks for all the responses. However, I am still stuck for a MySQL db I can create and code in PHP. Attached is a brief example of the data to be used. One problem I have is with providing a listing that includes WTBC (a Category without SubCats) and the 3 Zones (also Cats without SubCats?).
(This is for a complete WTBC listing; in practice it may list depending on the selected Zone.) The example schema is interesting, but is there another way of storing all links in one table and joining them to the Category and SubCat tables? An example of the ER diagram would also be helpful to me.

cheers
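One way to get "all links in one table" joined to Category and SubCat, while still allowing categories (like WTBC) that have no sub-categories, is to make the sub-category reference on the link table nullable. A minimal sketch only: the table and column names and lengths are illustrative, not taken from the thread's attachment, and the FOREIGN KEY clauses are enforced only by InnoDB (MyISAM parses but ignores them).

```sql
-- Hypothetical schema sketch: every link lives in one `url` table and
-- points at a category directly; the sub-category is optional (NULL).
CREATE TABLE category (
  category_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  category_text VARCHAR(100) NOT NULL
);

CREATE TABLE subcat (
  subcat_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  category_id INT UNSIGNED NOT NULL,
  subcat_text VARCHAR(100) NOT NULL,
  FOREIGN KEY (category_id) REFERENCES category (category_id)
);

CREATE TABLE url (
  url_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  url_text    VARCHAR(255) NOT NULL,
  category_id INT UNSIGNED NOT NULL,
  subcat_id   INT UNSIGNED NULL,  -- NULL for categories with no subcats
  FOREIGN KEY (category_id) REFERENCES category (category_id),
  FOREIGN KEY (subcat_id)   REFERENCES subcat (subcat_id)
);

-- A complete listing, with the sub-category shown when present:
SELECT c.category_text, s.subcat_text, u.url_text
FROM url u
JOIN category c      ON c.category_id = u.category_id
LEFT JOIN subcat s   ON s.subcat_id   = u.subcat_id
ORDER BY c.category_text, s.subcat_text;
```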
Re: How to optimize a slow query?
Jia,

"Yes, it is a 1:1 relationship between table RItime and MVtime. However, I don't get your suggestion, 'I'd recommend joining the two tables into 1 table so you don't have to join them in the first place.' Could you elaborate on that?"

Sure, but first I have to relate it to my own experience. I had 8 tables of around 25 million rows each. They all had a 1:1 relationship, and occasionally some of the tables did not have a corresponding row. I felt it was better from a design point of view to have 8 different tables and do the joins on the tables that I needed for each of my particular queries. I'd be joining anywhere from 2 to 5 or 6 or even all 8 tables at a time, using a where clause to select 15k rows at a time. This is the way to do it from a normalized point of view: all of the information is in its respective table, and you assemble only the tables needed for each particular query.

Well, this was slooow! A heck of a lot of work was done to join the tables together on a 2-column key (like yours). I also had to run maintenance on the tables to see which tables were corrupted or were missing rows that should have been there. The tables also repeated columns from the other tables, like date and product_id, that are used to help identify each row. To make a long story short, it was far too much effort to juggle the relationships between all of these tables. Then a colleague made the monumental announcement by saying "I've never found the need to use more than 1 table when there was a 1:1 relationship. There is a tremendous speed cost involved in piecing the data back together. I put all of the data into 1 table." So the light went on for me. From then on, I've merged all 8 tables into one, and if any of the subordinate table data isn't available for a row, its columns are set to NULL, which is the values they would have had anyway after a left join. I am perfectly happy now with one wide table with over 100 columns. Everything is in its place, and maintenance is a dream.
Queries are also quite fast, because all of the information is in one table and not 8. I don't have to worry about optimizing the indexes for the table joins, because there aren't any joins between these tables; it is all in 1 row. So you really have to ask yourself: why spend 10 minutes each time your query is run? Instead, you eliminate the query altogether by keeping the data of the 2 tables in 1 table in the first place.

Mike

At 09:45 AM 9/6/2009, Jia Chen wrote: [quoted message trimmed]
Re: Speeding up a pretty simple correlated update query
Hello All, I'm reposting this since I didn't get much response the last time, so I'm hoping to reach out again. My correlated update query (see below) was running for 9 days before I killed it. Here is my original question:

I have a legacy application which was written using a compound primary key of an item number (non-unique) along with a category ID. The combination of the item number and the category ID makes the records unique. I am in the process of replacing the compound (VARCHAR) keys with a unique integer key in these tables. So I have created an item_seq table and assigned a unique sequence number to each compound key. It looks like this (all tables are MyISAM tables, and the MySQL version is 5.0):

desc item_seq;
+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| seq      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| itemid   | char(11)         | NO   | MUL |         |                |
| category | char(4)          | NO   |     |         |                |
+----------+------------------+------+-----+---------+----------------+

I also have my main transactional table with about 180,000,000 rows. It looks like this:

desc item_trans;
+----------+------------------+------+-----+---------+-------+
| Field    | Type             | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| seq      | int(10) unsigned | NO   | MUL |         |       |
| itemid   | char(11)         | NO   | PRI |         |       |
| category | char(4)          | NO   | PRI |         |       |
| transid  | int(10)          | NO   | PRI |         |       |
+----------+------------------+------+-----+---------+-------+

Currently the seq field is null for the entire table. So of course, I want to update the main transaction table with the new sequence number. So I've disabled all the keys on the item_trans table; since I am updating every row, it wouldn't (shouldn't) be using the index anyway.

Here is my correlated update query:

update item_trans i, item_seq is
set i.seq=is.seq
where is.itemid=i.itemid and is.category=i.category;

If I run an explain on the select version of the update, this is what I get:

+----+-------------+------------+--------+---------------+--------+---------+-------------------------------------------+-----------+-------+
| id | select_type | table      | type   | possible_keys | key    | key_len | ref                                       | rows      | Extra |
+----+-------------+------------+--------+---------------+--------+---------+-------------------------------------------+-----------+-------+
|  1 | SIMPLE      | item_trans | ALL    | PRIMARY       | NULL   | NULL    | NULL                                      | 178948797 |       |
|  1 | SIMPLE      | item_seq   | eq_ref | itemid        | itemid | 20      | g.item_trans.itemid,g.item_trans.category |         1 |       |
+----+-------------+------------+--------+---------------+--------+---------+-------------------------------------------+-----------+-------+

...which is exactly what I would expect it to do: update every record of the item_trans table, and do a full index lookup on the item_seq table. SO... I've been running this query to update item_trans, and it's been running for 5 days now. I've also tried running this with the primary key index on the item_trans table (but not the seq index), and that ran slower in my initial tests.

Are there any faster ways to update 180 million records with a correlated update query? And I'm fairly certain that trying to do this in PHP one record at a time would take much longer than a SQL solution.

Thanks,
-Hank
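One pattern often suggested for updates of this size is to run the same correlated UPDATE in key-range slices, so each statement touches a bounded chunk, progress is visible, and the job can be stopped and resumed. A hedged sketch only: the range boundaries are placeholders, it assumes the compound index on item_seq (itemid, category) is in place, and slicing only pays off if item_trans keeps an index that covers the range predicate.

```sql
-- Same correlated update, restricted to one itemid range at a time
-- (boundaries 'A'/'B' are illustrative; step through the key space).
UPDATE item_trans i
JOIN item_seq s
  ON s.itemid = i.itemid AND s.category = i.category
SET i.seq = s.seq
WHERE i.itemid >= 'A' AND i.itemid < 'B';
-- Repeat with the next range: >= 'B' AND < 'C', and so on.
```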
Re: How to optimize a slow query?
Hi Mike,

Thanks for your detailed answer. Now I understand what you mean. And, yes, I agree with you that keeping all the data in one table works better for a bunch of 1:1-relationship tables. Actually, this is what I was trying to do with that query.

Since you mention "They all had a 1:1 relationship and occasionally some of the tables did not have a corresponding row" and "From then on I've merged all 8 tables into one and if any of the subordinate table data isn't available for a row, its columns are set to NULL", I do want to ask you how you set the columns to NULL for rows whose subordinate table data is unavailable, because I have a similar situation. If I want to combine two tables into one, I think that a full outer join can achieve what you did. However, MySQL does not have a full outer join. So I am using:

create table rmpdata1
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode, coalesce(ri.ndate,mv.ndate) as cndate
 from RItime as ri left outer join MVtime as mv on (ri.code=mv.code and ri.ndate=mv.ndate))
union
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode, coalesce(ri.ndate,mv.ndate) as cndate
 from RItime as ri right outer join MVtime as mv on (ri.code=mv.code and ri.ndate=mv.ndate));

This query takes more than twice as much time as the query in my first e-mail. Do you have a better way? Thanks.

Best, Jia

mos wrote: [quoted message trimmed]
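Since UNION deduplicates its result (which typically costs an extra sort over the full output), a common variant of this full-outer-join emulation keeps the LEFT JOIN as-is and appends, with UNION ALL, only the MVtime rows that have no RItime match. A sketch using the thread's own table and column names; it assumes code and ndate are NOT NULL join keys and that (code, ndate) identifies a row in each table (the 1:1 relationship the thread establishes), otherwise UNION ALL can introduce duplicates.

```sql
-- Full-outer-join emulation with UNION ALL: the second branch keeps
-- only unmatched mv rows, so no duplicate-elimination pass is needed.
CREATE TABLE rmpdata1
(SELECT ri.*, mv.MV,
        COALESCE(ri.code,  mv.code)  AS ccode,
        COALESCE(ri.ndate, mv.ndate) AS cndate
 FROM RItime AS ri
 LEFT JOIN MVtime AS mv
   ON ri.code = mv.code AND ri.ndate = mv.ndate)
UNION ALL
(SELECT ri.*, mv.MV,
        COALESCE(ri.code,  mv.code)  AS ccode,
        COALESCE(ri.ndate, mv.ndate) AS cndate
 FROM RItime AS ri
 RIGHT JOIN MVtime AS mv
   ON ri.code = mv.code AND ri.ndate = mv.ndate
 WHERE ri.code IS NULL);   -- assumes code is NOT NULL, so this
                           -- matches only mv rows without an ri match
```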
Re: How to optimize a slow query?
Jia,

The code you sent seems to be able to get the job done. You could try something simpler by executing 2 SQL statements instead of one. Something like:

create table rmpdata1
select ri.*, mv.*
from RItime as ri
left join MVtime as mv on (ri.code=mv.code and ri.ndate=mv.ndate);

insert into rmpdata1
select ri.*, mv.*
from MVtime as mv
left join RItime as ri on (ri.code=mv.code and ri.ndate=mv.ndate)
where ri.code is null and ri.ndate is null;

You will of course have to play with the column list to avoid duplicate columns. The insert statement will insert the rows from mv that are missing in ri.

Mike

At 01:51 PM 9/6/2009, Jia Chen wrote: [quoted message trimmed]
sql indentation / reformat / beautifier
I'm looking for a way (preferably a command-line tool under Linux) to reformat/indent/beautify SQL code, like that (great) free online tool: http://www.dpriver.com/pp/sqlformat.htm or, better, something to have directly in Emacs. Any help? Thank you in advance.

bye bye
MAS!
Re: Speeding up a pretty simple correlated update query
"So I've disabled all the keys on the item_trans table -- since I am updating every row, it wouldn't (shouldn't) be using the index anyway."

You can't disable unique indexes or primary keys; they are always active. You can only deactivate non-unique indexes.

Here are a couple of suggestions. For now, drop the index on item_seq.seq and desc.seq. Are you sure you have a compound index on item_seq.itemid and item_seq.category? Do a "show create table item_seq" and also "show create table desc" to see what you've got. Make sure your my.cnf file has key_buffer_size=500M, equal to about 30% of your available memory. You can always reduce it later.

Of course there is another way of doing it, if you are willing to have the tail wag the dog. You may kick yourself for not discovering it yourself. :)

set @num:=0;
set @last:='';

create table new_item_trans
select IF(concat(itemid,category)<>@last, @num:=@num+1, @num) as Seq,
       itemid, category, transid, ... ,
       @last:=concat(itemid,category) as TMPLast
from item_trans
order by concat(itemid,category);

Now you can use an Alter statement to add your indexes and get rid of the TMPLast column. To build the item_seq table, you would now use:

create table item_seq
select seq, itemid, category
from new_item_trans
group by seq, itemid, category;

And of course build your indexes on seq and rename new_item_trans. I guarantee you this last solution will not take 9 days to complete! :-)

Mike

At 12:32 PM 9/6/2009, Hank wrote: [quoted message trimmed]
Re: Speeding up a pretty simple correlated update query
On Sun, Sep 6, 2009 at 6:01 PM, mos mo...@fastmail.fm wrote: [quoted message trimmed]

Hi Mike,

Thanks for your reply. First, in my tests, I've created the target table (item_trans) as a copy of the source table with no indexes at all (not even a primary key). Once I get the item_seq field populated, I'll go back and re-create the indexes in batch using myisamchk (I've posted about this recently).
Second, I like your second, creative solution (I never would have come up with that), but in order for it to work, MySQL would have to sort 180 million records before creating the table, or retrieve them out of the table via the concatenated index, both of which I think will take a long time... but I'll certainly give it a shot tomorrow and let you know how it goes.

Thanks again.
-Hank
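On the sorting worry: ORDER BY concat(itemid,category) sorts a computed expression, whereas ordering by the columns themselves can, in principle, let the server use an existing compound index on (itemid, category), and comparing the columns separately also avoids the ambiguity concat introduces ('ab'+'c' collates the same as 'a'+'bc'). A hedged variant of the same counter trick, with the usual caveat that MySQL does not guarantee the evaluation order of user-variable expressions within a select list, so this relies on the same left-to-right behavior the original version does:

```sql
-- Variant: order by the key columns directly instead of their
-- concatenation; tmp_itm/tmp_cat carry the previous row's key.
set @num := 0;
set @itm := '';
set @cat := '';

create table new_item_trans
select IF(itemid = @itm AND category = @cat, @num, @num := @num + 1) as seq,
       itemid, category, transid,
       @itm := itemid   as tmp_itm,
       @cat := category as tmp_cat
from item_trans
order by itemid, category;
```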