indexing on column having duplicate values
Gurus, I am going to have a big table with a lot of records. To expedite searching, I wanted to put an index on a key field (which is a numeric value). But there will be records that have the same value for the key field (the other columns will be different), so how can I do this? Right now I am getting an error about duplicate entries, and they are being discarded. All entries are important, and I have to find a way to locate records based on this key field. Thanks for the help. Rajeev -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql
Re: indexing on column having duplicate values
On 28.05.2014 21:43, Rajeev Prasad wrote: I am going to have a big table with a lot of records, and to expedite searching I wanted to index on a key field (a numeric value). But there will be records with the same value for the key field, and right now I am getting an error about duplicate entries being discarded.

Who said that a key needs to be unique? Just use phpMyAdmin to learn such things: there you can see the different types of keys, and after assigning one you get the SQL command.
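Reindl's point is that only PRIMARY and UNIQUE keys reject duplicates; a plain secondary index accepts repeated values. A minimal sketch, using hypothetical table and column names (`events`, `key_field`) for illustration:

```sql
-- A non-unique secondary index: duplicate key_field values are allowed.
CREATE TABLE events (
  key_field INT NOT NULL,      -- duplicates permitted here
  payload   VARCHAR(255)
);
CREATE INDEX idx_key_field ON events (key_field);

-- Both rows are accepted; a UNIQUE index would have rejected the second.
INSERT INTO events VALUES (42, 'first'), (42, 'second');
```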
Re: indexing on column having duplicate values
On 28.05.2014 22:29, Rajeev Prasad wrote: I am using phpMyAdmin; I looked closely and found "Index". Is this the index you are talking about? (Earlier I used "Primary".)

Surely "Primary" is a unique key; honestly, consider reading some manuals.

My further question is: the index key here is going to be epoch system time, which I currently have as a 10-digit integer. The table will grow very fast (about 5 million rows within 12 months) and will be updated at least twice in 24 hours, so I have to consider the price I am going to pay to update the index on each INSERT (about ~42,000 records every 12 hours).

You need to try it out when in doubt.

What would you all suggest? Should I have an index key? All searches will mostly be based on the index key, and sometimes on another unindexed column. Should I change my index key to a character type? Should I index the other unindexed column as well? Or should I simply query an unindexed table?

Those are basic questions: http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html
Re: indexing on column having duplicate values
(Re-sending; I got an error from Yahoo.) Thanks, Reindl. I am using phpMyAdmin; I looked closely and found "Index". Is this the index you are talking about? (Earlier I used "Primary".) My further question is: the index key here is going to be epoch system time, which I currently have as a 10-digit integer. The table will grow very fast (about 5 million rows within 12 months) and will be updated at least twice in 24 hours, so I have to consider the price I am going to pay to update the index on each INSERT (about ~42,000 records every 12 hours). What would you all suggest? Should I have an index key? All searches will mostly be based on the index key, and sometimes on another unindexed column. Should I change my index key to a character type? Should I index the other unindexed column as well? Or should I simply query an unindexed table? Thanks and regards, Rajeev

On Wednesday, May 28, 2014 2:50 PM, Reindl Harald h.rei...@thelounge.net wrote: Who said that a key needs to be unique? Just use phpMyAdmin to learn such things: there you can see the different types of keys, and after assigning one you get the SQL command.
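For the access pattern described above (frequent lookups by an epoch timestamp, with duplicates allowed), the usual answer is a non-unique index on the integer column; keeping it numeric is cheaper to compare and store than a character type. A sketch with hypothetical names (`log_table`, `epoch_ts`), plus EXPLAIN to check whether the optimizer actually uses the index:

```sql
-- Hypothetical table: epoch seconds stored as an integer, duplicates allowed.
CREATE TABLE log_table (
  epoch_ts INT UNSIGNED NOT NULL,  -- epoch system time; not unique
  detail   VARCHAR(255),
  INDEX idx_epoch (epoch_ts)       -- non-unique secondary index
);

-- EXPLAIN reports whether idx_epoch is chosen for this lookup.
EXPLAIN SELECT * FROM log_table WHERE epoch_ts = 1401308580;
```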
Re: indexing on column having duplicate values
On 28.05.2014 22:39, Rajeev Prasad wrote: (Re-sending; I got an error from Yahoo.)

Your previous message made it off-list to me. *Don't use reply-all on mailing lists.*
Re: Indexing about 40 Billion Entries
Thank you a lot. The first indexing process finished after about 13 hours, so I think the problem is solved now. I set myisam_sort_buffer_size to 10GB. For the query, I will switch to your version.

On 20.06.2012 23:32, Rick James wrote:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx ORDER BY overlap DESC

MySQL does not optimize that kind of OR well. This will run _much_ faster (with your two indexes):

( SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx
  UNION
  SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx
) ORDER BY overlap DESC;

Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both fields of one row; UNION DISTINCT makes a pass over the temp table to dedup. Your version (OR) will do a table scan. My version will:
1. scan the ruid1 index (.MYI) (1+ disk hits for the index range scan)
2. for each matching ruid1, fetch the data row (in .MYD) (1 hit/row) and write it to a temp table
3, 4. ditto for ruid2, appending to the same temp table
5. sort the temp table (ORDER BY) (probably 0 disk hits, due to using MEMORY)

Once the indexes are in place... Depending on the version you are using, key_buffer_size is limited to 4GB or is not. For your huge machine, 24GB might be wise. The key buffer will help steps 1 and 3; the rest of RAM will be available for the OS to cache the data blocks (steps 2 and 4). As for the 16 cores: a single query (even with the UNION I suggested) will use only one core. How many rows (average, max) do you expect from SELECT ... FROM l4_link WHERE ruid1=xxx? I ask because that might run as slowly as 100 rows/sec., simply because of the disk hits.

I'm expecting an average of 5000 rows.

PARTITIONing, per se, does not help performance. There are only a few use cases where PARTITION shines. (I have not seen such [yet] in your application.)
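Rick's claim (the OR over two differently indexed columns degenerates to a table scan, while the UNION rewrite uses both indexes) can be verified with EXPLAIN. A sketch, with 12345 as a placeholder value; note that newer MySQL versions may instead report an index_merge union for the OR form:

```sql
-- Likely shows type=ALL (full table scan) despite both single-column indexes:
EXPLAIN SELECT ruid1, ruid2, overlap
FROM l4_link WHERE ruid1 = 12345 OR ruid2 = 12345;

-- Each branch should show type=ref on its own index:
EXPLAIN
SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1 = 12345
UNION
SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2 = 12345
ORDER BY overlap DESC;
```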
-Original Message- From: Christian Koetteritzsch [mailto:ckoetteritz...@e-humanities.net] Sent: Wednesday, June 20, 2012 1:42 PM To: Rick James Cc: Ananda Kumar; mysql@lists.mysql.com Subject: Re: Indexing about 40 Billion Entries …
Re: Indexing about 40 Billion Entries
On 20/06/2012 11:45, Christian Koetteritzsch wrote: Hi guys, as the title says, I'm trying to index 40 billion entries with two indexes on a server with 16 cores and 128GB RAM. The table is a MyISAM table, and the *.MYD file is about 640GB.

Hiya. I am unable to help, but one question that was running through my mind as I was reading this is: how do you do your backups? I use mylvmbackup on a slave in a master-slave replication setup, but I would love to know how and what you use to do your backups. Hope you come right with your problem. Thanks, Brent
Re: Indexing about 40 Billion Entries
At 02:04 AM 6/21/2012, you wrote: Thank you a lot. The first indexing process finished after about 13 hours, so I think the problem is solved now. I set myisam_sort_buffer_size to 10GB.

The first indexing process??? You should have created all of your indexes with one ALTER statement. Otherwise it will take another 13+ hours to build the second index, because it has to create a copy of the table all over again. Mike
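Mike's point (one table rebuild instead of two) looks like this in SQL, sketched against the l4_link table from this thread:

```sql
-- Both secondary indexes built in a single pass over the table:
ALTER TABLE l4_link
  ADD INDEX idx_ruid1 (ruid1),
  ADD INDEX idx_ruid2 (ruid2);
```

Running two separate ALTER TABLE ... ADD INDEX statements on a MyISAM table of this size copies the 640GB data file once per statement; combining them into one ALTER halves the work.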
Re: Indexing about 40 Billion Entries
Hi, thanks for the hint. I was testing the settings for myisam_sort_buffer_size, so I totally forgot it. But I have another three of these databases to do, so next time I'll do it with one ALTER statement. Christian

On 21.06.2012 16:50, mos wrote: The first indexing process??? You should have created all of your indexes with one ALTER statement. …
RE: Indexing about 40 Billion Entries
Possibly worse than that, since it will rebuild the 'first' index again.

-Original Message- From: mos [mailto:mo...@fastmail.fm] Sent: Thursday, June 21, 2012 7:51 AM To: mysql@lists.mysql.com Subject: Re: Indexing about 40 Billion Entries …
Indexing about 40 Billion Entries
Hi guys, as the title says, I'm trying to index 40 billion entries with two indexes on a server with 16 cores and 128GB RAM. The table is the one below, and it is a MyISAM table. The *.MYD file is about 640GB.

DROP TABLE IF EXISTS `l4_link`;
CREATE TABLE `l4_link` (
  `ruid1` int NOT NULL,
  `ruid2` int NOT NULL,
  `overlap` int NOT NULL
);

I need an index for ruid1 and for ruid2. The status for this table is the following:

Name: l4_link
Engine: MyISAM
Version: 10
Row_format: Fixed
Rows: 39806500262
Avg_row_length: 17
Data_length: 676710504454
Max_data_length: 4785074604081151
Index_length: 1024
Data_free: 0
Auto_increment: NULL
Create_time: 2012-06-19 14:51:29
Update_time: 2012-06-19 16:26:35
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:

The variables for MyISAM are the following:

mysql> show global variables like '%myisam%';
+---------------------------+----------------------+
| Variable_name             | Value                |
+---------------------------+----------------------+
| myisam_data_pointer_size  | 6                    |
| myisam_max_sort_file_size | 9223372036853727232  |
| myisam_mmap_size          | 18446744073709551615 |
| myisam_recover_options    | BACKUP               |
| myisam_repair_threads     | 1                    |
| myisam_sort_buffer_size   | 8388608              |
| myisam_stats_method       | nulls_unequal        |
| myisam_use_mmap           | OFF                  |
+---------------------------+----------------------+
8 rows in set (0.00 sec)

The temp folder has about 16TB of free space. When I start the indexing process, it copies the 640GB into a temp file and then starts with "Repair with keycache". On the internet I found that if it says "Repair with keycache" you should increase myisam_max_sort_file_size, but this didn't work. It still says "Repair with keycache" after it has copied the data to the temp dir. I hope you have an idea how to fix this. Thanks in advance. Christian
Re: Indexing about 40 Billion Entries
It looks like the value that you gave for myisam_max_sort_file_size is not enough for the index creation, and hence it is doing a REPAIR WITH KEYCACHE. Use the query below to find the minimum value required for myisam_max_sort_file_size to avoid repair with keycache:

select a.index_name as index_name,
  ceil( (select count(*) from `db_name`.`table_name`) * (
    @@global.myisam_data_pointer_size +
    sum(ifnull(1 + `a`.`sub_part` * (`b`.`character_octet_length` / `b`.`character_maximum_length`),
      ifnull(1 + `b`.`character_octet_length`,
        case when `b`.`data_type` = 'tinyint'   then 1
             when `b`.`data_type` = 'smallint'  then 2
             when `b`.`data_type` = 'mediumint' then 3
             when `b`.`data_type` = 'int'       then 4
             when `b`.`data_type` = 'bigint'    then 8
             when `b`.`data_type` = 'float'     then 4
             when `b`.`data_type` = 'double'    then 8
             when `b`.`data_type` = 'real'      then 8
             when `b`.`data_type` = 'bit'       then 8
             when `b`.`data_type` = 'date'      then 3
             when `b`.`data_type` = 'datetime'  then 8
             when `b`.`data_type` = 'timestamp' then 4
             when `b`.`data_type` = 'time'      then 3
             when `b`.`data_type` = 'year'      then 1
             when `b`.`data_type` = 'enum'      then 2
             when `b`.`data_type` = 'set'       then 8
             when `b`.`data_type` = 'decimal'   then 8
        end ) )
    + if(`a`.`nullable` = 'YES', 1, 0)
    + if(`b`.`character_octet_length` >= 255, 2, 0)
  ) / 1048576 ) * 1048576 as `index_size`
from `information_schema`.`statistics` `a`, `information_schema`.`columns` `b`
where `a`.`table_name` = `b`.`table_name`
  and `a`.`table_schema` = `b`.`table_schema`
  and `a`.`column_name` = `b`.`column_name`
  and `a`.`table_schema` = 'db_name'
  and `a`.`table_name` = 'table_name'
group by `a`.`index_name`
order by `index_size` desc
limit 1;

On Wed, Jun 20, 2012 at 3:15 PM, Christian Koetteritzsch ckoetteritz...@e-humanities.net wrote: …
RE: Indexing about 40 Billion Entries
Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still take days, maybe weeks. I strongly recommend you do not try to access that many rows directly. Instead, build summary tables and access them. We can discuss further. Unfortunately, you did not even include a PRIMARY KEY when you built the table. This makes any operations slow. "I need an index for ruid1 and for ruid2." What will your SELECTs look like? You may be better off with INDEX(ruid1, ruid2, overlap) and INDEX(ruid2, ruid1, overlap). Will you be adding more rows to this table? Or is it now static? What are the semantics of the fields?

-Original Message- From: Ananda Kumar [mailto:anan...@gmail.com] Sent: Wednesday, June 20, 2012 4:37 AM To: Christian Koetteritzsch Cc: mysql@lists.mysql.com Subject: Re: Indexing about 40 Billion Entries …
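Rick's suggested covering indexes would let a query like SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx be answered from the index alone, without touching the 640GB .MYD data file. A sketch, combined into a single ALTER as also recommended elsewhere in the thread:

```sql
-- Each index covers one lookup pattern: the WHERE column leads, and the
-- other two columns ride along so no data-row fetch is needed.
ALTER TABLE l4_link
  ADD INDEX idx_r1 (ruid1, ruid2, overlap),
  ADD INDEX idx_r2 (ruid2, ruid1, overlap);
```

With these in place, EXPLAIN on a query over one of the indexed columns should show "Using index" in the Extra column.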
Re: Indexing about 40 Billion Entries
Thanks for the information. It is no problem if it takes days or weeks, because the server is specifically for such tasks that take time and use lots of resources.

On 20.06.2012 19:55, Rick James wrote: Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still take days, maybe weeks. I strongly recommend you do not try to access that many rows directly. Instead, build summary tables, and access them. We can discuss further.

Did you mean that I should make several partitions with, for example, 1 billion entries each, and then take the union of the results from the partitions?

Unfortunately, you did not even include a PRIMARY KEY when you built the table. This makes any operations slow.

There is no PRIMARY KEY because ruid1 and ruid2 are not unique values. I need an index for ruid1 and for ruid2.

What will your SELECTs look like? You may be better off with INDEX(ruid1, ruid2, overlap), and INDEX(ruid2, ruid1, overlap). Will you be adding more rows to this table? Or is it now static?

No data will be added to this table, so it is a static table. And the indexes, as I have planned them, will be INDEX(ruid1) and INDEX(ruid2). My SELECT looks like this: SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx (and the xxx are the same int values).

What are the semantics of the fields?

The semantics behind these fields are that ruid1 and ruid2 are IDs for sentences that have common words, and overlap is the number of words they have in common. I hope the information I gave you is helpful. If you have more questions, just ask.
Kind regards, Christian

-Original Message- From: Ananda Kumar [mailto:anan...@gmail.com] Sent: Wednesday, June 20, 2012 4:37 AM To: Christian Koetteritzsch Cc: mysql@lists.mysql.com Subject: Re: Indexing about 40 Billion Entries …
Re: Indexing about 40 Billion Entries
Sorry, I forgot the ORDER BY in the SELECT statement, so the correct SELECT statement is:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx ORDER BY overlap DESC

On 20.06.2012 22:42, Christian Koetteritzsch wrote: Thanks for the information. It is no problem if it takes days or weeks, because the server is specifically for such tasks. …
Re: Indexing about 40 Billion Entries
On 6/20/2012 5:45 AM, Christian Koetteritzsch wrote: Hi guys, as the title says I'm trying to index 40 billion entries with two indexes on a server with 16 cores and 128GB RAM. The table is the one below, and it is a MyISAM table; the *.MYD file is about 640GB. DROP TABLE IF EXISTS `l4_link`; CREATE TABLE `l4_link` ( `ruid1` int NOT NULL, `ruid2` int NOT NULL, `overlap` int NOT NULL ); I need an index for ruid1 and for ruid2.

Actually, based on your proposed query, I believe you want an index on (ruid1, ruid2), not separate indexes for each column. …
Try maxing out the following settings:

  myisam_sort_buffer_size
  key_buffer_size

http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_myisam_sort_buffer_size
http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_key_buffer_size

You will need to experiment with how large a maximum value will be permitted for your platform.

-- Shawn Green, MySQL Principal Technical Support Engineer, Oracle USA, Inc. - Hardware and Software, Engineered to Work Together. Office: Blountville, TN

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql
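For reference, the two settings Shawn names could be raised in a my.cnf fragment like the sketch below. The buffer sizes are illustrative assumptions for a 128GB machine (24G for key_buffer_size follows Rick's later suggestion in this thread), not tested recommendations; as Shawn says, experiment with what your platform permits.

```ini
# Hypothetical my.cnf sketch for a large MyISAM index build -- values are
# assumptions for a 128GB server, not thread-tested recommendations.
[mysqld]
# Buffer used when sorting index entries during REPAIR / index rebuild:
myisam_sort_buffer_size   = 8G
# Cache for MyISAM index blocks (also helps the keycache repair path):
key_buffer_size           = 24G
# Must exceed the projected index file size, or MySQL falls back to the
# much slower "repair with keycache" method (value from this thread):
myisam_max_sort_file_size = 9223372036853727232
```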
RE: Indexing about 40 Billion Entries
(ruid1, ruid2) will help for AND, but not at all for OR.

-----Original Message-----
From: Shawn Green [mailto:shawn.l.gr...@oracle.com]
Sent: Wednesday, June 20, 2012 2:30 PM
To: mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

[quoted text snipped -- see Shawn's message above]

-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql
RE: Indexing about 40 Billion Entries
SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx ORDER BY overlap DESC

MySQL does not optimize that kind of OR well. This will run _much_ faster (with your two indexes):

( SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx
  UNION
  SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx
) ORDER BY overlap DESC;

Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both fields of one row. UNION DISTINCT makes a pass over the temp table to dedup. Your version (OR) will do a table scan. My version will:

1. scan the ruid1 index (.MYI) (1+ disk hits for the index range scan)
2. for each matching ruid1, fetch the data row (in .MYD) (1 hit/row); write it to a temp table
3,4. ditto for ruid2, appending to the same temp table
5. sort the temp table (ORDER BY) (probably 0 disk hits, due to using MEMORY)

Once the indexes are in place... Depending on the version you are using, key_buffer_size is limited to 4G or is not. For your huge machine, 24G might be wise. The key_buffer will help steps 1 and 3; the rest of RAM will be available for the OS to cache the data blocks (steps 2 and 4). As for the 16 cores: a single query (even with the UNION I suggested) will use only one core.

How many rows (average, max) do you expect from

SELECT ... FROM l4_link WHERE ruid1=xxx

I ask because that might run as slowly as 100 rows/sec, simply because of the disk hits. PARTITIONing, per se, does not help performance; there are only a few use cases where PARTITION shines. (I have not seen such [yet] in your application.)

-----Original Message-----
From: Christian Koetteritzsch [mailto:ckoetteritz...@e-humanities.net]
Sent: Wednesday, June 20, 2012 1:42 PM
To: Rick James
Cc: Ananda Kumar; mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

Thanks for the information. It is no problem if it takes days or weeks, because the server is specifically for tasks that take time and use lots of resources.
Am 20.06.2012 19:55, schrieb Rick James:
> Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still take
> days, maybe weeks. I strongly recommend you do not try to access that many
> rows directly. Instead, build summary tables, and access them. We can
> discuss further.

Did you mean that I should make several partitions with, for example, 1 billion entries each, and then make a union of the results from the partitions?

> Unfortunately, you did not even include a PRIMARY KEY when you built the
> table. This makes any operations slow.

There is no PRIMARY KEY because ruid1 and ruid2 are not unique values. I need an index for ruid1 and for ruid2.

> What will your SELECTs look like? You may be better off with
> INDEX(ruid1, ruid2, overlap) and INDEX(ruid2, ruid1, overlap).
> Will you be adding more rows to this table? Or is it now static?

No data will be added to this table, so it is a static table. The indexes, as I have planned them, will be INDEX(ruid1) and INDEX(ruid2). My SELECT looks like this:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx

and the xxx are the same int values.

> What are the semantics of the fields?

The semantics behind these fields are that ruid1 and ruid2 are IDs for sentences that have common words, and overlap is the number of words they have in common. I hope the information I gave you is helpful. If you have more questions, then you can ask them.
Kind regards
Christian

-----Original Message-----
From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Wednesday, June 20, 2012 4:37 AM
To: Christian Koetteritzsch
Cc: mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

It looks like the value you give for myisam_max_sort_file_size is not enough for the index creation, and hence it is doing a REPAIR WITH KEYCACHE. Use the query below to find the minimum value required for myisam_max_sort_file_size to avoid "repair with keycache":

select a.index_name as index_name,
ceil( (select count(*) from `db_name`.`table_name`) *
( @@global.myisam_data_pointer_size +
sum(ifnull(1 + `a`.`sub_part` * (`b`.`character_octet_length`/`b`.`character_maximum_length`),
ifnull(1 + `b`.`character_octet_length`,
case when `b`.`data_type` = 'tinyint'   then 1
     when `b`.`data_type` = 'smallint'  then 2
     when `b`.`data_type` = 'mediumint' then 3
     when `b`.`data_type` = 'int'       then 4
     when `b`.`data_type` = 'bigint'    then 8
     when `b`.`data_type` = 'float'     then 4
     when `b`.`data_type` = 'double'    then 8
     when `b`.`data_type` = 'real'      then 8
     when `b`.`data_type` = 'bit'       then 8
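Rick's OR-to-UNION rewrite from this thread can be sketched end to end. The sketch below uses Python's built-in sqlite3 as a self-contained stand-in (an assumption made only so it runs anywhere; MySQL's optimizer handles the OR form differently, which is the whole point of the rewrite). The table and query shapes follow the thread:

```python
import sqlite3

# Tiny stand-in for l4_link (the real table has ~40 billion rows).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE l4_link (ruid1 INT NOT NULL, ruid2 INT NOT NULL,
                          overlap INT NOT NULL);
    CREATE INDEX idx_ruid1 ON l4_link (ruid1);
    CREATE INDEX idx_ruid2 ON l4_link (ruid2);
    INSERT INTO l4_link VALUES (1, 2, 5), (3, 1, 7), (2, 3, 1), (1, 1, 9);
""")

xxx = 1

# The original query: OR across two columns forces a table scan in MySQL.
or_rows = con.execute(
    "SELECT ruid1, ruid2, overlap FROM l4_link "
    "WHERE ruid1 = ? OR ruid2 = ? ORDER BY overlap DESC", (xxx, xxx)).fetchall()

# Rick's rewrite: one index-driven SELECT per column, UNIONed, then sorted.
# Plain UNION (i.e. UNION DISTINCT) dedups the row where ruid1 = ruid2 = xxx.
union_rows = con.execute(
    "SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1 = ? "
    "UNION "
    "SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2 = ? "
    "ORDER BY overlap DESC", (xxx, xxx)).fetchall()

# Both forms return the same result set.
assert or_rows == union_rows
print(union_rows)
```

On MySQL each branch of the UNION can drive off its own single-column index, which is why the rewrite beats the OR on a 40-billion-row table.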
Re: Indexing question
Thanks Gavin and Joerg, that was very helpful!

-- Jonas

On Sun, Oct 3, 2010 at 12:44 PM, Joerg Bruehe joerg.bru...@oracle.com wrote:
[quoted text snipped]
Re: Indexing question
Hi,

Find attached the first of my queries I require to be checked over. Let me know if you need any more information. I've included the query, EXPLAIN output, and the relevant SHOW CREATE TABLE.

Regards
Neil

On Mon, Oct 4, 2010 at 9:50 PM, Neil Tompkins neil.tompk...@googlemail.com wrote:
> I've got a fair few queries to be checked over. Will send them tomorrow.

On 4 Oct 2010, at 18:27, Gavin Towey gto...@ffn.com wrote:
> Include the query, EXPLAIN output, and the relevant SHOW CREATE TABLE
> `table` \G output. Someone should be able to offer suggestions.

[remaining quoted text snipped]

'players_bids', 'CREATE TABLE `players_bids` (
  `players_bids_id` bigint(20) NOT NULL auto_increment,
  `worlds_id` int(10) NOT NULL,
  `seasons_id` int(10) NOT NULL,
  `teams_id_from` bigint(20) NOT NULL,
  `teams_id_to` bigint(20) NOT NULL,
  `users_id_from` bigint(20) NOT NULL,
  `users_id_to` bigint(20) NOT NULL,
  `players_id` bigint(20) NOT NULL,
  `bid_value` double NOT NULL default ''0'',
  `bid_type` enum(''transfer'',''loan'',''exchange'') collate utf8_unicode_ci NOT NULL,
  `bid_status` varchar(45) collate utf8_unicode_ci NOT NULL,
  `bid_date` datetime NOT NULL,
  `bid_completed_date` datetime default NULL,
  PRIMARY KEY (`players_bids_id`),
  KEY `FK_players_bids_worlds_id` (`worlds_id`),
  KEY `IDX_bid_date` (`bid_date`),
  KEY `IDX_bid_status` (`bid_status`),
  KEY `IDX_bid_type` (`bid_type`),
  KEY
Re: Indexing question
Hi!

Neil Tompkins wrote:
> Thanks for your reply. So should we create individual indexes on each
> field, or a multiple-column index?

This question cannot be answered without checking and measuring your installation. The decision whether to create an index is always an act of balancing:

- If there is an index, the database server can use it to find data records by looking up the index, not scanning the base data. This results in load reduction (both CPU and disk IO) and speeds up query execution.
- If there is an index, the database server must maintain it whenever data are altered (insert/update/delete), in addition to the base data. This is increased load (both CPU and disk IO) and slows down data changes.

So obviously you want to create only those indexes that are helpful for query execution: you will never (voluntarily) create an index on a column which isn't used in search conditions, or whose use is already provided by other indexes. Of the remaining candidate indexes, you will never (voluntarily) create one that provides less gain in searches than it costs in data changes.

With MySQL, AFAIK there is the limitation that on one table only one index can be used per query. As a result, the choice of indexes to create depends on the searches executed by your commands, their relative frequency, and the frequency of data changes.

To answer your other question: if you run aggregate functions (like SUM(), MIN(), or MAX()) on all records of a table, their results could be computed by accessing a matching index only. I don't know whether MySQL does this; I propose you check that yourself using EXPLAIN. If you run them on subsets of a table only, an index on that column will not help in general.

In database implementations, there is the concept of a "covering index": if you have an index on columns A and B of some table, its contents (without the base data) would suffice to answer

SELECT SUM(B) WHERE A = x

Again, I don't know whether MySQL does this, and I refer you to EXPLAIN.
HTH,
Jörg

-- Joerg Bruehe, MySQL Build Team, joerg.bru...@oracle.com ORACLE Deutschland B.V. Co. KG, Komturstrasse 18a, D-12099 Berlin Geschaeftsfuehrer: Juergen Kunz, Marcel v.d. Molen, Alexander v.d. Ven Amtsgericht Muenchen: HRA 95603
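Jörg's covering-index remark can be checked mechanically. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a self-contained stand-in (an assumption; on MySQL the equivalent check is EXPLAIN and looking for "Using index" in the Extra column). With an index on (A, B), a SUM(B) filtered on A is answered from the index alone:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (a INT, b INT, c INT);
    CREATE INDEX idx_ab ON t (a, b);
""")

# Join the EXPLAIN QUERY PLAN detail rows into one inspectable string.
plan = " ".join(
    row[-1] for row in con.execute(
        "EXPLAIN QUERY PLAN SELECT SUM(b) FROM t WHERE a = 1").fetchall())

# The plan reads only the index -- the base table is never touched,
# which is exactly what "covering index" means.
assert "COVERING INDEX" in plan
print(plan)
```

A query that also needed column c would lose this property, since c is not stored in idx_ab.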
Re: Indexing question
Jörg,

Thanks for the useful reply. Maybe I can EXPLAIN my select queries for you to advise if any changes need to be made?

Regards
Neil

On Mon, Oct 4, 2010 at 3:38 PM, Joerg Bruehe joerg.bru...@oracle.com wrote:
[quoted text snipped]
RE: Indexing question
Include the query, EXPLAIN output, and the relevant SHOW CREATE TABLE `table` \G output. Someone should be able to offer suggestions.

-----Original Message-----
From: Tompkins Neil [mailto:neil.tompk...@googlemail.com]
Sent: Monday, October 04, 2010 8:54 AM
To: Joerg Bruehe
Cc: [MySQL]
Subject: Re: Indexing question

[quoted text snipped]
Re: Indexing question
I've got a fair few queries to be checked over. Will send them tomorrow.

On 4 Oct 2010, at 18:27, Gavin Towey gto...@ffn.com wrote:
> Include the query, EXPLAIN output, and the relevant SHOW CREATE TABLE
> `table` \G output. Someone should be able to offer suggestions.

[remaining quoted text snipped]
Re: Indexing question
So if you have individual indexes on, for example, field_1, field_2 and field_3, and then perform a search like WHERE field_1 = 10 AND field_3 = 'abc', this wouldn't improve the search? You have to create an index for all possible combined field searches?

On Fri, Oct 1, 2010 at 9:35 PM, Gavin Towey gto...@ffn.com wrote:
> You can't use an index to select records in a range and order them; the
> ORDER BY will cause a filesort in that case. Additionally, indexes are
> always read left to right. So an index on ('user_id', 'product_id') will
> help when doing WHERE user_id=N AND product_id IN (1,2,3), but wouldn't
> help for just the condition on product_id. See the manual for full details
> on how MySQL uses indexes:
> http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
>
> -----Original Message-----
> From: Jonas Galvez [mailto:jonasgal...@gmail.com]
> Sent: Friday, October 01, 2010 11:48 AM
> To: mysql@lists.mysql.com
> Subject: Indexing question
>
> Suppose I wanted to be able to perform queries against three columns of
> my table: 'user_id', 'product_id' and 'created'. Most of the time I'll
> just be range-selecting records from the table, ordering by 'created'.
> But I may also want to select where 'user_id' = something and
> 'product_id' in (list, of, ids), ordered by 'created'. Do I need two
> separate indexes, one on 'created' and another on ('user_id',
> 'product_id', 'created'), or does having only the latter suffice for the
> former case?
>
> -- Jonas, http://jonasgalvez.com.br
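Jonas's two cases can be sketched concretely. The example below uses SQLite's EXPLAIN QUERY PLAN as a self-contained stand-in (an assumption; MySQL's EXPLAIN shows the analogous "Using filesort" note), with a hypothetical orders table: equality on both leading columns of ('user_id', 'product_id', 'created') delivers rows already in 'created' order, while a bare range-plus-order on 'created' gets no help from that index, which is why a separate index on 'created' is suggested:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (user_id INT, product_id INT, created INT, note TEXT);
    CREATE INDEX idx_upc ON orders (user_id, product_id, created);
""")

def plan(sql):
    # Flatten the EXPLAIN QUERY PLAN detail column into one string.
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on both leading columns: rows come out of the index already
# sorted by 'created', so no separate sort step appears in the plan.
eq = plan("SELECT * FROM orders WHERE user_id = 1 AND product_id = 2 "
          "ORDER BY created")
assert "USING INDEX" in eq
assert "TEMP B-TREE" not in eq   # no filesort-equivalent needed

# A range on 'created' alone cannot use idx_upc at all, since 'created'
# is not the leading column -- the latter index does NOT suffice here.
rng = plan("SELECT * FROM orders WHERE created > 5 ORDER BY created")
assert "idx_upc" not in rng
```

The IN-list case from the question sits in between: the index finds the rows, but interleaving several index ranges may still require a sort step.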
Re: Indexing question
Hi Neil, all!

Tompkins Neil wrote:
> So if you have individual indexes on, for example, field_1, field_2 and
> field_3, and then perform a search like WHERE field_1 = 10 AND field_3 =
> 'abc', this wouldn't improve the search? You have to create an index for
> all possible combined field searches?

No -- you didn't read Gavin's mail carefully enough:

On Fri, Oct 1, 2010 at 9:35 PM, Gavin Towey gto...@ffn.com wrote:
> [[...]] Additionally, indexes are always read left to right. So an index
> on ('user_id', 'product_id') will help when doing WHERE user_id=N AND
> product_id IN (1,2,3), but wouldn't help for just the condition on
> product_id.

What Gavin calls "left to right" is what I call "most significant first"; the result is the same: in a multi-column index, the columns are listed in the order of their significance. Any DBMS (this is not limited to MySQL) can use such an index only if a condition on the first (= most significant) field(s) is specified.

Example: assume the index is on fields A, B, and C, in that order.

- ... WHERE A = x AND B = y AND C = z  can use the index.
- ... WHERE A = x AND B = y  can use the index, limited to the first two fields.
- ... WHERE A = x  can use the index, the first field only.
- ... WHERE A = x AND C = z  can also use the index for A, but will have to evaluate the condition on C by scanning all records matching A.
- ... WHERE B = y AND C = z  cannot use the index, because there is no condition on A.

If there are many searches based on A and C only (not B), and there are many records matching A with different values of C, then an additional index on these two columns may be helpful.

Compare the index with a phone book, which (typically) lists the entries sorted by last name (most significant), then first name, then ...: if you don't know the last name, you cannot profit from the sorting and have to scan the whole book.
See the manual for full details on how MySQL uses indexes: http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html HTH, Jörg -- Joerg Bruehe, MySQL Build Team, joerg.bru...@oracle.com ORACLE Deutschland B.V. & Co. KG, Komturstrasse 18a, D-12099 Berlin Geschaeftsfuehrer: Juergen Kunz, Marcel v.d. Molen, Alexander v.d. Ven Amtsgericht Muenchen: HRA 95603
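Joerg's A-B-C example can be sketched in SQL (the table and index names here are made up for illustration); running EXPLAIN on each SELECT against a MySQL server shows which statements can use the index:

```sql
-- Hypothetical table with a multi-column index on (A, B, C), in that order
CREATE TABLE t (
  A INT,
  B INT,
  C INT,
  KEY idx_abc (A, B, C)
);

SELECT * FROM t WHERE A = 1 AND B = 2 AND C = 3;  -- uses idx_abc fully
SELECT * FROM t WHERE A = 1 AND B = 2;            -- uses the first two fields
SELECT * FROM t WHERE A = 1;                      -- uses the first field only
SELECT * FROM t WHERE A = 1 AND C = 3;            -- uses idx_abc for A; condition on C checked by scanning A's matches
SELECT * FROM t WHERE B = 2 AND C = 3;            -- cannot use idx_abc (no condition on A)
```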
Re: Indexing question
Thanks for your reply. So should we create individual indexes on each field, or one multiple-column index? On 3 Oct 2010, at 16:44, Joerg Bruehe joerg.bru...@oracle.com wrote: [[...]]
Re: Indexing question
Following on from my previous email: I have columns containing numbers which are then used in SUM and MIN/MAX functions; should these be indexed too? On 3 Oct 2010, at 16:44, Joerg Bruehe joerg.bru...@oracle.com wrote: [[...]]
Indexing question
Suppose I wanted to be able to perform queries against three columns of my table: 'user_id', 'product_id' and 'created'. Most of the time I'll just be range-selecting records from the table ordering by 'created'. But I may also want to select where 'user_id' = something and 'product_id' in (list, of, ids), ordered by 'created'. Do I need two separate indexes, one on 'created' and another on ('user_id', 'product_id', 'created'), or does having only the latter suffice for the former case? -- Jonas, http://jonasgalvez.com.br
RE: Indexing question
You can't use an index both to select records in a range on one column and order them by another column; the ORDER BY will cause a filesort in that case. Additionally, indexes are always read left to right. So an index on ('user_id', 'product_id') will help when doing WHERE user_id=N AND product_id IN (1,2,3), but wouldn't help for just the condition on product_id. See the manual for full details on how MySQL uses indexes: http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html -Original Message- From: Jonas Galvez [mailto:jonasgal...@gmail.com] Sent: Friday, October 01, 2010 11:48 AM To: mysql@lists.mysql.com Subject: Indexing question [[...]] -- Jonas, http://jonasgalvez.com.br This message contains confidential information and is intended only for the individual named. If you are not the named addressee, you are notified that reviewing, disseminating, disclosing, copying or distributing this e-mail is strictly prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any loss or damage caused by viruses or errors or omissions in the contents of this message, which arise as a result of e-mail transmission. 
[FriendFinder Networks, Inc., 220 Humbolt court, Sunnyvale, CA 94089, USA, FriendFinder.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
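Jonas's two candidate indexes from the question above can be written out in SQL (the table name is assumed for illustration):

```sql
-- Assumed table name; the columns come from the question above
ALTER TABLE purchases
  ADD INDEX idx_created (created),
  ADD INDEX idx_user_prod_created (user_id, product_id, created);

-- idx_user_prod_created locates the rows by user_id and product_id;
-- per Gavin's point, the IN () range means the ORDER BY may still
-- require a filesort on the matched rows:
SELECT * FROM purchases
 WHERE user_id = 42 AND product_id IN (1, 2, 3)
 ORDER BY created;

-- idx_user_prod_created cannot serve this query (no condition on
-- user_id, the leftmost column), so the separate idx_created is what
-- makes the range scan possible:
SELECT * FROM purchases
 WHERE created >= '2010-09-01'
 ORDER BY created;
```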
Re: Intro to indexing?
muhammad subair wrote: On Mon, Jul 27, 2009 at 9:25 PM, Ken D'Ambrosio k...@jots.org wrote: [[...]] When you set up the table, you should have a primary key. You make your best guess as to what would make it easier for the SQL engine to find the data you want, but the SQL engine, in its own wisdom, will decide whether or not to use an index. You do not explicitly tell it to do a SELECT using an index. Considerations: every time you do an INSERT, REPLACE, or UPDATE of data included in an index, the index needs to be updated - which takes a small amount of time. Indexes take space on disk - usually not a problem. If the engine can use an index, finding is much faster. If the database is small, who cares.
Intro to indexing?
Hey, all. I'm trying to get indexing -- like, when do you specify an index name during index creation, is index use implicit or explicit, and, honestly, how exactly does it work, anyway? I've been RTFM'ing, but haven't found anything that really laid it out in black and white; usually, they'd give an example or two, but were awfully sparse on the whys and wherefores. So, if anyone has something they could point me to -- electronic or dead tree -- I'd be deeply appreciative. Thanks! -Ken
Re: Intro to indexing?
On Mon, Jul 27, 2009 at 9:25 PM, Ken D'Ambrosio k...@jots.org wrote: [[...]] Indexes can be on a single column or can span multiple columns (just like keys). An index will be used when running a query if the search is being performed on the following: - A single column that has a single-column index; for example, if we index departments on departmentID and perform a query like SELECT...WHERE departmentID=n. - A set of columns that forms a multicolumn index; for example, if we have created an index on the employee.assignment table on (clientID, employeeID, workdate) and we perform a query like SELECT...WHERE clientID=x AND employeeID=y AND workdate=z. - A column or set of columns that forms a subset of a multicolumn index, as long as there is a leftmost prefix of the index columns; for example, with the assignment table as before, with an index on (clientID, employeeID, workdate), indexes would be used for these types of queries: [[...]] Source: MySQL Tutorial - SAMS Publishing -- Muhammad Subair
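The tutorial's examples above can be written out as statements (the index names are invented here; the table and column names come from the quoted text):

```sql
CREATE INDEX idx_dept   ON departments (departmentID);
CREATE INDEX idx_assign ON assignment (clientID, employeeID, workdate);

SELECT * FROM departments WHERE departmentID = 7;   -- single-column index
SELECT * FROM assignment
 WHERE clientID = 1 AND employeeID = 2
   AND workdate = '2009-07-27';                     -- full multicolumn index
SELECT * FROM assignment WHERE clientID = 1;        -- leftmost prefix of idx_assign
```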
When does indexing happen?
Using version 5.0.6x on RH. The question I have is about the updating of indexes. Say I have a table with a primary key and one or more indexes. I run an INSERT statement by way of a call to mysql_real_query() in the C api. If that function call returns zero, i.e., indicating success, does that mean that everything is guaranteed to be completed? In particular that all the table indexes have been updated, so that an immediately-subsequent SELECT statement with an appropriate WHERE clause seeking for the same row will find it using the index? -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: When does indexing happen?
[[...]] Update with information I forgot to mention: using MyISAM tables for this. Also, the table is a MERGE table.
Re: When does indexing happen?
In the last episode (Jul 26), buf...@biffco.net said: [[...]] does that mean that everything is guaranteed to be completed? In particular that all the table indexes have been updated, so that an immediately-subsequent SELECT statement with an appropriate WHERE clause seeking the same row will find it using the index? Yes. -- Dan Nelson dnel...@allantgroup.com
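In other words (a sketch with an assumed table; the C API call is the one named in the question): once the INSERT has returned success, the index entries exist, so a follow-up SELECT can use them immediately:

```sql
-- Assumed MyISAM table with an indexed column
CREATE TABLE events (
  id  INT NOT NULL,
  val VARCHAR(32),
  KEY idx_id (id)
) ENGINE=MyISAM;

INSERT INTO events (id, val) VALUES (42, 'x');
-- A zero return from mysql_real_query() for the INSERT means the row
-- and all of its index entries are in place, so this statement can
-- immediately find the row via idx_id:
SELECT val FROM events WHERE id = 42;
```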
Re: Indexing dynamics in MySQL Community Edition 5.1.34
mo...@fastmail.fm (mos) writes: At 12:37 AM 6/25/2009, you wrote: ... my.cnf based on my-huge.cnf, expanding key_buffer to 8G, myisam_sort_buffer_size to 256M, and putting tmpdir on the fiber channel disk. You mean key_buffer_size don't you, and not key_buffer? If you are using MySQL 5.0 then the upper limit is 4GB, but 5.1 allows for more than 4GB. That's not entirely true. Later versions of 5.0 (above 5.0.56?) also allow key_buffer_size to be greater than 4GB, and we are using that on several machines. Earlier versions of 5.0 did indeed have this problem. Simon
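To check which value is actually in effect (the my.cnf entry may use the old key_buffer spelling, but the server variable is key_buffer_size):

```sql
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';
-- On 5.1 (and later 5.0 releases, per Simon's note), values above 4GB work:
SET GLOBAL key_buffer_size = 8589934592;  -- 8 GB
```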
Re: Indexing dynamics in MySQL Community Edition 5.1.34
[SNIP - tail of a SHOW STATUS listing] Thanks, Mike Spreitzer mos mo...@fastmail.fm 06/25/09 01:05 PM To mysql@lists.mysql.com Subject Re: Indexing dynamics in MySQL Community Edition 5.1.34 Mike, I re-posted your Show Status to the group to see if anyone can offer a way to speed up the indexing for you. BTW, you are adding ALL of the indexes to the table using ONE sql statement right? And not a separate SQL statement to build each index? Mike At 02:01 AM 6/25/2009, you wrote: Like I said in the subject line, I am using 5.1.34. I started with my-huge.cnf, which says key_buffer rather than key_buffer_size; SHOW GLOBAL VARIABLES confirms, however, that my key_buffer_size is 8GB. That indexing operation finally finished after about 1.5 hours; that was about 0.5 hours ago. Now I am on to other things. Here is the status you suggested: [SNIP] Thanks, Mike Spreitzer mos mo...@fastmail.fm 06/25/09 02:32 AM To mysql@lists.mysql.com Subject Re: Indexing dynamics in MySQL Community Edition 5.1.34 At 12:37 AM 6/25/2009, you wrote: Actually, my characterization of the current state is wrong. It appears that one core is completely busy; I suppose MySQL does this indexing work in a single thread. Is it reasonable for indexing to be CPU bound? 
my.cnf based on my-huge.cnf, expanding key_buffer to 8G, myisam_sort_buffer_size to 256M, and putting tmpdir on the fiber channel disk. Mike, You mean key_buffer_size don't you, and not key_buffer? If you are using MySQL 5.0 then the upper limit is 4GB, but 5.1 allows for more than 4GB. Also try increasing sort_buffer_size. Posting your Show Status will help people see where the bottleneck is. Mike
Re: Indexing dynamics in MySQL Community Edition 5.1.34
Today's instance finished shortly after I sent the email below. BTW, here are some specifics on the table (which uses MyISAM). Thursday's instance has 11 GB of data and 0.78 GB of index. Today's instance has 26 GB of data and 1.8 GB of index. Thanks, Mike Spreitzer Mike Spreitzer/Watson/i...@ibmus 06/27/09 09:48 AM To mos mo...@fastmail.fm cc mysql@lists.mysql.com Subject Re: Indexing dynamics in MySQL Community Edition 5.1.34 Yes, all the indices are added in one ALTER TABLE statement. Thursday's incarnation took about 1.5 hours, on a table created from about 8 GB of CSV. Today's has already taken over 8 hours, on a table created from about 22 GB of data. The logarithm of 22 GB is about 24/23 of the logarithm of 8 GB. I seem to have fallen off an additional cliff. As a reminder, here is the situation. I load a table from CSV with zero indices defined. Then I add some indices, and that takes a long time. On Thursday it kept one core busy, but the disk was not very busy for much of that time (about 600 blocks out per sec). Now I am seeing about 2.5 blocks out per second --- still nowhere near capacity --- and one core busy. The MySQL server is running on a 64-bit RHEL 5 machine with 16 Intel cores at 2.4 GHz, and 64 GB RAM. The db storage is on fiber channel. I created my.cnf based on my-huge.cnf, expanding key_buffer[_size] to 8G, myisam_sort_buffer_size to 256M, and putting tmpdir on the fiber channel disk. 
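Adding all indices in one statement, as Mike describes, looks like this (the table and column names are placeholders):

```sql
-- One ALTER TABLE builds every index in a single pass over the data,
-- instead of rebuilding the table once per index
ALTER TABLE big_table
  ADD INDEX idx_a  (col_a),
  ADD INDEX idx_b  (col_b),
  ADD INDEX idx_ac (col_a, col_c);
```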
Here is my current SHOW STATUS: [SNIP]
Re: Indexing dynamics in MySQL Community Edition 5.1.34
On Sat, Jun 27, 2009 at 7:03 AM, Mike Spreitzermspre...@us.ibm.com wrote: Today's instance finished shortly after I sent the email below. BTW, here are some specifics on the table (which uses MyISAM). Thursday's instance has 11 GB of data and 0.78 GB of index. Today's instance has 26 GB of data and 1.8 GB of index. If you have the ability to test, I'd compare that to importing the csv into the table with the indexes already defined. The way you did it should be faster, but since you see that it's only using one core, I'd try splitting the data up into 16 separate files and importing them all at once. In theory (SWAG actually), multiple imports would each use their own core to whatever thread count you have innodb defined to use ... Oh, just saw that you were using myisam. Never mind. At any rate, I'd be very surprised if importing into a table with indexes already defined was the same speed or faster, but doing so could give you some useful information, such as at what point the import (and concurrent index creation) drops from expected level X to much reduced level Y. You could then (hopefully) find a correlation between some cache or buffer setting that will explain the sudden drop in speed. -- Regards... Todd -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
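One way to run the comparison Todd suggests on a MyISAM table (the file and table names are placeholders) is to defer index maintenance during the load:

```sql
ALTER TABLE big_table DISABLE KEYS;  -- stop per-row updates of non-unique indexes
LOAD DATA INFILE '/tmp/data.csv' INTO TABLE big_table
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
ALTER TABLE big_table ENABLE KEYS;   -- rebuild the indexes in one sorted pass
```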
Re: Indexing dynamics in MySQL Community Edition 5.1.34
Can someone please tell me what is meant by "The db storage is on fiber channel"? On Fri, Jun 26, 2009 at 1:05 AM, mos mo...@fastmail.fm wrote: Mike, I re-posted your Show Status to the group to see if anyone can offer a way to speed up the indexing for you. BTW, you are adding ALL of the indexes to the table using ONE sql statement right? And not a separate SQL statement to build each index? Mike At 02:01 AM 6/25/2009, you wrote: [[...]] Here is the status you suggested: [SNIP]
RE: Indexing dynamics in MySQL Community Edition 5.1.34
-Original Message- From: Moon's Father [mailto:yueliangdao0...@gmail.com] Sent: Friday, June 26, 2009 4:12 AM To: mos Cc: mysql@lists.mysql.com Subject: Re: Indexing dynamics in MySQL Community Edition 5.1.34 Can someone please tell me what is meant by "The db storage is on fiber channel"? [JS] Fiber Channel (and that is the official way it is spelled, even in the USA) is an ultra-high-speed network for disk arrays, computers, and other devices. Although the physical medium is usually fiber optic, there is a lot more to it than that. It is more like an intranet in some ways, in that devices have addresses and there is intelligence in the network itself. On Fri, Jun 26, 2009 at 1:05 AM, mos mo...@fastmail.fm wrote: Mike, I re-posted your Show Status to the group to see if anyone can offer a way to speed up the indexing for you. BTW, you are adding ALL of the indexes to the table using ONE sql statement right? And not a separate SQL statement to build each index? Mike At 02:01 AM 6/25/2009, you wrote: Like I said in the subject line, I am using 5.1.34. I started with my-huge.cnf, which says key_buffer rather than key_buffer_size; SHOW GLOBAL VARIABLES confirms, however, that my key_buffer_size is 8GB. That indexing operation finally finished after about 1.5 hours; that was about 0.5 hours ago. Now I am on to other things. 
Here is the status you suggested: [SNIP]
RE: Indexing dynamics in MySQL Community Edition 5.1.34
Oops, my spell checker got the better of me: it should be Fibre Channel. Regards, Jerry Schwartz The Infoshop by Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 www.the-infoshop.com -Original Message- From: Jerry Schwartz [mailto:jschwa...@the-infoshop.com] Sent: Friday, June 26, 2009 10:22 AM To: 'Moon's Father'; 'mos' Cc: mysql@lists.mysql.com Subject: RE: Indexing dynamics in MySQL Community Edition 5.1.34 [[...]]
Here is the status you suggested: [SNIP]
RE: Indexing? (Warning: relative newbie.)
After a few off-list e-mails with Tim, I issued ALTER TABLE dbmail_messageblks ADD INDEX ( blocksize, physmessage_id ); which took almost 11 hours to build. Once done, however, my select statement went from a hair over 50 minutes to 15 seconds (1.69 seconds after the index was cached). Wow. Thanks for the help, all! -Ken On Wed, June 24, 2009 12:03 pm, Little, Timothy wrote: To answer your questions in no particular order: YES, you can speed it up with indexing. You might want to first create an index on ( blocksize, physmessage_id ). Why, you might ask, index on physmessage_id? Because then the db won't have to do a fetch on rows from the table, since the value is in the INDEX itself, saving any unnecessary reads. Realistically, I can't see the query taking more than a few seconds, at most, to execute. However, building the index might take a serious bit of time. Please let us all know how it does or does not work. Tim... -Original Message- From: Ken D'Ambrosio [mailto:k...@jots.org] Sent: Wednesday, June 24, 2009 11:07 AM To: mysql@lists.mysql.com Subject: Indexing? (Warning: relative newbie.) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql?unsub=arch...@jab.org
Re: Indexing? (Warning: relative newbie.)
Hi Walter, Walter Heck - OlinData.com wrote: Hey Tim, all On Wed, Jun 24, 2009 at 10:03 AM, Little, Timothytlit...@thomaspublishing.com wrote: Why, you might ask, index on physmessage_id? Because then the db won't have to do a fetch on items from the table since it's in the INDEX itself, saving any unnecessary reads. FYI: That only holds true for InnoDB, not for MyISAM. I think you have confused the InnoDB behavior of using the entire PRIMARY KEY as the unique row identifier for each entry in a secondary key with the practice of defining a covering index. Tim was correct: an index on (blocksize, physmessage_id) would allow that query to avoid any direct reads of the data table, as all of the information for the query comes from the index itself (regardless of database engine). -- Shawn Green, MySQL Senior Support Engineer Sun Microsystems, Inc. Office: Blountville, TN
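[Editor's note: Shawn's covering-index point can be checked with EXPLAIN; when an index alone satisfies a query, MySQL reports "Using index" in the Extra column. A minimal sketch against Ken's table, with the threshold value taken from the thread as quoted:]

```sql
-- With the composite index in place, both selected columns live in the
-- index, so no data-file reads are needed, on MyISAM or InnoDB:
EXPLAIN SELECT physmessage_id, blocksize
FROM dbmail_messageblks
WHERE blocksize < 50;
-- "Using index" in the Extra column confirms the covering behavior.
```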
Re: Indexing dynamics in MySQL Community Edition 5.1.34
At 12:37 AM 6/25/2009, you wrote: Actually, my characterization of the current state is wrong. It appears that one core is completely busy; I suppose MySQL does this indexing work in a single thread. Is it reasonable for indexing to be CPU bound? my.cnf based on my-huge.cnf, expanding key_buffer to 8G, myisam_sort_buffer_size to 256M, and putting tmpdir on the fiber channel disk. Mike, You mean key_buffer_size, don't you, and not key_buffer? If you are using MySQL 5.0 then the upper limit is 4GB, but 5.1 allows for more than 4GB. Also try increasing sort_buffer_size. Posting your Show Status will help people see where the bottleneck is. Mike
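[Editor's note: the variables mos mentions can be inspected and raised at runtime; a sketch with illustrative values, not recommendations:]

```sql
-- Confirm what the server actually uses (my-huge.cnf writes key_buffer,
-- which the server accepts as an alias for key_buffer_size):
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';
SHOW GLOBAL VARIABLES LIKE 'myisam_sort_buffer_size';

-- Both are dynamic, so they can be raised without a restart:
SET GLOBAL key_buffer_size = 8 * 1024 * 1024 * 1024;     -- 8 GB, 5.1+
SET GLOBAL myisam_sort_buffer_size = 256 * 1024 * 1024;  -- 256 MB
```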
RE: Indexing dynamics in MySQL Community Edition 5.1.34
-Original Message- From: Mike Spreitzer [mailto:mspre...@us.ibm.com] Sent: Thursday, June 25, 2009 1:38 AM To: mysql@lists.mysql.com Subject: Re: Indexing dynamics in MySQL Community Edition 5.1.34 Actually, my characterization of the current state is wrong. It appears that one core is completely busy; I suppose MySQL does this indexing work in a single thread. Is it reasonable for indexing to be CPU bound? [JS] Yes, it is very reasonable if you have enough data cached. One of the things it has to do is sort, and because memory access is much faster than disk access you want to avoid the latter if at all possible. If you'd ever seen a sort using magnetic tape (pre-disk), you'd have a vivid mental image of this. As for using a single thread, that is not compulsory. You could partition the data and do a multi-threaded sort-merge operation. I no longer remember the results of the sort wars of the early days, and I have no idea what MySQL does. Thanks, Mike Spreitzer Mike Spreitzer/Watson/i...@ibmus 06/25/09 01:30 AM To mysql@lists.mysql.com cc Subject Indexing dynamics in MySQL Community Edition 5.1.34
Re: Indexing dynamics in MySQL Community Edition 5.1.34
Mike, I re-posted your Show Status to the group to see if anyone can offer a way to speed up the indexing for you. BTW, you are adding ALL of the indexes to the table using ONE sql statement right? And not a separate SQL statement to build each index? Mike At 02:01 AM 6/25/2009, you wrote: Like I said in the subject line, I am using 5.1.34. I started with my-huge.cnf, which says key_buffer rather than key_buffer_size; SHOW GLOBAL VARIABLES confirms, however, that my key_buffer_size is 8GB. That indexing operation finally finished after about 1.5 hours; that was about 0.5 hours ago. Now I am on to other things. Here is the status you suggested: +---+---+ | Variable_name | Value | +---+---+ | Aborted_clients | 0 | | Aborted_connects | 0 | | Binlog_cache_disk_use | 0 | | Binlog_cache_use | 0 | | Bytes_received| 135 | | Bytes_sent| 1405 | | Com_admin_commands| 0 | | Com_assign_to_keycache| 0 | | Com_alter_db | 0 | | Com_alter_db_upgrade | 0 | | Com_alter_event | 0 | | Com_alter_function| 0 | | Com_alter_procedure | 0 | | Com_alter_server | 0 | | Com_alter_table | 0 | | Com_alter_tablespace | 0 | | Com_analyze | 0 | | Com_backup_table | 0 | | Com_begin | 0 | | Com_binlog| 0 | | Com_call_procedure| 0 | | Com_change_db | 0 | | Com_change_master | 0 | | Com_check | 0 | | Com_checksum | 0 | | Com_commit| 0 | | Com_create_db | 0 | | Com_create_event | 0 | | Com_create_function | 0 | | Com_create_index | 0 | | Com_create_procedure | 0 | | Com_create_server | 0 | | Com_create_table | 0 | | Com_create_trigger| 0 | | Com_create_udf| 0 | | Com_create_user | 0 | | Com_create_view | 0 | | Com_dealloc_sql | 0 | | Com_delete| 0 | | Com_delete_multi | 0 | | Com_do| 0 | | Com_drop_db | 0 | | Com_drop_event| 0 | | Com_drop_function | 0 | | Com_drop_index| 0 | | Com_drop_procedure| 0 | | Com_drop_server | 0 | | Com_drop_table| 0 | | Com_drop_trigger | 0 | | Com_drop_user | 0 | | Com_drop_view | 0 | | Com_empty_query | 0 | | Com_execute_sql | 0 | | Com_flush | 0 | | Com_grant | 0 | | 
Com_ha_close | 0 | | Com_ha_open | 0 | | Com_ha_read | 0 | | Com_help | 0 | | Com_insert| 0 | | Com_insert_select | 0 | | Com_install_plugin| 0 | | Com_kill | 0 | | Com_load | 0 | | Com_load_master_data | 0 | | Com_load_master_table | 0 | | Com_lock_tables | 0 | | Com_optimize | 0 | | Com_preload_keys | 0 | | Com_prepare_sql | 0 | | Com_purge | 0 | | Com_purge_before_date | 0 | | Com_release_savepoint | 0 | | Com_rename_table | 0 | | Com_rename_user | 0 | | Com_repair| 0 | | Com_replace | 0 | | Com_replace_select| 0 | | Com_reset | 0 | | Com_restore_table | 0 | | Com_revoke| 0 | | Com_revoke_all| 0 | | Com_rollback | 0
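[Editor's note: mos's question about using one statement matters because, with MyISAM, each ALTER TABLE that adds indexes rewrites the table; batching the additions pays that cost once. A sketch with illustrative table and column names:]

```sql
-- One table rebuild covers all three indexes:
ALTER TABLE big_table
    ADD INDEX idx_a (col_a),
    ADD INDEX idx_b (col_b),
    ADD INDEX idx_ab (col_a, col_b);

-- ...whereas three separate ALTER TABLE statements would rebuild the
-- table (and re-sort the already-built keys) three times.
```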
Indexing? (Warning: relative newbie.)
Hi, all. I'm a long-time MySQL user who's only recently had to start learning some administrative stuff, largely because I finally have a decently-sized database. My database is about 100 GB; I'm using it -- via dbmail (www.dbmail.org) -- as a mail server for my company. While dbmail is well-and-good with its IMAP front-end, I'm thinking of writing a Python front-end to do some queries directly against MySQL. But some of them take a l-o-n-g time. As an example, I've got a table with slightly over a million records; I'd like to be able to show (say) only IDs of messages under a half-MB. The query would look something like this: select physmessage_id,blocksize from dbmail_messageblks where blocksize < 50; That query takes 50 minutes. A smidge long to wait. So I said, Huh. That's impressive. And I tried it without the physmessage_id: select blocksize from dbmail_messageblks where blocksize < 50; That took 14 seconds. A bit more in my timeframe. Can I optimize this with indexing? Should I be using a different DB engine? Is there a site/book I should be learning DBA fundamentals from that might offer me direction for stuff like this? Sorry for all the newbie questions, but I haven't done serious database stuff since Foxbase/dBase III days. Things have changed a little since then. Thanks! -Ken
RE: Indexing? (Warning: relative newbie.)
To answer your questions in no particular order: YES, you can speed it up with indexing. You might want to first create an index on ( blocksize, physmessage_id ). Why, you might ask, index on physmessage_id? Because then the db won't have to do a fetch on rows from the table, since the value is in the INDEX itself, saving any unnecessary reads. Realistically, I can't see the query taking more than a few seconds, at most, to execute. However, building the index might take a serious bit of time. Please let us all know how it does or does not work. Tim... -Original Message- From: Ken D'Ambrosio [mailto:k...@jots.org] Sent: Wednesday, June 24, 2009 11:07 AM To: mysql@lists.mysql.com Subject: Indexing? (Warning: relative newbie.)
Re: Indexing? (Warning: relative newbie.)
Hey Tim, all On Wed, Jun 24, 2009 at 10:03 AM, Little, Timothytlit...@thomaspublishing.com wrote: Why, you might ask, index on physmessage_id? Because then the db won't have to do a fetch on items from the table since it's in the INDEX itself, saving any unnecessary reads. FYI: That only holds true for InnoDB, not for MyISAM. cheers, -- Walter Heck, Engineer @ Open Query (http://openquery.com) Affordable Training and ProActive Support for MySQL related technologies Follow our blog at http://openquery.com/blog/ OurDelta: free enhanced builds for MySQL @ http://ourdelta.org
Indexing dynamics in MySQL Community Edition 5.1.34
Using MyISAM on a table loaded from 8GB of CSV, I am now adding some indices. In a separate shell I monitor the progress, alternately with `vmstat` and show full processlist. At first vmstat shows rapid progress; an example is # vmstat 5 procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpdfree buffcache si sobi bo in cs us sy id wa st 1 6 0 8542108 256860 5394040000 0 152783 1314 298 3 4 60 32 0 1 6 0 8541000 256868 5394039200 0 147868 1301 287 3 4 68 25 0 1 5 0 8541380 256876 5394040000 0 150633 1310 277 3 4 72 21 0 1 6 0 8541108 256884 5394039200 0 152066 1307 271 3 4 71 21 0 1 7 0 8541116 256892 5394040000 0 151452 1312 311 3 4 64 29 0 1 6 0 8541992 256900 5394039200 0 192175 1402 295 3 4 66 26 0 1 6 0 8535684 256908 5394040000 0 108783 1227 276 3 4 69 24 0 1 8 0 8539116 256916 5394039200 0 155958 1318 262 3 4 82 11 0 1 6 0 8540860 256924 5394039200 0 166599 1340 328 3 4 66 27 0 1 9 0 8538512 256932 5394039200 0 165386 1336 319 3 4 62 31 0 1 6 0 8536776 256940 5394039200 0 175106 1358 303 3 5 66 27 0 2 0 0 8538884 256944 5394039600 0 187839 1402 305 3 5 70 22 0 1 1 0 8517060 256952 5394040000 0 188694 1379 307 3 4 66 27 0 1 10 0 8511604 256960 5394040000 0 175821 1335 294 2 5 69 24 0 1 10 0 8513340 256968 5394040000 0 164252 1335 300 3 4 65 28 0 2 0 0 8523012 256976 5394039200 0 151527 1318 305 3 5 60 33 0 1 10 0 8490152 256976 5394040000 0 178613 1352 301 2 5 67 26 0 2 0 0 8499576 256976 5394040000 0 142186 1319 302 2 5 69 23 0 1 10 0 8474280 256984 5394040000 0 185598 1348 301 2 5 57 36 0 2 9 0 8440676 256984 5394040000 0 166807 1334 306 2 5 53 39 0 1 9 0 8465228 256988 5394039600 0 114594 1268 306 1 6 56 36 0 1 9 0 16819736 256992 4554294400 0 185034 1342 301 1 6 56 36 0 1 9 0 20314428 257028 4213472400 0 186163 1371 282 1 6 56 37 0 1 9 0 20276856 257068 4217125200 0 166406 1342 281 2 5 59 34 0 1 9 0 20237672 257108 4220934400 0 166810 1333 252 2 5 56 37 0 At this point, and not for the first time, I stop vmstat and show full processlist. 
It says ++--+---+--+-+--+---+-+ | Id | User | Host | db | Command | Time | State | Info | ++--+---+--+-+--+---+-+ | 1 | root | localhost | cel_4x52 | Query | 542 | copy to tmp table | ALTER TABLE ntfelt ADD PRIMARY KEY (p, epoch, ssi, q, kind, ev, c), ADD UNIQUE INDEX pesqekvc(p, epoch, ssi, q, eqoch, kind, version, c), ADD INDEX tc(t, c), ORDER BY p, epoch, ssi, q, kind, ev, c | | 3 | root | localhost | NULL | Query |0 | NULL | show full processlist | ++--+---+--+-+--+---+-+ OK, so it is still indexing. Then I start up `vmstat` again, and it shows very different dynamics: # vmstat 5 procs ---memory-- ---swap-- -io --system-- -cpu-- r b swpd free buffcache si sobibo in cs us sy id wa st 1 0 0 32429508 257248 3043925600 03076 0 0 100 0 0 1 1 0 32416124 257272 3045166800 0 2471 1020 111 6 0 94 0 0 1 0 0 32405096 257292 3046178000 0 2467 1017 109 6 0 94 0 0 1 0 0 32391828 257312 3047443600 0 2056 1019 107 6 0 94 0 0 1 0 0 32378684 257332 3048635600 0 2563 1040 109 6 0 94 0 0 1 0 0 32358224 257352 3050082400 0 3756 1038 109 6 0 93 0 0 1 0 0 32342600 257380 3051949200 0 3356 1035 112 6 0 93 0 0 1 0 0 32322140 257404 3053768800 0 3696 1023 108 6 0 94 0 0 I check
Re: Indexing dynamics in MySQL Community Edition 5.1.34
Actually, my characterization of the current state is wrong. It appears that one core is completely busy; I suppose MySQL does this indexing work in a single thread. Is it reasonable for indexing to be CPU bound? Thanks, Mike Spreitzer Mike Spreitzer/Watson/i...@ibmus 06/25/09 01:30 AM To mysql@lists.mysql.com cc Subject Indexing dynamics in MySQL Community Edition 5.1.34
INDEXING ALL COLUMNS
Hi all, Is there any specific reason for not indexing all the columns of a table? What is the impact on performance? Indexing is meant for great performance, so why is indexing all columns not feasible? (I read in the docs that all columns should not be indexed.) -- Krishna Chandra Prajapati
Re: INDEXING ALL COLUMNS
As your table grows, your inserts will start to get slower and slower, since every index must be updated on each insert. You also run into the issue of locking the table due to re-creating the indexes, plus wasted space for the indexes.
Re: INDEXING ALL COLUMNS
Hi, Following on from what Mike mentioned, indexing all columns does not really help, as MySQL will at most use one index for a query, so it's important to pick your indexes carefully and consider constructing composite indexes. An index on a single column may not even be used due to poor cardinality. Ewen
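[Editor's note: Ewen's composite-index advice hinges on the leftmost-prefix rule; a small sketch with an illustrative table:]

```sql
CREATE TABLE t (a INT, b INT, c VARCHAR(32), INDEX idx_ab (a, b));

-- These can use idx_ab (they constrain a leftmost prefix of the index):
SELECT c FROM t WHERE a = 1;
SELECT c FROM t WHERE a = 1 AND b = 2;

-- This cannot (b alone is not a leftmost prefix of idx_ab):
SELECT c FROM t WHERE b = 2;
```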
Re: INDEXING ALL COLUMNS
Hi, What would you say about the below table? What can I do to make it more efficient? CREATE TABLE mailer_student_status ( student_id decimal(22,0) NOT NULL default '0', param varchar(128) NOT NULL default '', value varchar(128) default NULL, PRIMARY KEY (student_id,param), KEY idx_value (value) )
Re: INDEXING ALL COLUMNS
We'd need more information on what the where clauses of the queries look like to assist with this. -Aaron
Re: INDEXING ALL COLUMNS
Hi, Well, at first glance it's hard to tell, since param and value don't say a lot about the nature of the data. If this is InnoDB, you can have a PRIMARY KEY of student_id (assuming it's unique) and a separate index on param; because of the way InnoDB is structured, the primary key is always implied in the makeup of any other index. You could perhaps consider how much of param and value are interesting and create a composite index on them, idx_param_value (param(10),value(10)) or something similar. What's important for data types is what is going to be held, and what's important for indexes is how the data is going to be queried. Ewen
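[Editor's note: Ewen's prefix-index idea as a sketch; the prefix length 10 is his example, and the selectivity check below is only one assumed way of validating it:]

```sql
ALTER TABLE mailer_student_status
    ADD INDEX idx_param_value (param(10), value(10));

-- Gauge whether a 10-character prefix distinguishes values nearly as
-- well as the full column (a ratio close to 1 is good):
SELECT COUNT(DISTINCT LEFT(param, 10)) / COUNT(DISTINCT param) AS coverage
FROM mailer_student_status;
```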
Re: INDEXING ALL COLUMNS
More details. CREATE TABLE mailer_student_status ( student_id decimal(22,0) NOT NULL default '0', param varchar(128) NOT NULL default '', value varchar(128) default NULL, PRIMARY KEY (student_id,param), KEY idx_value (value) ) SELECT VALUE FROM mailer_student_status WHERE student_id = 586925 AND VALUE = 0; SELECT VALUE FROM mailer_student_status WHERE PARAM = 'FIRST_MAILER_COUPON_CODE' AND VALUE = 0;
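[Editor's note: given those two queries, a sketch of indexes shaped to their WHERE clauses, assuming these are the dominant access patterns:]

```sql
-- Query 1 (student_id = ?, value = ?): the PRIMARY KEY (student_id, param)
-- already narrows the scan to one student's rows.
-- Query 2 (param = ?, value = ?): a composite index matches both
-- predicates and may make the single-column idx_value redundant:
ALTER TABLE mailer_student_status
    ADD INDEX idx_param_value (param, value),
    DROP INDEX idx_value;
```

Dropping idx_value is optional and worth verifying with EXPLAIN against the real workload first.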
Show indexing status
Hi, I just finished restoring a 22gig SQL dump but the server is not performing anywhere near where it should be. I'm assuming this is because it's still rebuilding indexes on the imported tables. Is there any way to see the indexing status so I can gauge how far it's got? Thanks. -Stut -- http://stut.net/
Re: Show indexing status
As far as I am aware, the indexes are built on import. It may be that the key buffer or InnoDB buffer (depending on engine), and the query cache, are all cold. They may take a day or so to build up, depending on size and load. Other than that there must be some external difference. Is it the same server? Same config file? Same default engine? Ben
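[Editor's note: a cold cache alone can explain a large slowdown right after a restore; a sketch of what to compare between the old and new servers:]

```sql
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';

-- For InnoDB, a high ratio of Innodb_buffer_pool_reads (physical) to
-- Innodb_buffer_pool_read_requests (logical) means the pool is still cold:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```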
Re: Show indexing status
On 4 Jun 2008, at 15:53, Ben Clewett wrote: As far as I am aware, the index's are built on import. It may be that the key-buffer or innodb-buffer (depending on engine), and the query cache, are all cold. May take a day or so to build them up depending on size and load. Other than that there must be some external difference. Is it the same server? Same config file? Same default engine? All tables are InnoDB. Different server (temporarily on one of the Apache servers until replacement HDDs for the DB server arrive) but CPU is 50% idle with MySQL taking ~33% continuously. Load is hovering just over 2 (2 CPUs in there). Config file is probably a little different - we didn't have a copy of it before the DB HDD's went belly up. Stut wrote: Hi, I just finished restoring a 22gig SQL dump but the server is not performing anywhere near where it should be. I'm assuming this is because it's still rebuilding indexes on the imported tables. Is there any way to see the indexing status so I can gauge how far it's got? Thanks. -Stut Thanks. -Stut -- http://stut.net/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
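MySQL of that era exposes no per-index build progress meter, but a few standard server commands (real MySQL statements, not specific to this poster's setup) show whether the server is still busy on the import and whether the InnoDB buffer pool is warming up, which is what the replies above suspect:

```sql
-- Any long-running import or ALTER shows up here with its elapsed time:
SHOW FULL PROCESSLIST;

-- The InnoDB monitor output includes pending I/O and buffer pool activity:
SHOW ENGINE INNODB STATUS;

-- A rising ratio of read requests to physical reads means the buffer
-- pool is warming up and fewer queries are hitting disk:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```

If the process list is idle and the buffer pool counters look healthy, the slowdown is more likely the config-file differences mentioned in the thread than any ongoing index build.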
Indexing question
Hi, I have created a rather large table containing about 16M records. Most of the indexed fields are smallint, but there is one field that is a text field that I am using fulltext indexing on. The total size of the smallint indexes is only about 30 MB, but the fulltext index brings the total index size to about 2 GB. My question is: When mysql adds indexes to the keycache, does it add each index individually or does it dump the entire .MYI file to the keycache. If it is the latter, I should move the fulltext index to another table. Thanks in advance, Christian -- J. Christian Hesketh M.Sc. CEO - Ion Channel Media Group 2028 Harvard Avenue, Suite 103 Montreal, QC CANADA H4A 2V9 Tel: +1(514)245-8107 http://www.IonChannelMedia.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Indexing question
What is the size of the text field you're fulltext indexing? How often is that index used? You might be best off to create a table containing only that column and a PK that is equal to the PK in the original table. You might also keep a portion of the text field (say 50 characters) in the original table, for head-first indexing. Arthur On Tue, Mar 25, 2008 at 9:56 AM, J. Christian Hesketh [EMAIL PROTECTED] wrote: Hi, I have created a rather large table containing about 16M records. Most of the indexed fields are smallint, but there is one field that is a text field that I am using fulltext indexing on. The total size of the smallint indexes is only about 30 MB, but the fulltext index brings the total index size to about 2 GB. My question is: When mysql adds indexes to the keycache, does it add each index individually or does it dump the entire .MYI file to the keycache. If it is the latter, I should move the fulltext index to another table. Thanks in advance, Christian -- J. Christian Hesketh M.Sc. CEO - Ion Channel Media Group 2028 Harvard Avenue, Suite 103 Montreal, QC CANADA H4A 2V9 Tel: +1(514)245-8107 http://www.IonChannelMedia.com http://www.ionchannelmedia.com/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: Indexing question
The field has up to 1000 characters, usually well over 255. It is searched constantly (2-3 times every second). The smallint values are scanned up to 40 times a second. So, are you saying that the entire MYI file is dumped into the keycache? If so, your suggestion to create a separate table with PK and text field seems to be the only sensible decision. Thanks in advance, Christian On Tue, Mar 25, 2008 at 10:11 AM, Arthur Fuller [EMAIL PROTECTED] wrote: What is the size of the text field you're fulltext indexing? How often is that index used? You might be best off to create a table containing only that column and a PK that is equal to the PK in the original table. You might also keep a portion of the text field (say 50 characters) in the original table, for head-first indexing. Arthur On Tue, Mar 25, 2008 at 9:56 AM, J. Christian Hesketh [EMAIL PROTECTED] wrote: Hi, I have created a rather large table containing about 16M records. Most of the indexed fields are smallint, but there is one field that is a text field that I am using fulltext indexing on. The total size of the smallint indexes is only about 30 MB, but the fulltext index brings the total index size to about 2 GB. My question is: When mysql adds indexes to the keycache, does it add each index individually or does it dump the entire .MYI file to the keycache. If it is the latter, I should move the fulltext index to another table. Thanks in advance, Christian -- J. Christian Hesketh M.Sc. CEO - Ion Channel Media Group 2028 Harvard Avenue, Suite 103 Montreal, QC CANADA H4A 2V9 Tel: +1(514)245-8107 http://www.IonChannelMedia.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- J. Christian Hesketh M.Sc. 
CEO - Ion Channel Media Group 2028 Harvard Avenue, Suite 103 Montreal, QC CANADA H4A 2V9 Tel: +1(514)245-8107 http://www.IonChannelMedia.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
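Arthur's suggestion of splitting the fulltext column into its own table can be sketched as follows; the table and column names here are hypothetical stand-ins for the poster's schema, and the idea is simply to keep the 2 GB fulltext index in a separate .MYI file from the ~30 MB of smallint indexes:

```sql
-- Side table holding only the PK and the fulltext-indexed column.
-- MyISAM is assumed because FULLTEXT required MyISAM in MySQL of this era.
CREATE TABLE article_body (
    article_id INT UNSIGNED NOT NULL PRIMARY KEY,  -- same PK as the main table
    body TEXT NOT NULL,
    FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM;

-- Search the side table, then join back to the main table for the
-- smallint columns (main_table is a placeholder name):
SELECT m.*
FROM article_body b
JOIN main_table m ON m.article_id = b.article_id
WHERE MATCH(b.body) AGAINST ('ion channel');
```

The main table's key cache then only ever holds the small indexes, and the fulltext index competes for memory separately.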
Re: Indexing one byte flags - what implementation is better
Garris, Nicole wrote: Is id a sequential number? And is it referenced by other tables? If so, and if over time new products become old products, then CASE 2 is more complex, because when moving a product (i.e., a row) from the new product table to the old product table, the value of id needs to stay the same. So for CASE 2 you'll need a third object to keep track of the highest value for id. I think you did not get it right. There is no MOVING of products. All products are stored in the 'products' table only, and newproducts is just a kind of FLAG table. It contains ONLY the ids of products considered new, nothing else. So, if a product is new then it is of course in the products table and in the newproducts table, and if it is not new then it is only in the 'products' table. ID is a sequential number (but no auto_increment - i hate it). PS: top posting is really popular in the mysql list! -- Artem -Original Message- From: Artem Kuchin [mailto:[EMAIL PROTECTED] Sent: Thursday, December 27, 2007 1:19 PM To: mysql@lists.mysql.com Subject: Indexing one byte flags - what implementation is better Maybe someone could provide good, reasonable input on this issue. Let's say i have a table products CASE 1: table: products id int unsigned not null, name char(128) not null, f_new tinyint not null id - is basically the id of a product name - is the name of a product f_new - is a one byte flag. If it is 1 the product is considered new. In this case to select all new products including name i need to do: select id, name from products where f_new=1 CASE 2: The above can be done another way - via two tables, one products table and another one listing all ids of new products: create table products ( id int unsigned not null, name char(128) not null, primary key (id) ); create table newproducts ( product_id int unsigned not null, primary key (product_id) ); If a product's id is in the newproducts table then it is a new product. To choose all new products including name i need to do: SELECT id,name FROM newproducts INNER JOIN products ON products.id=newproducts.product_id The questions are: 1) which way is FASTER? 2) which way eats less memory? 3) which way eats less cpu? 4) which way eats less hdd io? There are several cases for each question: 1) 1000 products - i think both methods are pretty much the same in this case because all of the data would be cached in memory 2) 10 products, 3 new products - interesting to know which method is better here and how each of the methods performs. 3) 10 products, 50 new products - interesting to know which method is better here and how each of the methods performs. I will greatly appreciate input on this issue. -- Artem -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Indexing one byte flags - what implementation is better
Maybe someone could provide good, reasonable input on this issue. Let's say i have a table products CASE 1: table: products id int unsigned not null, name char(128) not null, f_new tinyint not null id - is basically the id of a product name - is the name of a product f_new - is a one byte flag. If it is 1 the product is considered new. In this case to select all new products including name i need to do: select id, name from products where f_new=1 CASE 2: The above can be done another way - via two tables, one products table and another one listing all ids of new products: create table products ( id int unsigned not null, name char(128) not null, primary key (id) ); create table newproducts ( product_id int unsigned not null, primary key (product_id) ); If a product's id is in the newproducts table then it is a new product. To choose all new products including name i need to do: SELECT id,name FROM newproducts INNER JOIN products ON products.id=newproducts.product_id The questions are: 1) which way is FASTER? 2) which way eats less memory? 3) which way eats less cpu? 4) which way eats less hdd io? There are several cases for each question: 1) 1000 products - i think both methods are pretty much the same in this case because all of the data would be cached in memory 2) 10 products, 3 new products - interesting to know which method is better here and how each of the methods performs. 3) 10 products, 50 new products - interesting to know which method is better here and how each of the methods performs. I will greatly appreciate input on this issue. -- Artem -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
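Both of Artem's designs can be put side by side and compared empirically with EXPLAIN rather than argued in the abstract. This sketch corrects the spelling of the posted DDL and adds the index that CASE 1 needs to avoid a full table scan; it is a comparison harness, not a verdict on which case wins:

```sql
-- CASE 1: flag column, with an index on the flag so the WHERE can use it.
CREATE TABLE products (
    id    INT UNSIGNED NOT NULL,
    name  CHAR(128) NOT NULL,
    f_new TINYINT NOT NULL,
    PRIMARY KEY (id),
    KEY idx_f_new (f_new)
);

-- CASE 2: separate flag table keyed on the product id.
CREATE TABLE newproducts (
    product_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (product_id)
);

-- Compare the access plans for the two equivalent queries:
EXPLAIN SELECT id, name FROM products WHERE f_new = 1;
EXPLAIN SELECT p.id, p.name
FROM newproducts n
JOIN products p ON p.id = n.product_id;
```

A caveat worth noting: a two-value flag has very low cardinality, so when most products are new the optimizer may reasonably ignore idx_f_new and scan anyway, whereas the CASE 2 join touches only as many rows as there are new products.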
RE: Indexing one byte flags - what implementation is better
Is id a sequential number? And is it referenced by other tables? If so, and if over time new products become old products, then CASE 2 is more complex, because when moving a product (i.e., a row) from the new product table to the old product table, the value of id needs to stay the same. So for CASE 2 you'll need a third object to keep track of the highest value for id. -Original Message- From: Artem Kuchin [mailto:[EMAIL PROTECTED] Sent: Thursday, December 27, 2007 1:19 PM To: mysql@lists.mysql.com Subject: Indexing one byte flags - what implementation is better Maybe someone could provide good, reasonable input on this issue. Let's say i have a table products CASE 1: table: products id int unsigned not null, name char(128) not null, f_new tinyint not null id - is basically the id of a product name - is the name of a product f_new - is a one byte flag. If it is 1 the product is considered new. In this case to select all new products including name i need to do: select id, name from products where f_new=1 CASE 2: The above can be done another way - via two tables, one products table and another one listing all ids of new products: create table products ( id int unsigned not null, name char(128) not null, primary key (id) ); create table newproducts ( product_id int unsigned not null, primary key (product_id) ); If a product's id is in the newproducts table then it is a new product. To choose all new products including name i need to do: SELECT id,name FROM newproducts INNER JOIN products ON products.id=newproducts.product_id The questions are: 1) which way is FASTER? 2) which way eats less memory? 3) which way eats less cpu? 4) which way eats less hdd io? There are several cases for each question: 1) 1000 products - i think both methods are pretty much the same in this case because all of the data would be cached in memory 2) 10 products, 3 new products - interesting to know which method is better here and how each of the methods performs. 3) 10 products, 50 new products - interesting to know which method is better here and how each of the methods performs. I will greatly appreciate input on this issue. -- Artem -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
indexing tables using my own functions
Hi I've created a function that returns a float value. The code for it is:

create function IDR(pin1 varchar(20), pin4 varchar(20), pin6 varchar(20), pin7 varchar(20), pin9 varchar(20), MOL varchar(20)) returns float DETERMINISTIC
begin
declare output float;
declare P1 float; declare P4 float; declare P6 float; declare P7 float; declare P9 float;
select VALUE into P1 from PSSMS where AA=pin1 and POS='1' and MOLEC=MOL;
select VALUE into P4 from PSSMS where AA=pin4 and POS='4' and MOLEC=MOL;
select VALUE into P6 from PSSMS where AA=pin6 and POS='6' and MOLEC=MOL;
select VALUE into P7 from PSSMS where AA=pin7 and POS='7' and MOLEC=MOL;
select VALUE into P9 from PSSMS where AA=pin9 and POS='9' and MOLEC=MOL;
select P1+P4+P6+P7+P9 into output;
return output;
end //

And it works. Now I would like to index a table using this function. The table description is:

mysql> describe precalc;
+-------+---------+------+-----+---------+----------------+
| Field | Type    | Null | Key | Default | Extra          |
+-------+---------+------+-----+---------+----------------+
| id    | int(6)  | NO   | PRI | NULL    | auto_increment |
| P1    | char(1) | YES  |     | NULL    |                |
| P4    | char(1) | YES  |     | NULL    |                |
| P6    | char(1) | YES  |     | NULL    |                |
| P7    | char(1) | YES  |     | NULL    |                |
| P9    | char(1) | YES  |     | NULL    |                |
+-------+---------+------+-----+---------+----------------+
6 rows in set (0.01 sec)

and I try to create the index with the following command:

mysql> create index AA on precalc (IDR(P1,P4,P6,P7,P9,'HLA-DRB13'));

But I get the following error: ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''P1','P4','P6','P7','P9','HLA-DRB13'))' at line 1 Does anyone know where the error is?
Thanks Pau -- Pau Marc Muñoz Torres Laboratori de Biologia Computacional Institut de Biotecnologia i Biomedicina Vicent Villar Universitat Autonoma de Barcelona E-08193 Bellaterra (Barcelona) telèfon: 93 5812807 Email : [EMAIL PROTECTED]
Re: indexing tables using my own functions
mysql create index AA on precalc (IDR(P1,P4,P6,P7,P9,'HLA-DRB13')); But i Get the following error: ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''P1','P4','P6','P7','P9','HLA-DRB13'))' at line 1 Some one knows where is the error? As far as I can see ( http://dev.mysql.com/doc/refman/5.0/en/create-index.html ) you can only use columns, not a function. Martijn Tonies Database Workbench - tool for InterBase, Firebird, MySQL, NexusDB, Oracle MS SQL Server Upscene Productions http://www.upscene.com My thoughts: http://blog.upscene.com/martijn/ Database development questions? Check the forum! http://www.databasedevelopmentforum.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: indexing tables using my own functions
As far as I can see, from MySQL 5.0 and up it is possible to create an index using functions: http://www.faqs.org/docs/ppbook/r24254.htm But I keep having problems with the example from the link. Is there a bug in mysql 5.0.24a-log? 2007/11/13, Martijn Tonies [EMAIL PROTECTED]: mysql> create index AA on precalc (IDR(P1,P4,P6,P7,P9,'HLA-DRB13')); But I get the following error: ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''P1','P4','P6','P7','P9','HLA-DRB13'))' at line 1 Does anyone know where the error is? As far as I can see ( http://dev.mysql.com/doc/refman/5.0/en/create-index.html ) you can only use columns, not a function. Martijn Tonies Database Workbench - tool for InterBase, Firebird, MySQL, NexusDB, Oracle MS SQL Server Upscene Productions http://www.upscene.com My thoughts: http://blog.upscene.com/martijn/ Database development questions? Check the forum! http://www.databasedevelopmentforum.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED] -- Pau Marc Muñoz Torres Laboratori de Biologia Computacional Institut de Biotecnologia i Biomedicina Vicent Villar Universitat Autonoma de Barcelona E-08193 Bellaterra (Barcelona) telèfon: 93 5812807 Email : [EMAIL PROTECTED]
Re: indexing tables using my own functions
As far as I can see, from MySQL 5.0 and up it is possible to create an index using functions: http://www.faqs.org/docs/ppbook/r24254.htm But I keep having problems with the example from the link. Is there a bug in mysql 5.0.24a-log? The above website says: Practical PostgreSQL. I cannot find MySQL anywhere on that page. Martijn Tonies Database Workbench - development tool for MySQL, and more! Upscene Productions http://www.upscene.com My thoughts: http://blog.upscene.com/martijn/ Database development questions? Check the forum! http://www.databasedevelopmentforum.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
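Martijn is right that the linked page documents PostgreSQL, and MySQL 5.0 has no functional indexes. The usual workaround on MySQL of that era is to materialize the function's result into an ordinary column and index that instead; the column name idr_value below is made up for illustration:

```sql
-- Precompute IDR() into a plain indexed column.
ALTER TABLE precalc ADD COLUMN idr_value FLOAT;

UPDATE precalc
SET idr_value = IDR(P1, P4, P6, P7, P9, 'HLA-DRB13');

CREATE INDEX AA ON precalc (idr_value);

-- Queries then filter on the indexed column instead of calling
-- the function once per row:
SELECT id FROM precalc WHERE idr_value > 3.5;
```

The trade-off is that idr_value must be kept current, either by re-running the UPDATE after loads or by maintaining it from triggers on INSERT/UPDATE.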
Re: Advanced Indexing
Hi Cory, Cory Robin wrote: Is there a way to only include certain matching conditions in indexes? Example if I have a row I want to index that is mysql dates (2007-06-07) and I only want to include CURRENT and FUTURE dates in the index and ignore any past dates. Is that possible at all? The issue I have is that the ratio of queries on old vs. new data is like 1:10. And searches would be MUCH faster if I could force my queries that are looking at current or future data to use an index that ONLY had that information in them.. You can't do this. The strategies to do what you want will usually involve archiving off old data to other tables with fewer indexes, and creating a VIEW over the old and new tables, or moving all old data to a data warehouse, or something like that. Incidentally, this is exactly what I built MySQL Archiver to do (just released a few days ago): http://www.xaprb.com/blog/2007/06/06/mysql-archiver-091-released/ http://www.xaprb.com/blog/2006/05/02/how-to-write-efficient-archiving-and-purging-jobs-in-sql/ http://sourceforge.net/projects/mysqltoolkit Baron -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Advanced Indexing
On 6/7/07, Cory Robin [EMAIL PROTECTED] wrote: The issue I have is that the ratio of queries on old vs. new data is like 1:10. And searches would be MUCH faster if I could force my queries that are looking at current or future data to use an index that ONLY had that information in them.. You might want to look at partition pruning in MySQL 5.1. - Perrin -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
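Perrin's pointer to partition pruning can be sketched concretely. In MySQL 5.1, RANGE-partitioning the table on the date column means a query bounded to current or future dates never scans the archived rows, which approximates the "index on only new data" the original poster wanted; all table and column names here are illustrative, not from the poster's schema:

```sql
-- 5.1 sketch: the partitioning column must appear in every unique key,
-- hence the composite primary key.
CREATE TABLE events (
    id INT UNSIGNED NOT NULL,
    event_date DATE NOT NULL,
    PRIMARY KEY (id, event_date)
)
PARTITION BY RANGE (TO_DAYS(event_date)) (
    PARTITION p_archive VALUES LESS THAN (TO_DAYS('2007-01-01')),
    PARTITION p_current VALUES LESS THAN MAXVALUE
);

-- Pruning restricts this query to p_current; the old rows in
-- p_archive are never read:
SELECT id FROM events WHERE event_date >= '2007-06-07';
```

On MySQL 5.0 the same effect needs Baron's approach instead: archive old rows into a lightly indexed table and union the two behind a VIEW.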
Advanced Indexing
Is there a way to only include certain matching conditions in indexes? For example, if I have a column I want to index that contains MySQL dates (2007-06-07), and I only want to include CURRENT and FUTURE dates in the index and ignore any past dates. Is that possible at all? The issue I have is that the ratio of queries on old vs. new data is like 1:10, and searches would be MUCH faster if I could force the queries that look at current or future data to use an index that ONLY had that information in it. Thanks in advance for insight here. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
indexing order column
hi, if I have column order_id(int(4)) null do I have to index it too. I'm going to use it ONLY for sorting records. thanks. -afan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: indexing order column
Hi Afan, Afan Pasalic wrote: hi, if I have column order_id(int(4)) null do I have to index it too. I'm going to use it ONLY for sorting records. It depends a lot on how much data is in the table, etc etc. An index will make sorting more efficient in the general case when you have a decent amount of data. But it's hard to be specific with so little information. Baron -- Baron Schwartz http://www.xaprb.com/ -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: indexing order column
Baron Schwartz wrote: Hi Afan, Afan Pasalic wrote: hi, if I have column order_id(int(4)) null do I have to index it too. I'm going to use it ONLY for sorting records. It depends a lot on how much data is in the table, etc etc. An index will make sorting more efficient in the general case when you have a decent amount of data. But it's hard to be specific with so little information. Baron I have table products (product_id is PK). I have table categories (cat_id is PK). since the product can be in more than one category, I have prod_cat table: create prod_cat( cat_id int(8) unsigned not null, prod_id int(8) unsigned not null, order_id int(4) unsigned null, PRIMARY KEY (cat_id, prod_id) ) engine=Innodb; would it be enough info? thanks. -afan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: indexing order column
Hi, Afan Pasalic wrote: Baron Schwartz wrote: Hi Afan, Afan Pasalic wrote: hi, if I have column order_id(int(4)) null do I have to index it too. I'm going to use it ONLY for sorting records. It depends a lot on how much data is in the table, etc etc. An index will make sorting more efficient in the general case when you have a decent amount of data. But it's hard to be specific with so little information. Baron I have table products (product_id is PK). I have table categories (cat_id is PK). since the product can be in more than one category, I have prod_cat table: create prod_cat( cat_id int(8) unsigned not null, prod_id int(8) unsigned not null, order_id int(4) unsigned null, PRIMARY KEY (cat_id, prod_id) ) engine=Innodb; Okay, so your order_id is really sort order, not id of the customer's request to buy something. (As an aside, perhaps sort_order would confuse you less in the future when you don't remember the column's purpose anymore). It probably makes sense to index the column if you want to use it for sorting. You could also just order by the primary key. But I understand there are times when you want to have a different ordering. Baron -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: indexing order column
Baron Schwartz wrote: Hi, Afan Pasalic wrote: Baron Schwartz wrote: Hi Afan, Afan Pasalic wrote: hi, if I have column order_id(int(4)) null do I have to index it too. I'm going to use it ONLY for sorting records. It depends a lot on how much data is in the table, etc etc. An index will make sorting more efficient in the general case when you have a decent amount of data. But it's hard to be specific with so little information. Baron I have table products (product_id is PK). I have table categories (cat_id is PK). since the product can be in more than one category, I have prod_cat table: create prod_cat( cat_id int(8) unsigned not null, prod_id int(8) unsigned not null, order_id int(4) unsigned null, PRIMARY KEY (cat_id, prod_id) ) engine=Innodb; Okay, so your order_id is really sort order, not id of the customer's request to buy something. (As an aside, perhaps sort_order would confuse you less in the future when you don't remember the column's purpose anymore). It probably makes sense to index the column if you want to use it for sorting. You could also just order by the primary key. But I understand there are times when you want to have a different ordering. Baron yes, you're right. sort_order does make more sense :) order by PK, in my case, is let's say impossible because I'm sorting products in ONE category. e.g.:

cat_id | prod_id | order_id
     1 |      23 |        1
     1 |      25 |        2
     1 |      36 |        3
     1 |      13 |        4
     2 |      13 |        1
     2 |      45 |        2
     2 |      47 |        3
     2 |      51 |        4
     3 |      32 |        1
     3 |      33 |        2
     3 |      34 |        3
     3 |      35 |        4

-afan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
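Given Afan's access pattern of sorting the products of ONE category, a composite index whose leading column is the equality predicate and whose second column is the sort column lets MySQL read the rows in order with no filesort. A small sketch under those assumptions (the index name is made up):

```sql
-- cat_id narrows the scan to one category; within that prefix the
-- entries are already stored in order_id order:
ALTER TABLE prod_cat ADD INDEX idx_cat_sort (cat_id, order_id);

SELECT prod_id
FROM prod_cat
WHERE cat_id = 1
ORDER BY order_id;
```

EXPLAIN on this query should show the index being used without "Using filesort"; if it still sorts, the index definition and query predicates are worth re-checking against each other.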
RE: Joins versus Grouping/Indexing: Normalization Excessive?
Do you really mean 1:1 relationships? A song might have more than one artist, album, or genre. Could a song have no album at all, such as something that was released in MP3 format? Also, you might have more than one version of a song. You might need a separate table for recordings, with each song having one or more recordings. That would be a good place to keep the release information, rather than storing it with the song. The biggest problem would be to figure out how to index the songs, themselves. I don't have a good suggestion for that off the top of my head. That all being said, there's no reason I can think of not to normalize the data. Regards, Jerry Schwartz Global Information Incorporated 195 Farmington Ave. Farmington, CT 06032 860.674.8796 / FAX: 860.674.8341 -Original Message- From: Daniel Cousineau [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 03, 2007 12:15 AM To: mysql@lists.mysql.com Subject: Joins versus Grouping/Indexing: Normalization Excessive? So I'm currently designing a database for a web site and intra net for my campuses student radio. Since I'm not getting paid for this and I'm doing this in my free time I kinda want to take my time and have the system setup as perfectly as any one college student can. I'm currently debating on how I want to store all the songs in the system. Namely I'm not quite sure which way will eek the most performance out of MySQL. My plan so far is quite normalized, a songs table with 1:1 relationships with an Albums, Artists, and Genres table. The big benefits I see from this is when I'm building the intra net application I'll want to pull just all of the artists or all of the albums, etc. However I feel like I'm encountering issues with where to store the year field of an mp3 (do I want it on the album, song, or both) along with issues like printing everything out at once. 
The only other way I can think of thats relatively efficient is to have the singular songs table and have indexes on albums, artists, and genres. My question, more out of curiosity than necessity, is which of these would be more efficient (given that I'll be using the InnoDB storage engine)? Other relevant facts include it'll be using the latest, stable release of MySQL 5 and I'll be developing in PHP5 (through CakePHP's database abstraction layer). -- Thanks and Gig 'Em! Daniel Cousineau http://www.terminalfuture.com/ http://www.linkedin.com/in/dcousineau [EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Joins versus Grouping/Indexing: Normalization Excessive?
So I'm currently designing a database for a web site and intranet for my campus's student radio. Since I'm not getting paid for this and I'm doing it in my free time, I kinda want to take my time and have the system set up as perfectly as any one college student can. I'm currently debating how I want to store all the songs in the system. Namely, I'm not quite sure which way will eke the most performance out of MySQL. My plan so far is quite normalized: a songs table with 1:1 relationships with an Albums, Artists, and Genres table. The big benefit I see from this is that when I'm building the intranet application I'll want to pull just all of the artists, or all of the albums, etc. However, I feel like I'm encountering issues with where to store the year field of an mp3 (do I want it on the album, the song, or both) along with issues like printing everything out at once. The only other way I can think of that's relatively efficient is to have the singular songs table and have indexes on albums, artists, and genres. My question, more out of curiosity than necessity, is which of these would be more efficient (given that I'll be using the InnoDB storage engine)? Other relevant facts: it'll be using the latest stable release of MySQL 5 and I'll be developing in PHP5 (through CakePHP's database abstraction layer). -- Thanks and Gig 'Em! Daniel Cousineau http://www.terminalfuture.com/ http://www.linkedin.com/in/dcousineau [EMAIL PROTECTED]
Re: Joins versus Grouping/Indexing: Normalization Excessive?
I think you're approaching this from the wrong angle. You'll want to put the data at the highest level at which it changes. i.e. If every song on an album is always the same year, put it at the album level, however, if it changes from song to song on a particular album, then you want it at the song level. Year wouldn't ever apply to artist I don't think, unless they're truly a one hit wonder. :) -Micah On 04/02/2007 09:14 PM, Daniel Cousineau wrote: So I'm currently designing a database for a web site and intra net for my campuses student radio. Since I'm not getting paid for this and I'm doing this in my free time I kinda want to take my time and have the system setup as perfectly as any one college student can. I'm currently debating on how I want to store all the songs in the system. Namely I'm not quite sure which way will eek the most performance out of MySQL. My plan so far is quite normalized, a songs table with 1:1 relationships with an Albums, Artists, and Genres table. The big benefits I see from this is when I'm building the intra net application I'll want to pull just all of the artists or all of the albums, etc. However I feel like I'm encountering issues with where to store the year field of an mp3 (do I want it on the album, song, or both) along with issues like printing everything out at once. The only other way I can think of thats relatively efficient is to have the singular songs table and have indexes on albums, artists, and genres. My question, more out of curiosity than necessity, is which of these would be more efficient (given that I'll be using the InnoDB storage engine)? Other relevant facts include it'll be using the latest, stable release of MySQL 5 and I'll be developing in PHP5 (through CakePHP's database abstraction layer). -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
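Jerry's point that a song may have more than one artist (and possibly no album) means the relationship is many-to-many, not 1:1, so a junction table fits better than a foreign key on the songs table. One hedged sketch of that layout, with all names invented for illustration rather than taken from Daniel's actual schema:

```sql
CREATE TABLE artists (
    artist_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE songs (
    song_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    album_id INT UNSIGNED NULL,   -- NULL for MP3-only releases with no album
    year SMALLINT UNSIGNED NULL   -- per Micah: store year where it varies
) ENGINE=InnoDB;

-- Junction table: one row per (song, artist) pairing.
CREATE TABLE song_artist (
    song_id INT UNSIGNED NOT NULL,
    artist_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (song_id, artist_id),
    KEY idx_artist (artist_id),
    FOREIGN KEY (song_id) REFERENCES songs (song_id),
    FOREIGN KEY (artist_id) REFERENCES artists (artist_id)
) ENGINE=InnoDB;
```

The composite primary key covers "artists of a song", and the idx_artist key covers the reverse "songs by an artist" lookup, which is the query the intranet application would run most.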
Help indexing this query.
--- ptsSignups ---
id int(20) No
mid int(20) No 0
ptsID int(20) No 0
pps double No 0
points double No 0
loginID varchar(255) No
emailConfirm longtext No
time timestamp Yes CURRENT_TIMESTAMP
reasonForDeny longtext No
status int(1) No 1

--- index (timeframe) ---
timeframe (mid, status, time)

--- query ---
SELECT SUM(s.pps) AS earned, m.id, m.username
FROM ptsSignups s FORCE INDEX (timeframe)
JOIN members m ON s.mid = m.id AND m.status = 'Member'
LEFT JOIN ptsContestExclude e ON e.cid = 1 AND e.mid = m.id
WHERE s.status = '2'
  AND s.time >= '2004-06-08'
  AND s.time < '2008-06-08' + INTERVAL 1 DAY
  AND e.mid IS NULL
GROUP BY s.mid
HAVING earned > 0
ORDER BY earned DESC

--- problem ---
`ptsSignups` is a table listing everything my members have completed. Sometimes I like to run contests to see who has earned the most. `members` is a table that contains all my users. `ptsContestExclude` is a table of members whom I would like to exclude from the contest. What I do first is group the table `ptsSignups` by member id and calculate a sum of how much each member earned. Then I reorder that sum in descending order so the highest earner is on top. This `ptsSignups` table contains 82752 rows and is 75KB big. It runs extremely slowly. I tried to create an index for it, but it failed to increase performance. Any help is appreciated. -- View this message in context: http://www.nabble.com/Help-indexing-this-query.-tf3059045.html#a8505554 Sent from the MySQL - General mailing list archive at Nabble.com.
Re: Help indexing this query.
Andrew, can you post the result of EXPLAIN for your query? Minus the FORCE INDEX, too. Also, the structure of the other two tables would be helpful as well. Thanks, Dan
Re: Help indexing this query.
--- EXPLAIN ---
| id | select_type | table | type   | possible_keys        | key     | key_len | ref         | rows  | Extra                                        |
|  1 | SIMPLE      | e     | system | cid                  | NULL    | NULL    | NULL        |     0 | const row not found                          |
|  1 | SIMPLE      | s     | ref    | sid,status,timeframe | status  | 4       | const       | 20438 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | m     | eq_ref | PRIMARY,status       | PRIMARY | 4       | sc_72.s.mid |     1 | Using where                                  |

--- members ---
id int(20) No
first varchar(255) No
last varchar(255) No
username varchar(25) No
email varchar(255) No
password varchar(25) No
PRIMARY id

--- ptsContestExclude ---
cid int(20) No 0
mid int(20) No 0
UNIQUE (cid,mid)

--- ptsSignups (all indexes) ---
PRIMARY id
INDEX (mid,ptsID)
INDEX (status,ptsID)
INDEX timeframe (mid, status, time)
Re: Help indexing this query.
Andrew, a couple of suggestions:

1 - Where you use s.status='2', change it to s.status=2; otherwise MySQL is likely casting your data from int to string, which is slow and also precludes using an index.

2 - In this case, instead of using a left join, try using a subquery: WHERE ... AND s.mid NOT IN (SELECT mid FROM ptsContestExclude) - or - change your index around, from UNIQUE (cid,mid) to UNIQUE (mid,cid). Due to the way MySQL uses indices, you need the queried-upon column(s) listed first (earlier) in the index.

These might speed things up. HTH, Dan
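Dan's second suggestion, replacing the exclusion LEFT JOIN with a NOT IN subquery, would look roughly like this applied to Andrew's query. This is a sketch only: the comparison operators in the archived message were mangled, so the date bounds are a guess at the original intent, and whether the rewrite is actually faster would need to be measured.

```sql
-- Sketch: the LEFT JOIN ... IS NULL anti-join rewritten as NOT IN,
-- with status compared as an integer to match the column type.
SELECT SUM(s.pps) AS earned, m.id, m.username
FROM ptsSignups s
JOIN members m ON s.mid = m.id AND m.status = 'Member'
WHERE s.status = 2
  AND s.time >= '2004-06-08'
  AND s.time <  '2008-06-08' + INTERVAL 1 DAY
  AND s.mid NOT IN (SELECT mid FROM ptsContestExclude WHERE cid = 1)
GROUP BY s.mid
HAVING earned > 0
ORDER BY earned DESC;
```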
Re: Help indexing this query.
Thanks for the casting tip.
Indexing issue in slave !!
Hi, I am facing a peculiar problem. When I execute a query on the slave server, it doesn't use the indexes that have been created and hence reads all the records present in the table. The same query, when executed on the master, yields proper results. The table structures and table types are the same. What could be wrong? regards Himanshu Raina
Re: Indexing issue in slave !!
In the last episode (Jan 06), Himanshu Raina reported that a query on his slave server doesn't use the indexes it uses on the master. If you compare the EXPLAIN SELECT ... outputs for the query on both servers, what's different? -- Dan Nelson [EMAIL PROTECTED]
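A minimal way to act on Dan's suggestion, with a hypothetical table and query standing in for the poster's:

```sql
-- Run the identical statement on both master and slave and compare
-- the type, key, and rows columns of the output:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If the slave shows type=ALL while the master uses the index, stale
-- index statistics on the slave are one common cause; refreshing them
-- may let the optimizer pick the index again:
ANALYZE TABLE orders;
```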
Foreign Language FULLTEXT Indexing
I'm guessing that the way MySQL handles foreign-language full-text indexing is through parser plug-ins and custom stopword lists. Am I right? And if so, these must already have been created for the common Western languages such as German, French, etc. Where can I find these plug-ins? Is there any MySQL or third-party documentation that anyone knows of that can help me out? I'm evaluating MySQL for an application doing full-text indexing and searching of English and many different foreign-language documents. Catherine Macheret
Re: example when indexing hurts simple select?
Hi Gaspar, MySQL allows you to pack the index, to make it smaller and to gain performance. Some information about that can be found here: http://www.mysqlperformanceblog.com/2006/05/13/to-pack-or-not-to-pack-myisam-key-compression/

Gaspar Bakos schrieb:
Hi, RE: Have you tried analyze table x; This was quick:

mysql> analyze table TEST;
| Table    | Op      | Msg_type | Msg_text                    |
| CAT.TEST | analyze | status   | Table is already up to date |

mysql> show index from TEST;
| Table | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
| TEST  |          1 | MMtestfiel |            1 | MMtestfiel  | A         |         838 | NULL     | NULL   |      | BTREE      | NULL    |
| TEST  |          1 | MMi_m      |            1 | MMi_m       | A         |       25857 | NULL     | NULL   | YES  | BTREE      | NULL    |

I am trying to figure out what the Packed field means. Gaspar
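The key compression the linked post describes is controlled by the MyISAM PACK_KEYS table option. A sketch, reusing the TEST table name from the thread; the option values shown are illustrative, not a recommendation:

```sql
-- PACK_KEYS=1 compresses both string and numeric keys;
-- PACK_KEYS=DEFAULT compresses only long string keys.
-- Rebuilding the table applies the new setting to existing indexes.
ALTER TABLE TEST ENGINE=MyISAM, PACK_KEYS=1;

-- The Packed column of SHOW INDEX then reports the compression in use.
SHOW INDEX FROM TEST;
```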
example when indexing hurts simple select?
Hello, There is a table (TEST) with ~100 million records and 70 columns (mostly integers, some doubles, and a few short fixed char()), about 100GB in size. The table has a single (non-unique) index on one integer column: MMi. If I invoke a simple select based on MMi, the selection is VERY slow:

nohup time mysql CAT -u catadmin -p$MPWD -e "create table test2 select * from TEST where MMi < 9000;"

(This selects only ~0.5% of the table, by the way, so test2 is a small table, and the time is not spent writing it to disk.)

Time used: 47 minutes: 0.00user 0.00system 47:17.37elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+416minor)pagefaults 0swaps

If I do the same but ignore the index, the select time drops to 1/5th!!!

nohup time mysql CAT -u catadmin -p$MPWD -e "create table test3 \
  select * from TEST ignore index (MMi) where MMi < 9000;"

Time used: 11 minutes: 0.00user 0.00system 11:08.23elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+415minor)pagefaults 0swaps

Without the index, MySQL does a simple thing: it reads the 100GB database sequentially and, while reading, parses the rows and determines whether MMi is below 9000. This is done at about 16MB/s. With the index, it performs a large number of random seeks; the data (.MYD) is probably not organized on disk in sorted MMi order.

Questions:
1. Is there a way to decrease random seeks? E.g. mysqld config parameters, increasing some buffer/cache sizes?
2. Optimize table: is there a way to rearrange the data so that random seeks are minimized?
3. If we have to live with a large number of random seeks, does anyone know how the underlying FS can be optimized?

Cheers, Gaspar

(All this TEST DB is kept on FC3 RedHat + MySQL 5.0.22, XFS filesystem on a 1.5TB RAID-5 array of 4 SATA disks, served by a 3ware RAID controller. The computer runs two 2.0GHz Opteron CPUs and 4GB RAM.)
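For question 2, making an index range scan do mostly sequential reads, MyISAM can physically sort the data file by a column. A sketch with the table and column names from the thread; note this is a one-time sort that later inserts do not maintain, and on a 100GB table the rebuild itself will take a long time:

```sql
-- Physically reorder the .MYD file by MMi so rows with nearby MMi
-- values sit adjacent on disk; a range scan through the MMi index
-- then does far fewer random seeks.
ALTER TABLE TEST ORDER BY MMi;
```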
Re: example when indexing hurts simple select?
What is the EXPLAIN output of each? Have you tried analyze table x; optimize table x; ? On a 100GB table, these might take a while. You probably want to LOCK the table before running them. Is it MyISAM or InnoDB? -- Philip M. Gollucci ([EMAIL PROTECTED]) 323.219.4708 Consultant / http://p6m7g8.net/Resume/resume.shtml Senior Software Engineer - TicketMaster - http://ticketmaster.com
Re: example when indexing hurts simple select?
Hi, Philip,

RE: What is the EXPLAIN output of each?

OK, first I naively typed: explain create table test2 select * from TEST where MMi < 9000; but of course, this does not work.

The simple select that uses MMi_m as the index (and takes up to an hour):

mysql> explain select * from TEST where MMi_m < 9000;
| id | select_type | table | type  | possible_keys | key   | key_len | ref  | rows   | Extra       |
|  1 | SIMPLE      | TEST  | range | MMi_m         | MMi_m | 3       | NULL | 406649 | Using where |

The select ignoring the index (takes only 11 minutes):

mysql> explain select * from TEST ignore key (MMi_m) where MMi_m < 9000;
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows      | Extra       |
|  1 | SIMPLE      | TEST  | ALL  | NULL          | NULL | NULL    | NULL | 470992970 | Using where |

Have you tried analyze table x; optimize table x;

Not yet. As regards OPTIMIZE TABLE, I thought it would not make much sense, because: "OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns)." But I have just uploaded this table from ASCII and made no changes. Nevertheless, I will give it a try; maybe there is some feature of OPTIMIZE TABLE I don't know of. What did you have in mind?

Is it MyISAM or InnoDB?

MyISAM. I'll keep you posted. I am very curious about how this can be resolved. Cheers, Gaspar
Re: need white papers on performace tuning of full text indexing
A simple search on Google for "mysql fulltext indexing" provides many links, including:

http://jeremy.zawodny.com/blog/archives/000576.html
http://epsilondelta.wordpress.com/2006/02/08/dissecting-mysql-fulltext-indexing/ (overviews of how it works)

and

http://dev.mysql.com/doc/refman/5.0/en/fulltext-fine-tuning.html (fine-tuning full-text search)

Sincerely, Sheeri
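The fine-tuning page linked above centres on a few server variables and an index rebuild step. A sketch of the usual workflow; the table name is hypothetical and the setting shown is an example, not a recommendation:

```sql
-- Check the current full-text settings (ft_min_word_len defaults to 4,
-- so shorter words are not indexed):
SHOW VARIABLES LIKE 'ft%';

-- ft_min_word_len is read-only at runtime: lower it in my.cnf
-- (e.g. ft_min_word_len=3) and restart the server, then rebuild
-- existing FULLTEXT indexes so they pick up the new minimum:
REPAIR TABLE articles QUICK;
```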
need white papers on performace tuning of full text indexing
Anyone know where I can get these? It would be highly appreciated. This is a feature of MySQL 5.0, and my platform is Linux.
Re: MySQL 4.0, FULL-TEXT Indexing and Search Arabic Data, Unicode
Mohsen wrote: But he himself solved his problem, with: mysql_query("SET NAMES utf8"); Even on 4.0.x.

Wrong. I decided to prepare two different versions of my software:
- A MySQL 4.0-friendly version using a romanizing method (hats off to you, Ehsan)
- A MySQL 4.1-compatible version.

The code you mentioned belongs to the 2nd version. "SET NAMES indicates what is in the SQL statements that the client sends. Thus, SET NAMES 'cp1251' tells the server: future incoming messages from this client are in character set cp1251. It also specifies the character set for results that the server sends back to the client. (For example, it indicates what character set column values are if you use a SELECT statement.)" -- MySQL Manual 4.1, 10.3.6, Connection Character Sets and Collations.

Kind regards, Behzad