ANN: Hopper (stored procedure debugger), version 1.0.1 released
Dear ladies and gentlemen,

Upscene Productions is proud to announce version 1.0.1 of a new product called Hopper. Hopper is a stored routine and trigger debugger, available for InterBase, Firebird and MySQL.

For more information, see http://www.upscene.com/displaynews.php?item=20120620

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
Need Query Help
Hi All,

I need query help for the following table structure, where we need to calculate the login duration of each employee for a given period.

Example table:

EmployeeID  LoginTime            LogoutTime
101         2012-05-01 10:00:00  2012-05-01 12:30:00
102         2012-04-30 23:00:00  2012-05-02 05:00:00
103         2012-05-01 14:00:00  NULL
104         2012-05-02 00:10:00  2012-05-02 05:00:00

I tried to fit every scenario into the table above; consider NULL as "yet to log out".

How would I calculate each employee's login duration for a given period, say from 2012-05-01 08:00:00 to 2012-05-01 22:00:00?

--Anupam
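The computation being asked for is the overlap between each [LoginTime, LogoutTime] interval and the reporting window, with a NULL logout clipped at the window end. A minimal Python sketch of that logic (the table data is from the post; treating an open session as "logged in until the end of the window" is my reading of the intended semantics):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (EmployeeID, LoginTime, LogoutTime) rows from the example table;
# None means the employee has not logged out yet.
rows = [
    (101, "2012-05-01 10:00:00", "2012-05-01 12:30:00"),
    (102, "2012-04-30 23:00:00", "2012-05-02 05:00:00"),
    (103, "2012-05-01 14:00:00", None),
    (104, "2012-05-02 00:10:00", "2012-05-02 05:00:00"),
]

def login_seconds(login, logout, start, end):
    """Seconds of [login, logout] that fall inside [start, end]."""
    login = datetime.strptime(login, FMT)
    # An open session (NULL logout) is clipped at the end of the window.
    logout = datetime.strptime(logout, FMT) if logout else end
    overlap = min(logout, end) - max(login, start)
    return max(overlap.total_seconds(), 0)

start = datetime.strptime("2012-05-01 08:00:00", FMT)
end = datetime.strptime("2012-05-01 22:00:00", FMT)

durations = {emp: login_seconds(li, lo, start, end) for emp, li, lo in rows}
# Employee 101: 2.5 h; 102: the full 14 h window; 103: 8 h; 104: 0 (outside).
```

The same clipping translates to SQL as `GREATEST(LoginTime, start)` and `LEAST(IFNULL(LogoutTime, end), end)`, with negative differences mapped to zero.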
Indexing about 40 Billion Entries
Hi guys,

As the title says, I'm trying to index 40 billion entries with two indexes on a server with 16 cores and 128 GB RAM. The table is the one below and it is a MyISAM table. The *.MYD file is about 640 GB.

DROP TABLE IF EXISTS `l4_link`;
CREATE TABLE `l4_link` (
  `ruid1` int NOT NULL,
  `ruid2` int NOT NULL,
  `overlap` int NOT NULL
);

I need an index for ruid1 and for ruid2. The status for this table is the following:

           Name: l4_link
         Engine: MyISAM
        Version: 10
     Row_format: Fixed
           Rows: 39806500262
 Avg_row_length: 17
    Data_length: 676710504454
Max_data_length: 4785074604081151
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2012-06-19 14:51:29
    Update_time: 2012-06-19 16:26:35
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:

The variables for MyISAM are the following:

mysql> show global variables like '%myisam%';
+---------------------------+----------------------+
| Variable_name             | Value                |
+---------------------------+----------------------+
| myisam_data_pointer_size  | 6                    |
| myisam_max_sort_file_size | 9223372036853727232  |
| myisam_mmap_size          | 18446744073709551615 |
| myisam_recover_options    | BACKUP               |
| myisam_repair_threads     | 1                    |
| myisam_sort_buffer_size   | 8388608              |
| myisam_stats_method       | nulls_unequal        |
| myisam_use_mmap           | OFF                  |
+---------------------------+----------------------+
8 rows in set (0.00 sec)

The temp folder has about 16 TB of free space. When I start the indexing process, it copies the 640 GB into a temp file and then starts with "Repair with keycache". On the internet I found that if it says "repair with keycache" you should increase myisam_max_sort_file_size, but this didn't work. It still says "repair with keycache" after it has copied the data to the temp dir.

I hope you have an idea how to fix this.

Thanks in advance.
Christian
Re: Indexing about 40 Billion Entries
Looks like the value you give for myisam_max_sort_file_size is not enough for the index creation, and hence it is doing a REPAIR WITH KEYCACHE.

Use the query below to find the minimum value required for myisam_max_sort_file_size to avoid the repair with keycache:

select a.index_name as index_name,
       ceil(
         (select count(*) from `db_name`.`table_name`)
         * ( @@global.myisam_data_pointer_size
             + sum( ifnull(1 + `a`.`sub_part` * (`b`.`character_octet_length` / `b`.`character_maximum_length`),
                    ifnull(1 + `b`.`character_octet_length`,
                           case when `b`.`data_type` = 'tinyint'   then 1
                                when `b`.`data_type` = 'smallint'  then 2
                                when `b`.`data_type` = 'mediumint' then 3
                                when `b`.`data_type` = 'int'       then 4
                                when `b`.`data_type` = 'bigint'    then 8
                                when `b`.`data_type` = 'float'     then 4
                                when `b`.`data_type` = 'double'    then 8
                                when `b`.`data_type` = 'real'      then 8
                                when `b`.`data_type` = 'bit'       then 8
                                when `b`.`data_type` = 'date'      then 3
                                when `b`.`data_type` = 'datetime'  then 8
                                when `b`.`data_type` = 'timestamp' then 4
                                when `b`.`data_type` = 'time'      then 3
                                when `b`.`data_type` = 'year'      then 1
                                when `b`.`data_type` = 'enum'      then 2
                                when `b`.`data_type` = 'set'       then 8
                                when `b`.`data_type` = 'decimal'   then 8
                           end))
                    + if(`a`.`nullable` = 'YES', 1, 0)
                    + if(`b`.`character_octet_length` >= 255, 2, 0) )
         / 1048576 ) * 1048576 as `index_size`
from `information_schema`.`statistics` `a`,
     `information_schema`.`columns` `b`
where `a`.`table_name` = `b`.`table_name`
  and `a`.`table_schema` = `b`.`table_schema`
  and `a`.`column_name` = `b`.`column_name`
  and `a`.`table_schema` = 'db_name'
  and `a`.`table_name` = 'table_name'
group by `a`.`index_name`
order by `index_size` desc
limit 1;

On Wed, Jun 20, 2012 at 3:15 PM, Christian Koetteritzsch ckoetteritz...@e-humanities.net wrote:
> <snip>
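For l4_link, the estimate in the query above reduces to simple arithmetic: rows times (data pointer size plus key bytes), rounded up to a whole MiB. A sketch of that calculation (the per-type key sizes come from the CASE list above; MyISAM's exact on-disk key overhead may differ slightly):

```python
# Rough size estimate for one MyISAM index on l4_link:
# rows * (data pointer bytes + key bytes), rounded up to the next MiB.
rows = 39806500262          # Rows, from SHOW TABLE STATUS
pointer = 6                 # myisam_data_pointer_size
key_bytes = 4               # 'int' NOT NULL key column -> 4 bytes

raw = rows * (pointer + key_bytes)      # ~370 GiB per index
mib = 1048576
index_size = -(-raw // mib) * mib       # ceiling to a whole MiB, in bytes

# Both planned single-column indexes are far below the configured
# myisam_max_sort_file_size of 9223372036853727232 bytes.
fits = index_size < 9223372036853727232
```

So on these numbers the configured myisam_max_sort_file_size is already ample, which matches the poster's observation that raising it did not change the "repair with keycache" behavior.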
RE: Need Query Help
See http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_timediff and TIME_TO_SEC()/3600.

-----Original Message-----
From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
Sent: Wednesday, June 20, 2012 2:39 AM
To: mysql@lists.mysql.com
Subject: Need Query Help

> <snip>
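In Python terms, the hint amounts to the following (a sketch: MySQL's TIMEDIFF yields a TIME value, TIME_TO_SEC converts it to seconds, and dividing by 3600 gives hours):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
login = datetime.strptime("2012-05-01 10:00:00", FMT)
logout = datetime.strptime("2012-05-01 12:30:00", FMT)

# TIME_TO_SEC(TIMEDIFF(logout, login)) -> elapsed seconds
seconds = (logout - login).total_seconds()
hours = seconds / 3600        # 2.5 hours logged in
```

Note that TIMEDIFF alone does not clip sessions to the reporting window; that still has to be done with GREATEST/LEAST (or equivalent) on the endpoints.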
RE: Indexing about 40 Billion Entries
Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still take days, maybe weeks.

I strongly recommend you do not try to access that many rows directly. Instead, build summary tables, and access them. We can discuss further.

Unfortunately, you did not even include a PRIMARY KEY when you built the table. This makes any operation slow.

> I need an index for ruid1 and for ruid2.
What will your SELECTs look like? You may be better off with INDEX(ruid1, ruid2, overlap) and INDEX(ruid2, ruid1, overlap).

Will you be adding more rows to this table? Or is it now static?

What are the semantics of the fields?

-----Original Message-----
From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Wednesday, June 20, 2012 4:37 AM
To: Christian Koetteritzsch
Cc: mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

> <snip>
Re: Indexing about 40 Billion Entries
Thanks for the information. It is no problem if it takes days or weeks, because the server is there specifically for tasks that take time and use lots of resources.

On 20.06.2012 19:55, Rick James wrote:
> Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still take days, maybe weeks.
> I strongly recommend you do not try to access that many rows directly. Instead, build summary tables, and access them. We can discuss further.
Did you mean that I should make several partitions with, for example, 1 billion entries each, and then make a union of the results from the partitions?

> Unfortunately, you did not even include a PRIMARY KEY when you built the table. This makes any operations slow.
There is no PRIMARY KEY because ruid1 and ruid2 are not unique values.

>> I need an index for ruid1 and for ruid2.
> What will your SELECTs look like? You may be better off with INDEX(ruid1, ruid2, overlap), and INDEX(ruid2, ruid1, overlap)
> Will you be adding more rows to this table? Or is it now static?
No data will be added to this table, so it is a static table. And the indexes, as I have planned them, will be INDEX(ruid1) and INDEX(ruid2). My SELECT looks like this:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx

where both xxx are the same int value.

> What are the semantics of the fields?
The semantics behind these fields are that ruid1 and ruid2 are IDs for sentences that have common words, and overlap is the number of words they have in common.

I hope the information I gave you is helpful. If you have more questions, feel free to ask.

Kind regards
Christian

-----Original Message-----
From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Wednesday, June 20, 2012 4:37 AM
To: Christian Koetteritzsch
Cc: mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

> <snip>
Re: Indexing about 40 Billion Entries
Sorry, I forgot the ORDER BY in the SELECT statement, so the correct SELECT statement is:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx ORDER BY overlap DESC

On 20.06.2012 22:42, Christian Koetteritzsch wrote:
> <snip>
Re: Indexing about 40 Billion Entries
On 6/20/2012 5:45 AM, Christian Koetteritzsch wrote:
> Hi guys,
> As the title says I'm trying to index 40 billion entries with two indexes on a server with 16 cores and 128GB RAM. The table is the one below and it is a MyISAM table. The *.MYD file is about 640GB
>
> DROP TABLE IF EXISTS `l4_link`;
> CREATE TABLE `l4_link` (
>   `ruid1` int NOT NULL,
>   `ruid2` int NOT NULL,
>   `overlap` int NOT NULL
> );
>
> I need an index for ruid1 and for ruid2.

Actually, based on your proposed query, I believe you want an index on (ruid1, ruid2), not separate indexes for each column.

> When I start the indexing process, it copies the 640GB into a temp file and then starts with "repair with keycache". On the internet I found that if it says "repair with keycache" you should increase myisam_max_sort_file_size, but this didn't work. It still says "repair with keycache" after it copied the data to the temp dir.
> I hope you have any idea how to fix this.
Try maxing out the following settings:

  myisam_sort_buffer_size
  key_buffer_size

http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_myisam_sort_buffer_size

You will need to experiment with how large a maximum value is permitted on your platform.

--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN
RE: Indexing about 40 Billion Entries
(ruid1, ruid2) will help for AND, but not at all for OR.

-----Original Message-----
From: Shawn Green [mailto:shawn.l.gr...@oracle.com]
Sent: Wednesday, June 20, 2012 2:30 PM
To: mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

> <snip>
RE: Indexing about 40 Billion Entries
> SELECT ruid1, ruid2, overlap FROM l4_link
>   WHERE ruid1=xxx OR ruid2=xxx
>   ORDER BY overlap DESC

MySQL does not optimize that kind of OR well. This will run _much_ faster (with your two indexes):

( SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx
  UNION
  SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx
) ORDER BY overlap DESC;

Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both fields of one row. UNION DISTINCT makes a pass over the temp table to dedup.

Your version (OR) will do a table scan. My version will:
1. scan the ruid1 index (.MYI) (1+ disk hits for the index range scan)
2. for each matching ruid1, fetch the data row (in .MYD) (1 hit/row); write to a tmp table
3,4. ditto for ruid2, appending to the same tmp table
5. sort the tmp table (ORDER BY) (probably 0 disk hits, due to using MEMORY)

Once the indexes are in place... Depending on the version you are using, key_buffer_size is limited to 4G, or is not. For your huge machine, 24G might be wise. The key_buffer will help steps 1 and 3. The rest of RAM will be available for the OS to cache the data blocks (steps 2 and 4).

The 16 cores -- a single query (even with the UNION I suggested) will use only one core.

How many rows (average, max) do you expect from

SELECT ... FROM l4_link WHERE ruid1=xxx

I ask because that might run as slow as 100 rows/sec., simply because of the disk hits.

PARTITIONing, per se, does not help performance. There are only a few use cases where PARTITION shines. (I have not seen such [yet] in your application.)

-----Original Message-----
From: Christian Koetteritzsch [mailto:ckoetteritz...@e-humanities.net]
Sent: Wednesday, June 20, 2012 1:42 PM
To: Rick James
Cc: Ananda Kumar; mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

> <snip>
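That the UNION rewrite returns the same result set as the OR query can be checked on a toy copy of l4_link. A sketch using SQLite via Python (not MySQL, so the execution plans differ, but the row sets match; plain UNION here is the deduplicating UNION DISTINCT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE l4_link (ruid1 INT NOT NULL, ruid2 INT NOT NULL, overlap INT NOT NULL);
    CREATE INDEX idx_ruid1 ON l4_link (ruid1);
    CREATE INDEX idx_ruid2 ON l4_link (ruid2);
    INSERT INTO l4_link VALUES (1, 2, 5), (2, 1, 3), (1, 3, 7), (4, 1, 2), (3, 4, 9);
""")

xxx = 1  # the sentence ID being looked up on both sides

# The original OR form of the query.
or_rows = con.execute(
    "SELECT ruid1, ruid2, overlap FROM l4_link "
    "WHERE ruid1=? OR ruid2=? ORDER BY overlap DESC", (xxx, xxx)).fetchall()

# The suggested rewrite: one indexed lookup per side, merged with UNION,
# sorted once at the end.
union_rows = con.execute(
    "SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=? "
    "UNION "
    "SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=? "
    "ORDER BY overlap DESC", (xxx, xxx)).fetchall()
```

Since the table holds no duplicate rows, UNION DISTINCT and UNION ALL would agree here; with duplicate rows, UNION ALL would also return each row matched on only one side once, but a row matching on both sides twice.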