ANN: Hopper (stored procedure debugger), version 1.0.1 released

2012-06-20 Thread Martijn Tonies

ANN: Hopper, version 1.0.1 released



Dear ladies and gentlemen,

Upscene Productions is proud to announce version 1 of a new
product called Hopper.

Hopper is a Stored Routine and Trigger Debugger, available for
InterBase, Firebird and MySQL.



For more information, see 
http://www.upscene.com/displaynews.php?item=20120620



With regards,

Martijn Tonies

Upscene Productions
http://www.upscene.com



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql



Need Query Help

2012-06-20 Thread Anupam Karmarkar
Hi All,

I need help with a query for the following table structure, where we need to
calculate the login duration of each employee for a given period.

Example table


EmployeeID  LoginTime            LogoutTime
101         2012-05-01 10:00:00  2012-05-01 12:30:00
102         2012-04-30 23:00:00  2012-05-02 05:00:00
103         2012-05-01 14:00:00  NULL
104         2012-05-02 00:10:00  2012-05-02 05:00:00


I tried to fit every scenario into the table above; consider NULL as not yet logged out.


How would I calculate each employee's login duration for a period, say from
2012-05-01 08:00:00 to 2012-05-01 22:00:00?


--Anupam


Indexing about 40 Billion Entries

2012-06-20 Thread Christian Koetteritzsch

Hi guys,

As the title says, I'm trying to index 40 billion entries with two 
indexes on a server with 16 cores and 128 GB RAM. The table is the one 
below, and it is a MyISAM table. The *.MYD file is about 640 GB.


DROP TABLE IF EXISTS `l4_link`;
CREATE TABLE  `l4_link` (
  `ruid1` int NOT NULL,
  `ruid2` int NOT NULL,
  `overlap` int NOT NULL
);

I need an index for ruid1 and for ruid2.

The status for this table is the following:

           Name: l4_link
         Engine: MyISAM
        Version: 10
     Row_format: Fixed
           Rows: 39806500262
 Avg_row_length: 17
    Data_length: 676710504454
Max_data_length: 4785074604081151
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2012-06-19 14:51:29
    Update_time: 2012-06-19 16:26:35
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:

The variables for myisam are the following:
mysql> show global variables like '%myisam%';
+---------------------------+----------------------+
| Variable_name             | Value                |
+---------------------------+----------------------+
| myisam_data_pointer_size  | 6                    |
| myisam_max_sort_file_size | 9223372036853727232  |
| myisam_mmap_size          | 18446744073709551615 |
| myisam_recover_options    | BACKUP               |
| myisam_repair_threads     | 1                    |
| myisam_sort_buffer_size   | 8388608              |
| myisam_stats_method       | nulls_unequal        |
| myisam_use_mmap           | OFF                  |
+---------------------------+----------------------+
8 rows in set (0.00 sec)

The temp folder has about 16 TB of free space.

When I start the indexing process, it copies the 640 GB into a temp file 
and then starts a repair with keycache.
On the internet I found that if it says "Repair with keycache" you should 
increase myisam_max_sort_file_size, but this didn't work.
It still says "Repair with keycache" after it has copied the data to the 
temp dir.


I hope you have any idea how to fix this.

Thanks in advance.

Christian





Re: Indexing about 40 Billion Entries

2012-06-20 Thread Ananda Kumar
Looks like the value that you gave for myisam_max_sort_file_size is not enough
for the index creation, and hence it is doing a REPAIR WITH KEYCACHE.

Use the query below to find the minimum value required for
myisam_max_sort_file_size to avoid repair with keycache:


select
a.index_name as index_name,
ceil(
(select count(*) from `db_name`.`table_name`) *(
@@global.myisam_data_pointer_size +
sum(ifnull(1 + `a`.`sub_part` *
(`b`.`character_octet_length`/`b`.`character_maximum_length`),
ifnull(1 + `b`.`character_octet_length`,
case
when `b`.`data_type` = 'tinyint' then 1
when `b`.`data_type` = 'smallint' then 2
when `b`.`data_type` = 'mediumint' then 3
when `b`.`data_type` = 'int' then 4
when `b`.`data_type` = 'bigint' then 8
when `b`.`data_type` = 'float' then 4
when `b`.`data_type` = 'double' then 8
when `b`.`data_type` = 'real' then 8
when `b`.`data_type` = 'bit' then 8
when `b`.`data_type` = 'date' then 3
when `b`.`data_type` = 'datetime' then 8
when `b`.`data_type` = 'timestamp' then 4
when `b`.`data_type` = 'time' then 3
when `b`.`data_type` = 'year' then 1
when `b`.`data_type` = 'enum' then 2
when `b`.`data_type` = 'set' then 8
when `b`.`data_type` = 'decimal' then 8

end
)

)
+ if(`a`.`nullable`='YES',1,0)
+ if(`b`.`character_octet_length` =255,2,0)
))/1048576)*1048576  as `index_size`
from
`information_schema`.`statistics` `a`,
`information_schema`.`columns` `b`
where
`a`.`table_name`=`b`.`table_name` and
`a`.`table_schema`=`b`.`table_schema` and
`a`.`column_name`=`b`.`column_name` and
`a`.`table_schema`='db_name' and
`a`.`table_name`='table_name'

group by `a`.`index_name`
order by `index_size` desc limit 1;
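For this particular table, a rough back-of-envelope cross-check of what the query above computes (a sketch; the "key bytes + data pointer" cost per entry is an assumption, and the real on-disk index adds B-tree overhead):

```python
# Rough size estimate for ONE single-column index on l4_link.
# Assumption: each key entry costs about key bytes + data pointer bytes.
rows = 39_806_500_262      # Rows value from SHOW TABLE STATUS
key_bytes = 4              # ruid1 / ruid2 are 4-byte INTs
pointer_bytes = 6          # myisam_data_pointer_size from SHOW VARIABLES

per_index_bytes = rows * (key_bytes + pointer_bytes)
print(f"~{per_index_bytes / 1024**3:.0f} GiB per index, before B-tree overhead")
```

So each index alone is on the order of 370 GiB, which is why both myisam_max_sort_file_size and the temp space have to accommodate sorts of that scale.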

On Wed, Jun 20, 2012 at 3:15 PM, Christian Koetteritzsch 
ckoetteritz...@e-humanities.net wrote:

 Hi guys,

 As the title says I'm trying to index 40 billion entries with two indexes
 on a server with 16 cores and 128GB RAM. The table is the one below and it
 is a myisam table. The *.myd file is about 640GB

 DROP TABLE IF EXISTS `l4_link`;
 CREATE TABLE  `l4_link` (
  `ruid1` int NOT NULL,
  `ruid2` int NOT NULL,
  `overlap` int NOT NULL
 );

 I need an index for ruid1 and for ruid2.

 The status for this table is the following:

            Name: l4_link
          Engine: MyISAM
         Version: 10
      Row_format: Fixed
            Rows: 39806500262
  Avg_row_length: 17
     Data_length: 676710504454
 Max_data_length: 4785074604081151
    Index_length: 1024
       Data_free: 0
  Auto_increment: NULL
     Create_time: 2012-06-19 14:51:29
     Update_time: 2012-06-19 16:26:35
      Check_time: NULL
       Collation: utf8_general_ci
        Checksum: NULL
  Create_options:
         Comment:

 The variables for myisam are the following:
 mysql> show global variables like '%myisam%';
 +---------------------------+----------------------+
 | Variable_name             | Value                |
 +---------------------------+----------------------+
 | myisam_data_pointer_size  | 6                    |
 | myisam_max_sort_file_size | 9223372036853727232  |
 | myisam_mmap_size          | 18446744073709551615 |
 | myisam_recover_options    | BACKUP               |
 | myisam_repair_threads     | 1                    |
 | myisam_sort_buffer_size   | 8388608              |
 | myisam_stats_method       | nulls_unequal        |
 | myisam_use_mmap           | OFF                  |
 +---------------------------+----------------------+
 8 rows in set (0.00 sec)

 The temp folder has about 16tb free space.

 When I start the indexing process, it copies the 640Gb into a temp file
 and then starts with repair with keycache.
 On the internet I found that if it says repair with keycache you shold
 increase the myisam_max_sort_file_size, but this didn't work.
 It still says repair with keycache after it copied the data to the temp
 dir.

 I hope you have any idea how to fix this.

 Thanks in advance.

 Christian






RE: Need Query Help

2012-06-20 Thread Rick James
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_timediff
and SEC_TO_TIME()/3600
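To make the interval arithmetic concrete, here is a small Python sketch of the clamping that a SQL solution (e.g. with GREATEST/LEAST and TIMESTAMPDIFF) would perform. Treating a NULL logout as "still logged in through the end of the window" is an assumption from the original post, and 2012-04-30 is used since April has no 31st:

```python
from datetime import datetime

WIN_START = datetime(2012, 5, 1, 8, 0)
WIN_END   = datetime(2012, 5, 1, 22, 0)

# Rows from the example table; None models the NULL logout.
sessions = [
    (101, datetime(2012, 5, 1, 10, 0), datetime(2012, 5, 1, 12, 30)),
    (102, datetime(2012, 4, 30, 23, 0), datetime(2012, 5, 2, 5, 0)),
    (103, datetime(2012, 5, 1, 14, 0), None),
    (104, datetime(2012, 5, 2, 0, 10), datetime(2012, 5, 2, 5, 0)),
]

def login_seconds(login, logout, start=WIN_START, end=WIN_END):
    """Clamp the session to [start, end]; NULL logout counts up to `end`."""
    logout = logout if logout is not None else end
    overlap = (min(logout, end) - max(login, start)).total_seconds()
    return max(overlap, 0.0)   # sessions entirely outside the window -> 0

durations = {emp: login_seconds(i, o) for emp, i, o in sessions}
```

In MySQL the per-row value would be something along the lines of GREATEST(0, TIMESTAMPDIFF(SECOND, GREATEST(LoginTime, @start), LEAST(IFNULL(LogoutTime, @end), @end))).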

 -Original Message-
 From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
 Sent: Wednesday, June 20, 2012 2:39 AM
 To: mysql@lists.mysql.com
 Subject: Need Query Help
 
 Hi All,
 
 I need query help for following table struture, where we need to
 calculate login duration of that employee for give period.
 
 Example table
 
 
 EmployeeID  LoginTime            LogoutTime
 101         2012-05-01 10:00:00  2012-05-01 12:30:00
 102         2012-04-30 23:00:00  2012-05-02 05:00:00
 103         2012-05-01 14:00:00  NULL
 104         2012-05-02 00:10:00  2012-05-02 05:00:00
 
 
 I tried to fit all scenario in above table, Consider NULL as yet to
 logout
 
 
 How would i calcuate Employee and it Login duration for period say from
 2012-05-01 08:00:00 to 2012-05-01 22:00:00
 
 
 --Anupam




RE: Indexing about 40 Billion Entries

2012-06-20 Thread Rick James
Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still take 
days, maybe weeks.

I strongly recommend you do not try to access that many rows directly.  
Instead, build summary tables, and access them.  We can discuss further.

Unfortunately, you did not even include a PRIMARY KEY when you built the table. 
 This makes any operations slow.

 I need an index for ruid1 and for ruid2.
What will your SELECTs look like?  You may be better off with
INDEX(ruid1, ruid2, overlap), and
INDEX(ruid2, ruid1, overlap)

Will you be adding more rows to this table?  Or is it now static?

What are the semantics of the fields?


 -Original Message-
 From: Ananda Kumar [mailto:anan...@gmail.com]
 Sent: Wednesday, June 20, 2012 4:37 AM
 To: Christian Koetteritzsch
 Cc: mysql@lists.mysql.com
 Subject: Re: Indexing about 40 Billion Entries
 
 looks like the value that you give for myisam_max_sort_size is not
 enough for the index creation and hence it doing a REPAIR WITH
 KEYCACHE
 
 Use the below query to set the min values required for
 myisam_max_sort_size to avoid repair with keycache
 
 
 select
 a.index_name as index_name,
 ceil(
 (select count(*) from `db_name`.`table_name`) *(
 @@global.myisam_data_pointer_size +
 sum(ifnull(1 + `a`.`sub_part` *
 (`b`.`character_octet_length`/`b`.`character_maximum_length`),
 ifnull(1 + `b`.`character_octet_length`,
 case
 when `b`.`data_type` = 'tinyint' then 1
  when `b`.`data_type` = 'smallint' then 2
  when `b`.`data_type` = 'mediumint' then 3
  when `b`.`data_type` = 'int' then 4
  when `b`.`data_type` = 'bigint' then 8
  when `b`.`data_type` = 'float' then 4
  when `b`.`data_type` = 'double' then 8
  when `b`.`data_type` = 'real' then 8
  when `b`.`data_type` = 'bit' then 8
  when `b`.`data_type` = 'date' then 3
  when `b`.`data_type` = 'datetime' then 8
  when `b`.`data_type` = 'timestamp' then 4
 when `b`.`data_type` = 'time' then 3
 when `b`.`data_type` = 'year' then 1
 when `b`.`data_type` = 'enum' then 2
 when `b`.`data_type` = 'set' then 8
 when `b`.`data_type` = 'decimal' then 8
 
 end
 )
 
 )
 + if(`a`.`nullable`='YES',1,0)
 + if(`b`.`character_octet_length` =255,2,0)
 ))/1048576)*1048576  as `index_size` from
 `information_schema`.`statistics` `a`, `information_schema`.`columns`
 `b` where
 `a`.`table_name`=`b`.`table_name` and
 `a`.`table_schema`=`b`.`table_schema` and
 `a`.`column_name`=`b`.`column_name` and
 `a`.`table_schema`='db_name' and
 `a`.`table_name`='table_name'
 
 group by `a`.`index_name`
 order by `index_size` desc limit 1;
 
 On Wed, Jun 20, 2012 at 3:15 PM, Christian Koetteritzsch 
 ckoetteritz...@e-humanities.net wrote:
 
  Hi guys,
 
  As the title says I'm trying to index 40 billion entries with two
  indexes on a server with 16 cores and 128GB RAM. The table is the one
  below and it is a myisam table. The *.myd file is about 640GB
 
  DROP TABLE IF EXISTS `l4_link`;
  CREATE TABLE  `l4_link` (
   `ruid1` int NOT NULL,
   `ruid2` int NOT NULL,
   `overlap` int NOT NULL
  );
 
  I need an index for ruid1 and for ruid2.
 
  The status for this table is the following:
 
             Name: l4_link
           Engine: MyISAM
          Version: 10
       Row_format: Fixed
             Rows: 39806500262
   Avg_row_length: 17
      Data_length: 676710504454
  Max_data_length: 4785074604081151
     Index_length: 1024
        Data_free: 0
   Auto_increment: NULL
      Create_time: 2012-06-19 14:51:29
      Update_time: 2012-06-19 16:26:35
       Check_time: NULL
        Collation: utf8_general_ci
         Checksum: NULL
   Create_options:
          Comment:
 
  The variables for myisam are the following:
  mysql> show global variables like '%myisam%';
  +---------------------------+----------------------+
  | Variable_name             | Value                |
  +---------------------------+----------------------+
  | myisam_data_pointer_size  | 6                    |
  | myisam_max_sort_file_size | 9223372036853727232  |
  | myisam_mmap_size          | 18446744073709551615 |
  | myisam_recover_options    | BACKUP               |
  | myisam_repair_threads     | 1                    |
  | myisam_sort_buffer_size   | 8388608              |
  | myisam_stats_method       | nulls_unequal        |
  | myisam_use_mmap           | OFF                  |
  +---------------------------+----------------------+

Re: Indexing about 40 Billion Entries

2012-06-20 Thread Christian Koetteritzsch
Thanks for the information. It is no problem if it takes days or weeks, 
because the server is dedicated to such tasks, which take time and use 
lots of resources.


Am 20.06.2012 19:55, schrieb Rick James:

Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still take 
days, maybe weeks.

I strongly recommend you do not try to access that many rows directly.  
Instead, build summary tables, and access them.  We can discuss further.
Did you mean that I should make several partitions with, for example, 1 billion 
entries each and then take a union of the results from the partitions?


Unfortunately, you did not even include a PRIMARY KEY when you built the table. 
 This makes any operations slow.

There is no PRIMARY KEY because ruid1 and ruid2 are not unique values.



I need an index for ruid1 and for ruid2.

What will your SELECTs look like?  You may be better off with
INDEX(ruid1, ruid2, overlap), and
INDEX(ruid2, ruid1, overlap)

Will you be adding more rows to this table?  Or is it now static?
No data will be added to this table, so it is a static table. And the 
indexes, as I have planned them, will be

INDEX(ruid1) and INDEX(ruid2).

My SELECT looks like this:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx

where xxx is the same int value in both comparisons


What are the semantics of the fields?
The semantics of these fields: ruid1 and ruid2 are IDs of 
sentences that have common words, and overlap is the number of words they 
have in common.


I hope the information I gave you is helpful. If you have more 
questions, feel free to ask.


Kind regards

Christian




-Original Message-
From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Wednesday, June 20, 2012 4:37 AM
To: Christian Koetteritzsch
Cc:mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

looks like the value that you give for myisam_max_sort_size is not
enough for the index creation and hence it doing a REPAIR WITH
KEYCACHE

Use the below query to set the min values required for
myisam_max_sort_size to avoid repair with keycache


select
 a.index_name as index_name,
 ceil(
 (select count(*) from `db_name`.`table_name`) *(
 @@global.myisam_data_pointer_size +
 sum(ifnull(1 + `a`.`sub_part` *
(`b`.`character_octet_length`/`b`.`character_maximum_length`),
 ifnull(1 + `b`.`character_octet_length`,
 case
 when `b`.`data_type` = 'tinyint' then 1
 when `b`.`data_type` = 'smallint' then 2
 when `b`.`data_type` = 'mediumint' then 3
 when `b`.`data_type` = 'int' then 4
 when `b`.`data_type` = 'bigint' then 8
 when `b`.`data_type` = 'float' then 4
 when `b`.`data_type` = 'double' then 8
 when `b`.`data_type` = 'real' then 8
 when `b`.`data_type` = 'bit' then 8
 when `b`.`data_type` = 'date' then 3
 when `b`.`data_type` = 'datetime' then 8
 when `b`.`data_type` = 'timestamp' then 4
 when `b`.`data_type` = 'time' then 3
 when `b`.`data_type` = 'year' then 1
 when `b`.`data_type` = 'enum' then 2
 when `b`.`data_type` = 'set' then 8
 when `b`.`data_type` = 'decimal' then 8

 end
 )

 )
 + if(`a`.`nullable`='YES',1,0)
 + if(`b`.`character_octet_length` =255,2,0)
 ))/1048576)*1048576  as `index_size` from
`information_schema`.`statistics` `a`, `information_schema`.`columns`
`b` where
 `a`.`table_name`=`b`.`table_name` and
 `a`.`table_schema`=`b`.`table_schema` and
 `a`.`column_name`=`b`.`column_name` and
 `a`.`table_schema`='db_name' and
 `a`.`table_name`='table_name'

group by `a`.`index_name`
order by `index_size` desc limit 1;

On Wed, Jun 20, 2012 at 3:15 PM, Christian Koetteritzsch 
ckoetteritz...@e-humanities.net wrote:


Hi guys,

As the title says I'm trying to index 40 billion entries with two
indexes on a server with 16 cores and 128GB RAM. The table is the one
below and it is a myisam table. The *.myd file is about 640GB

DROP TABLE IF EXISTS `l4_link`;
CREATE TABLE  `l4_link` (
  `ruid1` int NOT NULL,
  `ruid2` int NOT NULL,
  `overlap` int NOT NULL
);

I need an index for ruid1 and for ruid2.

The status for this table is the following:

           Name: l4_link
         Engine: MyISAM
        Version: 10
     Row_format: Fixed
           Rows: 39806500262
 Avg_row_length: 17
    Data_length: 676710504454
Max_data_length: 4785074604081151
   Index_length: 1024
 

Re: Indexing about 40 Billion Entries

2012-06-20 Thread Christian Koetteritzsch
Sorry, I forgot the ORDER BY in the SELECT statement, so the correct 
SELECT statement is:


SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx 
ORDER BY overlap DESC


Am 20.06.2012 22:42, schrieb Christian Koetteritzsch:
Thanks for the information. It is no problem if it takes days or 
weeks, because the server is specially for such tasks that takes time 
and uses lots of resources.


Am 20.06.2012 19:55, schrieb Rick James:
Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still 
take days, maybe weeks.


I strongly recommend you do not try to access that many rows 
directly.  Instead, build summary tables, and access them.  We can 
discuss further.
Did you mean that I make several partitions with for example 1 Billion 
Entries and than make a union of the results from the partitions?


Unfortunately, you did not even include a PRIMARY KEY when you built 
the table.  This makes any operations slow.

There is no PRIMARY KEY because ruid1 and ruid2 are not unique values.



I need an index for ruid1 and for ruid2.

What will your SELECTs look like?  You may be better off with
INDEX(ruid1, ruid2, overlap), and
INDEX(ruid2, ruid1, overlap)

Will you be adding more rows to this table?  Or is it now static?
No data will be added to this table so it is a static table.  And the 
index, as i have planed them will be

INDEX(ruid1), and INDEX(ruid2).

My SELECT looks like this:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx

and xxx are the same int values


What are the semantics of the fields?
The semantic behind these fields are, that ruid1 and ruid2 are ID's 
for sentences that have common words and overlap is the number of 
words they have in common.


I hope the informations I gave you are helpful. If you have more 
questions than you can ask them.


Kind regards

Christian




-Original Message-
From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Wednesday, June 20, 2012 4:37 AM
To: Christian Koetteritzsch
Cc:mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

looks like the value that you give for myisam_max_sort_size is not
enough for the index creation and hence it doing a REPAIR WITH
KEYCACHE

Use the below query to set the min values required for
myisam_max_sort_size to avoid repair with keycache


select
 a.index_name as index_name,
 ceil(
 (select count(*) from `db_name`.`table_name`) *(
 @@global.myisam_data_pointer_size +
 sum(ifnull(1 + `a`.`sub_part` *
(`b`.`character_octet_length`/`b`.`character_maximum_length`),
 ifnull(1 + `b`.`character_octet_length`,
 case
 when `b`.`data_type` = 'tinyint' then 1
 when `b`.`data_type` = 'smallint' then 2
 when `b`.`data_type` = 'mediumint' then 3
 when `b`.`data_type` = 'int' then 4
 when `b`.`data_type` = 'bigint' then 8
 when `b`.`data_type` = 'float' then 4
 when `b`.`data_type` = 'double' then 8
 when `b`.`data_type` = 'real' then 8
 when `b`.`data_type` = 'bit' then 8
 when `b`.`data_type` = 'date' then 3
 when `b`.`data_type` = 'datetime' then 8
 when `b`.`data_type` = 'timestamp' then 4
 when `b`.`data_type` = 'time' then 3
 when `b`.`data_type` = 'year' then 1
 when `b`.`data_type` = 'enum' then 2
 when `b`.`data_type` = 'set' then 8
 when `b`.`data_type` = 'decimal' then 8


 end
 )

 )
 + if(`a`.`nullable`='YES',1,0)
 + if(`b`.`character_octet_length` =255,2,0)
 ))/1048576)*1048576  as `index_size` from
`information_schema`.`statistics` `a`, `information_schema`.`columns`
`b` where
 `a`.`table_name`=`b`.`table_name` and
 `a`.`table_schema`=`b`.`table_schema` and
 `a`.`column_name`=`b`.`column_name` and
 `a`.`table_schema`='db_name' and
 `a`.`table_name`='table_name'

group by `a`.`index_name`
order by `index_size` desc limit 1;

On Wed, Jun 20, 2012 at 3:15 PM, Christian Koetteritzsch 
ckoetteritz...@e-humanities.net wrote:


Hi guys,

As the title says I'm trying to index 40 billion entries with two
indexes on a server with 16 cores and 128GB RAM. The table is the one
below and it is a myisam table. The *.myd file is about 640GB

DROP TABLE IF EXISTS `l4_link`;
CREATE TABLE  `l4_link` (
  `ruid1` int NOT NULL,
  `ruid2` int NOT NULL,
  `overlap` int NOT NULL
);

I need an index for ruid1 and for ruid2.

The status for this table 

Re: Indexing about 40 Billion Entries

2012-06-20 Thread Shawn Green

On 6/20/2012 5:45 AM, Christian Koetteritzsch wrote:

Hi guys,

As the title says I'm trying to index 40 billion entries with two
indexes on a server with 16 cores and 128GB RAM. The table is the one
below and it is a myisam table. The *.myd file is about 640GB

DROP TABLE IF EXISTS `l4_link`;
CREATE TABLE  `l4_link` (
   `ruid1` int NOT NULL,
   `ruid2` int NOT NULL,
   `overlap` int NOT NULL
);

I need an index for ruid1 and for ruid2.



Actually, based on your proposed query, I believe you want an index on 
(ruid1, ruid2) not separate indexes for each column.




The status for this table is the following:

           Name: l4_link
         Engine: MyISAM
        Version: 10
     Row_format: Fixed
           Rows: 39806500262
 Avg_row_length: 17
    Data_length: 676710504454
Max_data_length: 4785074604081151
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2012-06-19 14:51:29
    Update_time: 2012-06-19 16:26:35
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:

The variables for myisam are the following:
mysql> show global variables like '%myisam%';
+---------------------------+----------------------+
| Variable_name             | Value                |
+---------------------------+----------------------+
| myisam_data_pointer_size  | 6                    |
| myisam_max_sort_file_size | 9223372036853727232  |
| myisam_mmap_size          | 18446744073709551615 |
| myisam_recover_options    | BACKUP               |
| myisam_repair_threads     | 1                    |
| myisam_sort_buffer_size   | 8388608              |
| myisam_stats_method       | nulls_unequal        |
| myisam_use_mmap           | OFF                  |
+---------------------------+----------------------+
8 rows in set (0.00 sec)

The temp folder has about 16tb free space.

When I start the indexing process, it copies the 640Gb into a temp file
and then starts with repair with keycache.
On the internet I found that if it says repair with keycache you shold
increase the myisam_max_sort_file_size, but this didn't work.
It still says repair with keycache after it copied the data to the
temp dir.

I hope you have any idea how to fix this.



Try maxing out the following settings:

  myisam_sort_buffer_size
  key_buffer_size

http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_myisam_sort_buffer_size

http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_key_buffer_size

You will need to experiment with how large a value your platform permits.


--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN






RE: Indexing about 40 Billion Entries

2012-06-20 Thread Rick James
(ruid1, ruid2) will help for AND, but not at all for OR.

 -Original Message-
 From: Shawn Green [mailto:shawn.l.gr...@oracle.com]
 Sent: Wednesday, June 20, 2012 2:30 PM
 To: mysql@lists.mysql.com
 Subject: Re: Indexing about 40 Billion Entries
 
 On 6/20/2012 5:45 AM, Christian Koetteritzsch wrote:
  Hi guys,
 
  As the title says I'm trying to index 40 billion entries with two
  indexes on a server with 16 cores and 128GB RAM. The table is the one
  below and it is a myisam table. The *.myd file is about 640GB
 
  DROP TABLE IF EXISTS `l4_link`;
  CREATE TABLE  `l4_link` (
 `ruid1` int NOT NULL,
 `ruid2` int NOT NULL,
 `overlap` int NOT NULL
  );
 
  I need an index for ruid1 and for ruid2.
 
 
 Actually, based on your proposed query, I believe you want an index on
 (ruid1, ruid2) not separate indexes for each column.
 
 
  The status for this table is the following:
 
            Name: l4_link
          Engine: MyISAM
         Version: 10
      Row_format: Fixed
            Rows: 39806500262
  Avg_row_length: 17
     Data_length: 676710504454
 Max_data_length: 4785074604081151
    Index_length: 1024
       Data_free: 0
  Auto_increment: NULL
     Create_time: 2012-06-19 14:51:29
     Update_time: 2012-06-19 16:26:35
      Check_time: NULL
       Collation: utf8_general_ci
        Checksum: NULL
  Create_options:
         Comment:
 
  The variables for myisam are the following:
  mysql> show global variables like '%myisam%';
  +---------------------------+----------------------+
  | Variable_name             | Value                |
  +---------------------------+----------------------+
  | myisam_data_pointer_size  | 6                    |
  | myisam_max_sort_file_size | 9223372036853727232  |
  | myisam_mmap_size          | 18446744073709551615 |
  | myisam_recover_options    | BACKUP               |
  | myisam_repair_threads     | 1                    |
  | myisam_sort_buffer_size   | 8388608              |
  | myisam_stats_method       | nulls_unequal        |
  | myisam_use_mmap           | OFF                  |
  +---------------------------+----------------------+
  8 rows in set (0.00 sec)
 
  The temp folder has about 16tb free space.
 
  When I start the indexing process, it copies the 640Gb into a temp
 file
  and then starts with repair with keycache.
  On the internet I found that if it says repair with keycache you
 shold
  increase the myisam_max_sort_file_size, but this didn't work.
  It still says repair with keycache after it copied the data to the
  temp dir.
 
  I hope you have any idea how to fix this.
 
 
 Try maxing out the following settings:
 
myisam_sort_buffer_size
key_buffer_size
 
 http://dev.mysql.com/doc/refman/5.5/en/server-system-
 variables.html#sysvar_myisam_sort_buffer_size
 
 http://dev.mysql.com/doc/refman/5.5/en/server-system-
 variables.html#sysvar_key_buffer_size
 
 You will need to experiment with how large the maximum value will be
 permitted for your platform.
 
 --
 Shawn Green
 MySQL Principal Technical Support Engineer
 Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
 Office: Blountville, TN
 
 
 





RE: Indexing about 40 Billion Entries

2012-06-20 Thread Rick James
 SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx ORDER 
 BY overlap DESC

MySQL does not optimize that kind of OR well.  This will run _much_ faster 
(with your two indexes):

(
   SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx
   UNION
   SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx
) ORDER BY overlap DESC;

Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both 
fields of one row.  UNION DISTINCT makes a pass over the temp table to dedup.
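A toy illustration of the ALL vs DISTINCT choice (pure Python with hypothetical rows; a row where xxx appears in both columns is fetched by both index scans):

```python
# Each tuple is (ruid1, ruid2, overlap); xxx = 7 in this toy data.
rows = [
    (7, 9, 3),   # matched via the ruid1 index
    (2, 7, 5),   # matched via the ruid2 index
    (7, 7, 4),   # matched by BOTH scans -> duplicated under UNION ALL
]
xxx = 7
scan_ruid1 = [r for r in rows if r[0] == xxx]
scan_ruid2 = [r for r in rows if r[1] == xxx]

union_all = scan_ruid1 + scan_ruid2                 # no dedup pass
union_distinct = list(dict.fromkeys(union_all))     # dedup, like UNION DISTINCT

# The outer ORDER BY overlap DESC, applied to the combined result.
union_all.sort(key=lambda r: r[2], reverse=True)
```

If xxx can never equal both ruid1 and ruid2 in one row, UNION ALL is safe and skips the dedup pass over the temp table.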

Your version (OR) will do a table scan.

My version will:
1. scan ruid1 index (.MYI) (1+ disk hits for index range scan)
2. for each matching ruid1, fetch the data row (in .MYD) (1 hit/row); write to 
tmp table
3,4.  ditto for ruid2, appending to same tmp table
5. sort tmp table (ORDER BY) (probably 0 disk hits, due to using MEMORY)

Once the indexes are in place...  Depending on the version you are using, 
key_buffer_size is limited to 4G or is not.  For your huge machine, 24G might 
be wise.  The key_buffer will help steps 1,3.  The rest of RAM will be 
available for the OS to cache the data blocks (2,4).

The 16 cores -- A single query (even with the UNION I suggested) will use only 
one core.

How many rows (average, max) do you expect from
   SELECT ... FROM l4_link WHERE ruid1=xxx
I ask, because that might run as slow as 100 rows/sec., simply because of the 
disk hits.

PARTITIONing, per se, does not help performance.  There are only a few use 
cases where PARTITION shines.  (I have not seen such [yet] in your application.)

 -Original Message-
 From: Christian Koetteritzsch [mailto:ckoetteritz...@e-humanities.net]
 Sent: Wednesday, June 20, 2012 1:42 PM
 To: Rick James
 Cc: Ananda Kumar; mysql@lists.mysql.com
 Subject: Re: Indexing about 40 Billion Entries
 
 Thanks for the information. It is no problem if it takes days or weeks,
 because the server is specially for such tasks that takes time and uses
 lots of resources.
 
 Am 20.06.2012 19:55, schrieb Rick James:
  Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still
 take days, maybe weeks.
 
  I strongly recommend you do not try to access that many rows
 directly.  Instead, build summary tables, and access them.  We can
 discuss further.
 Did you mean that I make several partitions with for example 1 Billion
 Entries and than make a union of the results from the partitions?
 
  Unfortunately, you did not even include a PRIMARY KEY when you built
 the table.  This makes any operations slow.
 There is no PRIMARY KEY because ruid1 and ruid2 are not unique values.
 
  I need an index for ruid1 and for ruid2.
  What will your SELECTs look like?  You may be better off with
  INDEX(ruid1, ruid2, overlap), and INDEX(ruid2, ruid1, overlap)
 
  Will you be adding more rows to this table?  Or is it now static?
 No data will be added to this table so it is a static table.  And the
 index, as i have planed them will be INDEX(ruid1), and INDEX(ruid2).
 
 My SELECT looks like this:
 
 SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx
 
 and xxx are the same int values
 
  What are the semantics of the fields?
 The semantic behind these fields are, that ruid1 and ruid2 are ID's for
 sentences that have common words and overlap is the number of words
 they have in common.
 
 I hope the informations I gave you are helpful. If you have more
 questions than you can ask them.
 
 Kind regards
 
 Christian
 
 
  -Original Message-
  From: Ananda Kumar [mailto:anan...@gmail.com]
  Sent: Wednesday, June 20, 2012 4:37 AM
  To: Christian Koetteritzsch
  Cc:mysql@lists.mysql.com
  Subject: Re: Indexing about 40 Billion Entries
 
  looks like the value that you give for myisam_max_sort_size is not
  enough for the index creation and hence it doing a REPAIR WITH
  KEYCACHE
 
  Use the below query to set the min values required for
  myisam_max_sort_size to avoid repair with keycache
 
 
  select
   a.index_name as index_name,
   ceil(
   (select count(*) from `db_name`.`table_name`) *(
   @@global.myisam_data_pointer_size +
   sum(ifnull(1 + `a`.`sub_part` *
  (`b`.`character_octet_length`/`b`.`character_maximum_length`),
   ifnull(1 + `b`.`character_octet_length`,
   case
  when `b`.`data_type` = 'tinyint' then 1
  when `b`.`data_type` = 'smallint' then 2
  when `b`.`data_type` = 'mediumint' then 3
  when `b`.`data_type` = 'int' then 4
  when `b`.`data_type` = 'bigint' then 8
  when `b`.`data_type` = 'float' then 4
  when `b`.`data_type` = 'double' then 8
  when `b`.`data_type` = 'real' then 8
  when `b`.`data_type` = 'bit' then 8