Re: Indexing about 40 Billion Entries
Thank you a lot. The first indexing process finished after about 13 hours, so I think the problem is solved now. I set the myisam_sort_bufffer_size to 10GB. For the query I will adjust it to your version. Am 20.06.2012 23:32, schrieb Rick James: SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx ORDER BY overlap DESC MySQL does not optimize that kind of OR well. This will run _much_ faster (with your two indexes): ( SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx UNION SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx ) ORDER BY overlap DESC; Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both fields of one row. UNION DISTINCT makes a pass over the temp table to dedup. Your version (OR) will do a table scan. My version will: 1. scan ruid1 index (.MYI) (1+ disk hits for index range scan) 2. for each matching ruid1, fetch the data row (in .MYD) (1 hit/row); write to tmp table 3,4. ditto for ruid2, appending to same tmp table 5. sort tmp table (ORDER BY) (probably 0 disk hits, due to using MEMORY) Once the indexes are in place... Depending on the version you are using, key_buffer_size is limited to 4G or is not. For your huge machine, 24G might be wise. The key_buffer will help steps 1,3. The rest of RAM will be available for the OS to cache the data blocks (2,4). The 16 cores -- A single query (even with the UNION I suggested) will use only one core. How many rows (average, max) do you expect from SELECT ... FROM l4_link WHERE ruid1=xxx I ask, because that might run as slow as 100 rows/sec., simply because of the disk hits. I'm expecting an average of 5000 rows. PARTITIONing, per se, does not help performance. There are only a few use cases where PARTITION shines. (I have not seen such [yet] in your application.) -Original Message- From: Christian Koetteritzsch [mailto:ckoetteritz...@e-humanities.net] Sent: Wednesday, June 20, 2012 1:42 PM To: Rick James Cc: Ananda Kumar; mysql@lists.mysql.com Subject: Re: Indexing about 40 Billion Entries Thanks for the information. It is no problem if it takes days or weeks, because the server is specially for such tasks that takes time and uses lots of resources. Am 20.06.2012 19:55, schrieb Rick James: Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still take days, maybe weeks. I strongly recommend you do not try to access that many rows directly. Instead, build summary tables, and access them. We can discuss further. Did you mean that I make several partitions with for example 1 Billion Entries and than make a union of the results from the partitions? Unfortunately, you did not even include a PRIMARY KEY when you built the table. This makes any operations slow. There is no PRIMARY KEY because ruid1 and ruid2 are not unique values. I need an index for ruid1 and for ruid2. What will your SELECTs look like? You may be better off with INDEX(ruid1, ruid2, overlap), and INDEX(ruid2, ruid1, overlap) Will you be adding more rows to this table? Or is it now static? No data will be added to this table so it is a static table. And the index, as i have planed them will be INDEX(ruid1), and INDEX(ruid2). My SELECT looks like this: SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx and xxx are the same int values What are the semantics of the fields? The semantic behind these fields are, that ruid1 and ruid2 are ID's for sentences that have common words and overlap is the number of words they have in common. I hope the informations I gave you are helpful. If you have more questions than you can ask them. Kind regards Christian -Original Message- From: Ananda Kumar [mailto:anan...@gmail.com] Sent: Wednesday, June 20, 2012 4:37 AM To: Christian Koetteritzsch Cc:mysql@lists.mysql.com Subject: Re: Indexing about 40 Billion Entries looks like the value that you give for myisam_max_sort_size is not enough for the index creation and hence it doing a REPAIR WITH KEYCACHE Use the below query to set the min values required for myisam_max_sort_size to avoid repair with keycache select a.index_name as index_name, ceil( (select count(*) from `db_name`.`table_name`) *( @@global.myisam_data_pointer_size + sum(ifnull(1 + `a`.`sub_part` * (`b`.`character_octet_length`/`b`.`character_maximum_length`), ifnull(1 + `b`.`character_octet_length`, case when `b`.`data_type` = 'tinyint' then 1 when `b`.`data_type` = 'smallint' then 2 when `b`.`data_type` = 'mediumint' then 3 when `b`.`data_type` = 'int' then 4 when `b`.`data_type` = 'bigint' then 8 when `b`.`data_type` = 'float' then 4
Re: Indexing about 40 Billion Entries
On 20/06/2012 11:45, Christian Koetteritzsch wrote: Hi guys, As the title says I'm trying to index 40 billion entries with two indexes on a server with 16 cores and 128GB RAM. The table is the one below and it is a myisam table. The *.myd file is about 640GB Hiya I am unable to help. But one question that was running through my mind, as I was reading this is: How do you do your backups? I use mylvmbackup on a slave master replication server, but I would love to know how or what you use and do your backups. Hope you come right with your problem. Thanks Brent -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: Indexing about 40 Billion Entries
At 02:04 AM 6/21/2012, you wrote: Thank you a lot. The first indexing process finished after about 13 hours, so I think the problem is solved now. I set the myisam_sort_bufffer_size to 10GB. The first indexing process??? You should have created all of your indexes with one Alter statement. Otherwise it will take another 13+ hours to build the second index because it has to create a copy of the table all over again. Mike -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: ANN: Hopper (stored procedure debugger), version 1.0.1 released
Can you PLEASE note in your listing when a product is Microsloth-only? While you're at it, can you PLEASE note it prominently on your website? I looked through your product description and saw no specific requirements beyond what databases were supported. It wasn't until I tried to download it that I noticed the warning sign. (.EXE in the file name) Hard as it is to believe, the entire world does not worship at the alter of Bill Gates. We don't allow any Microsloth products on our site. In a low-energy future... the wealth of nations will be measured by the quantity and quality of their forests. -- David Holmgren Jan Steinman, EcoReality Co-op -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: Indexing about 40 Billion Entries
Hi, thanks for the hint, I was testing the settings for the myisam_sort_buffer_size so I totally forgot it. But I have another three of these databases to do, so next time I do it with one ALTER statement Christian Am 21.06.2012 16:50, schrieb mos: At 02:04 AM 6/21/2012, you wrote: Thank you a lot. The first indexing process finished after about 13 hours, so I think the problem is solved now. I set the myisam_sort_bufffer_size to 10GB. The first indexing process??? You should have created all of your indexes with one Alter statement. Otherwise it will take another 13+ hours to build the second index because it has to create a copy of the table all over again. Mike -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
New guy
I don't know if new members usually introduce themselves here... ...but if not, why not? I was very active on the SQL Server news group a few years back but this MYSQL stuff is new to me. Seems I'll be muddling around for at least a little while so I might have a few questions in the near future. Thanks in advance, Raymond from Québec City Raymond D'Anjou Programmeur-analyste WEB Artmérik International 600, rue Saint-Jean Québec (Québec) G1R 1P8 www.artmerik.comhttp://www.artmerik.com Tél.: 418-529-6800 Téléc.: 418-529-1982
Re: New guy
Welcome Raymond, hope you enjoy your stay :) On Thu, Jun 21, 2012 at 8:47 PM, Raymond D'Anjou radan...@osullivan-quebec.qc.ca wrote: I don't know if new members usually introduce themselves here... ...but if not, why not? I was very active on the SQL Server news group a few years back but this MYSQL stuff is new to me. Seems I'll be muddling around for at least a little while so I might have a few questions in the near future. Thanks in advance, Raymond from Québec City Raymond D'Anjou Programmeur-analyste WEB Artmérik International 600, rue Saint-Jean Québec (Québec) G1R 1P8 www.artmerik.comhttp://www.artmerik.com Tél.: 418-529-6800 Téléc.: 418-529-1982
RE: Indexing about 40 Billion Entries
Possibly worse than that, since it will rebuild the 'first' index again. -Original Message- From: mos [mailto:mo...@fastmail.fm] Sent: Thursday, June 21, 2012 7:51 AM To: mysql@lists.mysql.com Subject: Re: Indexing about 40 Billion Entries At 02:04 AM 6/21/2012, you wrote: Thank you a lot. The first indexing process finished after about 13 hours, so I think the problem is solved now. I set the myisam_sort_bufffer_size to 10GB. The first indexing process??? You should have created all of your indexes with one Alter statement. Otherwise it will take another 13+ hours to build the second index because it has to create a copy of the table all over again. Mike -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Triggers and column names
Hello there I'm trying to figure out an elegant way with Mysql 5.5.25 to log changes via a before change trigger to a table including the column name of the field that changed... How can I dynamically enumerate the field names and populate the field log into the t1_log test table ... Would a cursor be the most efficient way ? CREATE TABLE `t1` ( `a` varchar(12) DEFAULT NULL, `b` varchar(12) DEFAULT NULL, `c` varchar(12) DEFAULT NULL, `hostid` int(12) NOT NULL AUTO_INCREMENT, `date` datetime DEFAULT NULL, UNIQUE KEY `hostid_UNIQUE` (`hostid`) ) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=latin1 CREATE TABLE `t1_log` ( `hostid` int(12) DEFAULT NULL, `field` varchar(12) DEFAULT NULL, `old_value` varchar(12) DEFAULT NULL, `new_value` varchar(12) DEFAULT NULL, `datechanged` datetime DEFAULT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1 The result would be like: mysql select * from t1_log; ++---+---+---+-+ | hostid | field | old_value | new_value | datechanged | ++---+---+---+-+ | 1 | a | 1 | 4 | 2012-06-21 16:30:25 | | 2 | a | 2 | 4 | 2012-06-21 16:35:40 | | 1 | a | 4 | 43| 2012-06-21 16:35:59 | ++---+---+---+-+ 3 rows in set (0.00 sec) I'm getting that done today thru a large static trigger script and I would like something more dynamic... Regards Gael -- Gaël Martinez
CREATE VIEW COMMENT ''
To endow CREATE VIEW with COMMENT would be splendid. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
UNION and ORDER BY
2012/06/20 14:32 -0700, Rick James ( SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx UNION SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx ) ORDER BY overlap DESC; Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both fields of one row. UNION DISTINCT makes a pass over the temp table to dedup. The help example shows a UNION s each SELECT separatly round-bracketed, (SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx) UNION (SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx) ORDER BY overlap DESC but bracketing both is not mentioned. What is the difference? -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: Triggers and column names
2012/06/21 17:06 -0500, Gael Martinez I'm getting that done today thru a large static trigger script and I would like something more dynamic... For that it is needful to look up the table in INFORMATION_SCHEMA.COLUMNS, and, yes, you can look up the field names--but then what will you do with the character strings that are the field names? use PREPARE and EXECUTE for the comparisons, being ware of NULL? In MySQL help it is written that for its own purposes MySQL actually tracks all the information that you crave, but it nowhere is written that a BEFORE-UPDATE trigger can make use of it. Maybe UDF, but I know naught about that. Since BEFORE is called on every attempt, successful or not, maybe AFTER would be better. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
RE: UNION and ORDER BY
The parens are for making sure the parsing works correctly. Probably either one works fine. Suggest you do EXPLAIN EXTENDED ...; SHOW WARNINGS; I suspect that the output from each will be identical, and have more parens. The main need for parens is to avoid associating the ORDER BY with just the second SELECT. -Original Message- From: Hal?sz S?ndor [mailto:h...@tbbs.net] Sent: Thursday, June 21, 2012 4:07 PM To: mysql@lists.mysql.com Subject: UNION and ORDER BY 2012/06/20 14:32 -0700, Rick James ( SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx UNION SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx ) ORDER BY overlap DESC; Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both fields of one row. UNION DISTINCT makes a pass over the temp table to dedup. The help example shows a UNION s each SELECT separatly round-bracketed, (SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx) UNION (SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx) ORDER BY overlap DESC but bracketing both is not mentioned. What is the difference? -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: Need Query Help
Thanks Rick for your reply, Here i am asking about logic to perpare query or whole query itself. --Anupam From: Rick James rja...@yahoo-inc.com To: Anupam Karmarkar sb_akarmar...@yahoo.com; mysql@lists.mysql.com mysql@lists.mysql.com Sent: Wednesday, 20 June 2012 10:52 PM Subject: RE: Need Query Help http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_timediff and SEC_TO_TIME()/3600 -Original Message- From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com] Sent: Wednesday, June 20, 2012 2:39 AM To: mysql@lists.mysql.com Subject: Need Query Help Hi All, I need query help for following table struture, where we need to calculate login duration of that employee for give period. Example table EmployeeID LoginTime LogoutTIme 101 2012-05-01 10:00:00 2012-05-01 12:30:00 102 2012-04-31 23:00:00 2012-05-02 05:00:00 103 2012-05-01 14:00:00 NULL 104 2012-05-02 00:10:00 2012-05-02 05:00:00 I tried to fit all scenario in above table, Consider NULL as yet to logout How would i calcuate Employee and it Login duration for period say from 2012-05-01 08:00:00 to 2012-05-01 22:00:00 --Anupam -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql