Re: Indexing about 40 Billion Entries

2012-06-21 Thread Christian Koetteritzsch
Thank you a lot. The first indexing process finished after about 13 
hours, so I think the problem is solved now.

I set the myisam_sort_buffer_size to 10GB.

For the query I will adjust it to your version.

On 20.06.2012 23:32, Rick James wrote:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx ORDER BY 
overlap DESC

MySQL does not optimize that kind of OR well.  This will run _much_ faster 
(with your two indexes):

(
SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx
UNION
SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx
) ORDER BY overlap DESC;

Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both 
fields of one row.  UNION DISTINCT makes a pass over the temp table to dedup.

Your version (OR) will do a table scan.

My version will:
1. scan ruid1 index (.MYI) (1+ disk hits for index range scan)
2. for each matching ruid1, fetch the data row (in .MYD) (1 hit/row); write to 
tmp table
3,4.  ditto for ruid2, appending to same tmp table
5. sort tmp table (ORDER BY) (probably 0 disk hits, due to using MEMORY)
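
Once built, each leg's index usage can be confirmed by running EXPLAIN on one
half of the UNION; a quick sketch (12345 is just a placeholder value):

  EXPLAIN SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1 = 12345;

The key column should name the ruid1 index, with access type ref.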

Once the indexes are in place...  Depending on the version you are using, 
key_buffer_size is limited to 4G or is not.  For your huge machine, 24G might 
be wise.  The key_buffer will help steps 1,3.  The rest of RAM will be 
available for the OS to cache the data blocks (2,4).
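
For instance, assuming a version without the 4G cap, the setting could be
applied at runtime (value in bytes):

  SET GLOBAL key_buffer_size = 24 * 1024 * 1024 * 1024;  -- 24G

or equivalently key_buffer_size = 24G under [mysqld] in my.cnf.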

The 16 cores -- A single query (even with the UNION I suggested) will use only 
one core.

How many rows (average, max) do you expect from
SELECT ... FROM l4_link WHERE ruid1=xxx
I ask, because that might run as slow as 100 rows/sec., simply because of the 
disk hits.


I'm expecting an average of 5000 rows.


PARTITIONing, per se, does not help performance.  There are only a few use 
cases where PARTITION shines.  (I have not seen such [yet] in your application.)


-Original Message-
From: Christian Koetteritzsch [mailto:ckoetteritz...@e-humanities.net]
Sent: Wednesday, June 20, 2012 1:42 PM
To: Rick James
Cc: Ananda Kumar; mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

Thanks for the information. It is no problem if it takes days or weeks,
because the server is dedicated to such tasks that take time and use
lots of resources.

On 20.06.2012 19:55, Rick James wrote:

Even if you get past the REPAIR WITH KEYCACHE, the ALTER will still
take days, maybe weeks.

I strongly recommend you do not try to access that many rows
directly.  Instead, build summary tables, and access them.  We can
discuss further.

Did you mean that I make several partitions with, for example, 1 billion
entries and then make a union of the results from the partitions?

Unfortunately, you did not even include a PRIMARY KEY when you built
the table.  This makes any operations slow.

There is no PRIMARY KEY because ruid1 and ruid2 are not unique values.
I need an index for ruid1 and for ruid2.

What will your SELECTs look like?  You may be better off with
INDEX(ruid1, ruid2, overlap), and INDEX(ruid2, ruid1, overlap)

Will you be adding more rows to this table?  Or is it now static?

No data will be added to this table, so it is a static table.  And the
indexes, as I have planned them, will be INDEX(ruid1) and INDEX(ruid2).

My SELECT looks like this:

SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx OR ruid2=xxx

where both xxx are the same int value


What are the semantics of the fields?

The semantics behind these fields are that ruid1 and ruid2 are IDs for
sentences that have common words, and overlap is the number of words
they have in common.

I hope the information I gave you is helpful. If you have more
questions, just ask.

Kind regards

Christian



-Original Message-
From: Ananda Kumar [mailto:anan...@gmail.com]
Sent: Wednesday, June 20, 2012 4:37 AM
To: Christian Koetteritzsch
Cc:mysql@lists.mysql.com
Subject: Re: Indexing about 40 Billion Entries

It looks like the value that you gave for myisam_max_sort_file_size is not
enough for the index creation, and hence it is doing a REPAIR WITH
KEYCACHE.

Use the query below to find the minimum value required for
myisam_max_sort_file_size to avoid the repair with keycache:


select
  a.index_name as index_name,
  ceil(
    (select count(*) from `db_name`.`table_name`) * (
      @@global.myisam_data_pointer_size +
      sum(ifnull(1 + `a`.`sub_part` *
                 (`b`.`character_octet_length` / `b`.`character_maximum_length`),
          ifnull(1 + `b`.`character_octet_length`,
                 case
                   when `b`.`data_type` = 'tinyint'   then 1
                   when `b`.`data_type` = 'smallint'  then 2
                   when `b`.`data_type` = 'mediumint' then 3
                   when `b`.`data_type` = 'int'       then 4
                   when `b`.`data_type` = 'bigint'    then 8
                   when `b`.`data_type` = 'float'     then 4
                   ...
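
Once the query's result is known, the variable can be raised before running
the ALTER; a minimal sketch with a placeholder value:

  SET GLOBAL myisam_max_sort_file_size = 700 * 1024 * 1024 * 1024;  -- 700G, placeholder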

Re: Indexing about 40 Billion Entries

2012-06-21 Thread Brent Clark


On 20/06/2012 11:45, Christian Koetteritzsch wrote:

Hi guys,

As the title says, I'm trying to index 40 billion entries with two indexes on a
server with 16 cores and 128GB RAM. The table is the one below and it is a
MyISAM table. The *.MYD file is about 640GB.



Hiya

I am unable to help. But one question that was running through my mind
as I was reading this is:

How do you do your backups?

I use mylvmbackup on a slave in a master-slave replication setup, but I
would love to know what you use and how you do your backups.

Hope you come right with your problem.

Thanks
Brent




Re: Indexing about 40 Billion Entries

2012-06-21 Thread mos

At 02:04 AM 6/21/2012, you wrote:
Thank you a lot. The first indexing process finished after about 13 
hours, so I think the problem is solved now.

I set the myisam_sort_buffer_size to 10GB.


The first indexing process???

You should have created all of your indexes with one ALTER statement.
Otherwise it will take another 13+ hours to build the second index,
because it has to create a copy of the table all over again.
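
For example, both indexes could have been built in a single pass; a minimal
sketch (the index names are made up):

  ALTER TABLE l4_link
    ADD INDEX idx_ruid1 (ruid1),
    ADD INDEX idx_ruid2 (ruid2);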


Mike





Re: ANN: Hopper (stored procedure debugger), version 1.0.1 released

2012-06-21 Thread Jan Steinman
Can you PLEASE note in your listing when a product is Microsloth-only?

While you're at it, can you PLEASE note it prominently on your website? I 
looked through your product description and saw no specific requirements beyond 
what databases were supported. It wasn't until I tried to download it that I 
noticed the warning sign. (.EXE in the file name)

Hard as it is to believe, the entire world does not worship at the altar of
Bill Gates. We don't allow any Microsloth products on our site.


In a low-energy future... the wealth of nations will be measured by the 
quantity and quality of their forests. -- David Holmgren
Jan Steinman, EcoReality Co-op





Re: Indexing about 40 Billion Entries

2012-06-21 Thread Christian Koetteritzsch

Hi,

Thanks for the hint. I was testing the settings for
myisam_sort_buffer_size, so I totally forgot about it.
But I have another three of these databases to do, so next time I will
do it with one ALTER statement.


Christian

On 21.06.2012 16:50, mos wrote:

At 02:04 AM 6/21/2012, you wrote:
Thank you a lot. The first indexing process finished after about 13 
hours, so I think the problem is solved now.

I set the myisam_sort_buffer_size to 10GB.


The first indexing process???

You should have created all of your indexes with one ALTER statement.
Otherwise it will take another 13+ hours to build the second index,
because it has to create a copy of the table all over again.


Mike









New guy

2012-06-21 Thread Raymond D'Anjou
I don't know if new members usually introduce themselves here...
...but if not, why not?

I was very active on the SQL Server newsgroup a few years back, but this MySQL
stuff is new to me.
Seems I'll be muddling around for at least a little while so I might have a few 
questions in the near future.

Thanks in advance,
Raymond from Québec City

Raymond D'Anjou
Web Programmer-Analyst
Artmérik International
600, rue Saint-Jean
Québec (Québec) G1R 1P8
www.artmerik.com
Tel.: 418-529-6800
Fax: 418-529-1982



Re: New guy

2012-06-21 Thread Andrew Moore
Welcome Raymond, hope you enjoy your stay :)

On Thu, Jun 21, 2012 at 8:47 PM, Raymond D'Anjou 
radan...@osullivan-quebec.qc.ca wrote:

 I don't know if new members usually introduce themselves here...
 ...but if not, why not?

 I was very active on the SQL Server newsgroup a few years back, but this
 MySQL stuff is new to me.
 Seems I'll be muddling around for at least a little while so I might have
 a few questions in the near future.

 Thanks in advance,
 Raymond from Québec City

 Raymond D'Anjou
 Web Programmer-Analyst
 Artmérik International
 600, rue Saint-Jean
 Québec (Québec) G1R 1P8
 www.artmerik.com
 Tel.: 418-529-6800
 Fax: 418-529-1982




RE: Indexing about 40 Billion Entries

2012-06-21 Thread Rick James
Possibly worse than that, since it will rebuild the 'first' index again.

 -Original Message-
 From: mos [mailto:mo...@fastmail.fm]
 Sent: Thursday, June 21, 2012 7:51 AM
 To: mysql@lists.mysql.com
 Subject: Re: Indexing about 40 Billion Entries
 
 At 02:04 AM 6/21/2012, you wrote:
 Thank you a lot. The first indexing process finished after about 13
 hours, so I think the problem is solved now.
 I set the myisam_sort_buffer_size to 10GB.
 
 The first indexing process???
 
 You should have created all of your indexes with one ALTER statement.
 Otherwise it will take another 13+ hours to build the second index,
 because it has to create a copy of the table all over again.
 
 Mike
 
 





Triggers and column names

2012-06-21 Thread Gael Martinez
Hello there

I'm trying to figure out an elegant way with MySQL 5.5.25 to log changes
via a BEFORE UPDATE trigger to a table, including the column name of the
field that changed...
How can I dynamically enumerate the field names and populate the log rows
in the t1_log test table ... Would a cursor be the most efficient way?

CREATE TABLE `t1` (
  `a` varchar(12) DEFAULT NULL,
  `b` varchar(12) DEFAULT NULL,
  `c` varchar(12) DEFAULT NULL,
  `hostid` int(12) NOT NULL AUTO_INCREMENT,
  `date` datetime DEFAULT NULL,
  UNIQUE KEY `hostid_UNIQUE` (`hostid`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=latin1

CREATE TABLE `t1_log` (
  `hostid` int(12) DEFAULT NULL,
  `field` varchar(12) DEFAULT NULL,
  `old_value` varchar(12) DEFAULT NULL,
  `new_value` varchar(12) DEFAULT NULL,
  `datechanged` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

The result would be like:

mysql> select * from t1_log;
+--------+-------+-----------+-----------+---------------------+
| hostid | field | old_value | new_value | datechanged         |
+--------+-------+-----------+-----------+---------------------+
|      1 | a     | 1         | 4         | 2012-06-21 16:30:25 |
|      2 | a     | 2         | 4         | 2012-06-21 16:35:40 |
|      1 | a     | 4         | 43        | 2012-06-21 16:35:59 |
+--------+-------+-----------+-----------+---------------------+
3 rows in set (0.00 sec)
I'm getting that done today thru a large static trigger script and I would
like something more dynamic...
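
For reference, a minimal sketch of the static approach described above,
written as an AFTER UPDATE trigger (the trigger name is made up; one IF
block per tracked column, with the NULL-safe <=> comparison so NULL
transitions are logged too):

DELIMITER //
CREATE TRIGGER t1_log_changes AFTER UPDATE ON t1
FOR EACH ROW
BEGIN
  IF NOT (NEW.a <=> OLD.a) THEN
    INSERT INTO t1_log VALUES (OLD.hostid, 'a', OLD.a, NEW.a, NOW());
  END IF;
  IF NOT (NEW.b <=> OLD.b) THEN
    INSERT INTO t1_log VALUES (OLD.hostid, 'b', OLD.b, NEW.b, NOW());
  END IF;
  IF NOT (NEW.c <=> OLD.c) THEN
    INSERT INTO t1_log VALUES (OLD.hostid, 'c', OLD.c, NEW.c, NOW());
  END IF;
END//
DELIMITER ;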

Regards

Gael
-- 
Gaël Martinez


CREATE VIEW COMMENT ''

2012-06-21 Thread hsv
To endow CREATE VIEW with COMMENT would be splendid.





UNION and ORDER BY

2012-06-21 Thread Halász Sándor
 2012/06/20 14:32 -0700, Rick James 
(
   SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx
   UNION
   SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx
) ORDER BY overlap DESC;

Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in both 
fields of one row.  UNION DISTINCT makes a pass over the temp table to dedup.

The help example shows a UNION whose SELECTs are each separately round-bracketed,
(SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx)
UNION
(SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx)
  ORDER BY overlap DESC
but bracketing both is not mentioned. What is the difference?





Re: Triggers and column names

2012-06-21 Thread Halász Sándor
 2012/06/21 17:06 -0500, Gael Martinez 
I'm getting that done today thru a large static trigger script and I would
like something more dynamic...

For that it is needful to look up the table in INFORMATION_SCHEMA.COLUMNS, and,
yes, you can look up the field names--but then what will you do with the
character strings that are the field names? Use PREPARE and EXECUTE for the
comparisons, being wary of NULL?

In MySQL help it is written that for its own purposes MySQL actually tracks all 
the information that you crave, but it nowhere is written that a BEFORE-UPDATE 
trigger can make use of it. Maybe UDF, but I know naught about that.

Since BEFORE is called on every attempt, successful or not, maybe AFTER would 
be better.





RE: UNION and ORDER BY

2012-06-21 Thread Rick James
The parens are for making sure the parsing works correctly.  Probably either 
one works fine.  Suggest you do
  EXPLAIN EXTENDED ...;
  SHOW WARNINGS;
I suspect that the output from each will be identical, and have more parens.

The main need for parens is to avoid associating the ORDER BY with just the 
second SELECT.
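
For illustration, a sketch of both placements (xxx is the same placeholder
as before); an ORDER BY wrapped inside the parentheses of one SELECT applies
only to that SELECT, and without a LIMIT there it is ignored:

-- sorts the combined result:
(SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx)
UNION
(SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx)
ORDER BY overlap DESC;

-- binds the ORDER BY to the second SELECT only and, lacking a LIMIT,
-- has no effect on the final result:
(SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx)
UNION
(SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx ORDER BY overlap DESC);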

 -Original Message-
 From: Halász Sándor [mailto:h...@tbbs.net]
 Sent: Thursday, June 21, 2012 4:07 PM
 To: mysql@lists.mysql.com
 Subject: UNION and ORDER BY
 
  2012/06/20 14:32 -0700, Rick James 
 (
SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx
UNION
SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx
 ) ORDER BY overlap DESC;
 
 Make it UNION ALL or UNION DISTINCT depending on whether xxx can be in
 both fields of one row.  UNION DISTINCT makes a pass over the temp
 table to dedup.
 
 The help example shows a UNION whose SELECTs are each separately round-bracketed,
 (SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid1=xxx)
 UNION
 (SELECT ruid1, ruid2, overlap FROM l4_link WHERE ruid2=xxx)
   ORDER BY overlap DESC
 but bracketing both is not mentioned. What is the difference?
 
 





Re: Need Query Help

2012-06-21 Thread Anupam Karmarkar
Thanks Rick for your reply,

Here I am asking about the logic to prepare the query, or for the whole query itself.

--Anupam




 From: Rick James rja...@yahoo-inc.com
To: Anupam Karmarkar sb_akarmar...@yahoo.com; mysql@lists.mysql.com
Sent: Wednesday, 20 June 2012 10:52 PM
Subject: RE: Need Query Help
 
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html#function_timediff
and TIME_TO_SEC()/3600
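
A sketch along those lines, clipping each session to the reporting window
(the table name logins is an assumption, and a NULL LogoutTime is treated
as a still-open session):

SELECT EmployeeID,
       SUM(TIME_TO_SEC(TIMEDIFF(
             LEAST(IFNULL(LogoutTime, NOW()), '2012-05-01 22:00:00'),
             GREATEST(LoginTime, '2012-05-01 08:00:00')
           ))) / 3600 AS hours_logged_in
FROM logins
WHERE LoginTime < '2012-05-01 22:00:00'
  AND (LogoutTime IS NULL OR LogoutTime > '2012-05-01 08:00:00')
GROUP BY EmployeeID;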

 -Original Message-
 From: Anupam Karmarkar [mailto:sb_akarmar...@yahoo.com]
 Sent: Wednesday, June 20, 2012 2:39 AM
 To: mysql@lists.mysql.com
 Subject: Need Query Help
 
 Hi All,
 
 I need query help for the following table structure, where we need to
 calculate the login duration of each employee for a given period.
 
 Example table
 
 
 EmployeeID   LoginTime            LogoutTime
 101          2012-05-01 10:00:00  2012-05-01 12:30:00
 102          2012-04-31 23:00:00  2012-05-02 05:00:00
 103          2012-05-01 14:00:00  NULL
 104          2012-05-02 00:10:00  2012-05-02 05:00:00
 
 
 I tried to fit every scenario into the above table. Consider NULL as
 yet to log out.
 
 
 How would I calculate each employee's login duration for a period, say
 from 2012-05-01 08:00:00 to 2012-05-01 22:00:00?
 
 
 --Anupam
