Re: scalability of MySQL - future plans?

2004-11-12 Thread Udi . S . Karni
Adequate data warehouse performance requires more than just hardware. 2 
crucial make-or-break software features are partitioning and parallel 
query.

On very large tables - accessing a large slice of the data via index is 
completely unfeasible. Table scan is the only option. Partitioning allows 
you to scan only the necessary segments instead of reading the whole table 
and rejecting massive numbers of rows. Parallel query breaks the job up so 
that multiple processes of the OS can participate and speed up the 
process.

These features are an absolute necessity if we wanted to migrate our large 
databases from Oracle to MySQL. We are eager for MySQL to make them 
priority features. MySQL's market appeal would just explode. We will do 
our best to contribute to the effort if we can. I'd like to urge others 
who plan to use MySQL with large databases to consider doing the same.

Thanks,

Udi






"Heikki Tuuri" <[EMAIL PROTECTED]>
11/12/2004 06:57 AM

 
To: <[EMAIL PROTECTED]>
cc: 
Subject:Re: scalability of MySQL - future plans?


Jacek,

- Original Message - 
From: "Jacek Becla" <[EMAIL PROTECTED]>
Newsgroups: mailing.database.myodbc
Sent: Friday, November 12, 2004 2:30 AM
Subject: scalability of MySQL - future plans?


> Hello,
>
> What are the plans regarding improving scalability of MySQL? We are
> currently trying to decide what technology/product to use for a large
> project that will generate ~600TB/year starting in 2012. Any pointers to
> related articles or hints how safe is to assume that MySQL will be able
> to handle petabyte-scale dataset in 8-10 years would be greatly 
> appreciated.

hmm... this mostly depends on hardware. With the innodb_file_per_table 
option, a single InnoDB table can be 64 TB in size, and you can have 4 
billion such tables.

With current PC hardware, the speed of a single CPU allows you to insert 
10 
000 rows per second, if the load is not disk-bound. Let us assume that a 
single row in 100 bytes. That makes 1 MB/s, which is 30 TB/year. CPU speed 

will probably double every 4 years or so. Thus, CPU speed will suffice if 
you use a multiprocessor.

Normally, a database server has main memory at least 1 % of the data size. 

Is 6000 GB RAM realistic in 2012? Memory sizes will probably double every 
2 
to 3 years. If a high-end server today has 32 GB of RAM, in year 2012 it 
might have 512 GB of RAM. You will need a huge server.

The worst problem is the disk seek time. If your tables have secondary 
indexes where the insertion order is random, a modern disk, in combination 

with the InnoDB insert buffer, can insert maybe 200 random records per 
second. That is 100 rows/s for a typical table. You are going to insert 
200 
000 rows/s. You may need a disk farm of 4000 physical disks. Such disk 
farms 
exist today, but they are expensive, and we have no experience how Linux 
performs on them. Probably by 2012, Linux is good enough, if not yet 
today.

If you insert rows in large batches to tables smaller than your main 
memory, 
or if you insert in the prder of the primary key, and you do not have 
secondary keys, then there are no random accesses to disks, and you do not 

need a disk farm.

A typical disk in 2012 may store 1 TB. Thus, you will need at least 600 
disks anyway.

How long does it take to build an index to a 64 TB table if you have 6 TB 
of 
memory? If the index completely fits in the memory, then this is 
sequential 
disk I/O. With today's high end disks, you can read 60 MB/s. Building an 
index with a single disk would take 2 weeks. In 2012, it might take only 3 

days.

Conclusion: MySQL/InnoDB is able to handle that workload of 600 TB/year in 

year 2012. But you will need a huge server which has 10 x the memory of a 
high-end server, and 600 - 4000 physical disk drives.

The following link describes a system with 512 GB of memory, and 2000 disk 

drives:
http://www.tpc.org/results/individual_results/IBM/IBM_690_040217_es.pdf
The system costs 5.6 million US dollars.

> Best regards,
> Jacek Becla
> Stanford University

Best regards,

Heikki Tuuri
Innobase Oy
Foreign keys, transactions, and row level locking for MySQL
InnoDB Hot Backup - a hot backup tool for InnoDB which also backs up 
MyISAM 
tables
http://www.innodb.com/order.php

Order MySQL technical support from https://order.mysql.com/ 


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]





Re: [OT] HP DL760G2 vs Opteron for large DB on Linux

2004-10-04 Thread Udi . S . Karni
It's a great server (the DL760 G2). They really opened up the bus and the 
Xeon 3.0 is superfast.






"darren" <[EMAIL PROTECTED]>
10/03/2004 07:51 PM

 
To: [EMAIL PROTECTED]
cc: 
Subject:[OT] HP DL760G2 vs Opteron for large DB on Linux


Hi all,

I am looking to purchase hardware for a large database running MySQL or 
Oracle on Linux.

Was thinking of the Opteron for its good scalability and memory bandwidth 
until my friend told me that the DL760 from HP that uses the F8 (from 
Intel Profusion) chipset allowing up to 8 CPUs and memeory up to 32GB.

I have always though that the Xeons cannot go beyond 4 CPUs and 4GB but 
this changes the equation.

Has anyone got experience with this server or similar ones? How is the 
performance for running databases, esp in terms of large RAM usage?


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]





Re: Mysql growing pains, 4 days to create index on one table!

2004-07-16 Thread Udi . S . Karni
Here's another option to load without requiring a primary key (requires a 
LOT of extra disk space and fast CPU, and a batch window to run).

Load the new daily data into the table without checking for dupes.

Then create a new version of the table with distinct values.

Something like this (assuming your table has 2 columns which are char (3) 
and char (5) for simplicity's sake) in pseudocode:
(remember to pad all columns to the maximum width to make them uniform, 
right spaces on char, left zeroes on numerics)


CREATE NEW_TABLE AS
SELECT SUBSTR (DISTINCT (RPAD(COL1,3) || RPAD(COL2,5)),1,3) AS COL1,
   SUBSTR (DISTINCT (RPAD(COL1,3) || RPAD(COL2,5)),4,5) AS COL2
  FROM OLD_TABLE









matt ryan <[EMAIL PROTECTED]>
07/16/2004 07:43 AM

 
To: 
cc: [EMAIL PROTECTED]
Subject:Re: Mysql growing pains, 4 days to create index on one table!


Donny Simonton wrote:

>Matt,
>I've been reading this thread for a while and at this point, I would say
>that you would need to provide the table structures and queries that you 
are
>running.
>
>For example, we have one table that has 8 billion rows in it and it close 
to
>100 gigs and we can hammer it all day long without any problems.  It 
really
>depends on how you are doing things.
>
>But as far as you mentioning about mysql not using multiple indexes, it
>does.  You just have to create an index on multiple fields at one time.
>I've got tables with 10 fields in one index, now the trick with mysql is
>that you must use all top 10 fields in your where clause for mysql to 
really
>take advantage of the index.
>
>But I would definitely send the list your table structure with your 
indexes
>and some of your selects and inserts.  You can always change the names of
>things if you don't want people to know the names of everything.
>
>Just my 2 cents.
> 
>

Original email has the table structure, query speed is not an issue (it 
is, but I will deal with that later)

the issue is insert speed, I get 150k-1M records daily, of these, only 
5-10 % are new records, I load the EBCDIC file into a temp table, and 
then do "insert ignore into historytable select * from temp table"

and I cant create multiple field indexes, I would need 100 indexes on a 
table, the index side already exceeds the table size, I tried to add an 
index on date, but gave up because it ran for 2 days and was not done yet.



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]





Re: Mysql growing pains, 4 days to create index on one table!

2004-07-15 Thread Udi . S . Karni
Reordering the primary key wouldn't necessarily speed up the key check. 
The reason for selecting a particular sequence within a primary key is to 
put the columns that are available the most often - upfront - so that the 
index will have at least something to bite on.

Can you parittion further? You want to shoot for reading no more than a 1% 
slice for a given query. If possible.

Can you partition by ranges of a column? Values 01 to 100 in 
partition 1, 11 to 200 in partition 2, etc? Anything that's 
specified consistently in every query?

Then again -
if your table is huge...
and your queries are all over the place...
and the user can specify any search criteria...
and there's no predictable pattern you can partition on...
and you frequently add and/or remove lots of data from your table...
and your queries pull lots of rows...

You might be out of luck with MySQL ... sorry.

You may need to switch to a database that has a parallel query facility. 
Then - every query becomes a massive table scan but gets divided into 
multiple concurrent subqueries - and overall the job finishes in a 
reasonable amount of time. The epitomy of brute force. It's hard to 
rationalize initially but after a while you see it's the only way to go. 
Remember -  indexes are no longer required.

We have a billion row 100GB table the users search any and every way. 
Response time is less than a minute.

We are anxiously waiting to see this technology added to MySQL. Maybe one 
day we'll have some money to contribute to the effort. Parallel query is 
not trivial. That's why these databases are expensive.

I can send you more details privately if you are interested.






matt ryan <[EMAIL PROTECTED]>
07/14/2004 12:27 PM

 
To: [EMAIL PROTECTED]
cc: 
Subject:Re: Mysql growing pains, 4 days to create index on one table!


[EMAIL PROTECTED] wrote:

>You may want more indexes but you might be getting killed because you 
already have too many.
>
>To test - try loading into a table without indexes and see if it makes a 
difference.
>
>At the very least - check to see if the primary index which starts with 
'dic' can make your special 'dic' index superfluous.
>
>If write speed is a bottleneck you might consider Raid-1 instead of 
Raid-5.
>
>Reading lots of rows via index is a killer. Depending on your hardware it 
may be cheaper to table scan 50 rows than to read 1 via index. However, 
this requires partitioning of the data based on some column which appears 
in every query and acts as an initial filter. If you are lucky enough to 
be in that situation - consider a MERGE table.
>
>
> 
>

These tables are merged, the total table size is huge, on this 
particular table, it's , 45,449,534 rows, however, all the merge tables 
combined are 258,840,305 records

perhaps I should reorder the pimary key, putting the longest most unique 
record up front, and the least unique at the end, would that speed up 
the key check?   I can tell that almost everything is read IO, very 
little write IO

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]






Re: Mysql growing pains, 4 days to create index on one table!

2004-07-15 Thread Udi . S . Karni
That's the whole point. Eliminate your indexes and your load problems are 
solved. Especially given the fact that you insert ignore and don't use the 
primary key to validate uniqueness.






matt ryan <[EMAIL PROTECTED]>
07/15/2004 11:38 AM

 
To: 
cc: [EMAIL PROTECTED]
Subject:Re: Mysql growing pains, 4 days to create index on one table!



>
> You might be out of luck with MySQL ... sorry.
>
> You may need to switch to a database that has a parallel query 
> facility. Then - every query becomes a massive table scan but gets 
> divided into multiple concurrent subqueries - and overall the job 
> finishes in a reasonable amount of time. The epitomy of brute force. 
> It's hard to rationalize initially but after a while you see it's the 
> only way to go. Remember -  indexes are no longer required.
>
> We have a billion row 100GB table the users search any and every way. 
> Response time is less than a minute.
>
> We are anxiously waiting to see this technology added to MySQL. Maybe 
> one day we'll have some money to contribute to the effort. Parallel 
> query is not trivial. That's why these databases are expensive.
>
> I can send you more details privately if you are interested.


I've used it, with oracle, but oracles index searches are better, hit 
the best one first, then 2nd best, then 3rd, but I really dont want to 
go to oracle, it's too complicated for my tech's

vs mysql, hit the best one first, and use no other

Query time is a non issue at this point, it's load time, load daily file 
into temp table, then insert ignore into main table, on key violation 
the violating record is ignored

load time is the issue, the server loads files 15 hours a day, that big 
primary key makes loading any table over 2-3 gig VERY slow

I thought it was a bug, everything was great untill you get up to about 
3 or 4 gig, then it gets SLOW

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]