Hi!

>With mysqldump which uses the following query to do this:
>
>SELECT * INTO OUTFILE '/spylog/test/g00hits.txt' FROM g00hits
>
>It takes:
>real    1m23.177s   (first time)
>real    1m20.934s   (second time)

The dump speed is 45000 rows or 5 MB per second. I think best disks
today can write 10 MB/second. Thus for the dump the disk and the CPU
are about equal bottlenecks.

>mysql> load data infile '/spylog/test/g00hits.txt' ignore into table g00hits;
>Query OK, 3722867 rows affected (8 min 35.73 sec)
>
>mysql> load data infile '/spylog/test/g00hits.txt'  ignore into table g00hits;
>Query OK, 3722867 rows affected (3 min 35.26 sec)

What was the difference between the first run and the second run? Why is
the time in the second run only 1/2 of the first run?

In the second run you load 15000 rows per second, or
1.5 MB/s, which sounds a reasonable speed, since your rows are much
bigger than the rows in my test. Thus there is no performance bug
indicated by these numbers. The question is why your original test
took 40 minutes, though on a different computer.

In my test I was able to load 45000 8-byte rows per second,
which makes an input speed of 400 kB per second, much smaller than
the disk read speed. Thus we see that loading of data is CPU-bound
even on the fastest available processors, if we do not need to do
random disk accesses in building the indexes.

As a conclusion, are these numbers not satisfactory? You can dump
tables at about the same speed as the disk can write, and you can
import 50 million rows or 5 GB per hour.

Regards,

Heikki

At 09:38 PM 6/2/01 +0400, you wrote:
>Hello mysql,
>
>  I'm repeating my test's now with more information and on more
>  powerfull maching with no load now, this should give more accurate
>  results.
>  The machine is Dual PIII-750/1GB RAM
>
>  I'm trying the following table  structure:
>
>CREATE TABLE g00hits (
>  id int(10) unsigned NOT NULL auto_increment,
>  counter_id int(10) unsigned NOT NULL default '0',
>  visitor_id int(10) unsigned NOT NULL default '0',
>  server_id smallint(5) unsigned NOT NULL default '0',
>  ip int(10) unsigned NOT NULL default '0',
>  ts timestamp(14) NOT NULL,
>  method_id tinyint(3) unsigned NOT NULL default '0',
>  http_ver_id tinyint(3) unsigned NOT NULL default '0',
>  page_id int(10) unsigned NOT NULL default '0',
>  referer_page_id int(10) unsigned NOT NULL default '0',
>  status smallint(5) unsigned NOT NULL default '0',
>  bytes int(10) unsigned NOT NULL default '0',
>  browser_id mediumint(8) unsigned NOT NULL default '0',
>  language smallint(5) unsigned NOT NULL default '0',
>  local_visitor_id bigint(10) unsigned NOT NULL default '0',
>  process_time int(10) unsigned NOT NULL default '0',
>  proxy_software_id mediumint(8) unsigned NOT NULL default '0',
>  proxy_client_ip int(10) unsigned NOT NULL default '0',
>  auth_user_id int(10) unsigned NOT NULL default '0',
>  flag int(10) unsigned NOT NULL default '0',
>  session_id int(10) unsigned NOT NULL default '0',
>  doc_type smallint(5) unsigned NOT NULL default '0',
>  online_users smallint(5) unsigned NOT NULL default '0',
>  src_id int(10) unsigned NOT NULL default '0',
>  PRIMARY KEY  (counter_id,ts,id),
>  KEY visitor_id (visitor_id)
>) TYPE=MyISAM PACK_KEYS=1 DELAY_KEY_WRITE=1;
>
>which has:
>mysql> select count(*) from g00hits;
>+----------+
>| count(*) |
>+----------+
>|  3722867 |
>+----------+
>1 row in set (0.01 sec)
>
>MYD file takes 300MB, text output 400MB
>
>With mysqldump which uses the following query to do this:
>
>SELECT * INTO OUTFILE '/spylog/test/g00hits.txt' FROM g00hits
>
>It takes:
>real    1m23.177s   (first time)
>real    1m20.934s   (second time)
>
>The backup table for the same table takes:
>
>first time:
>mysql> backup table g00hits to '/spylog/test';
>+--------------+--------+----------+----------+
>| Table        | Op     | Msg_type | Msg_text |
>+--------------+--------+----------+----------+
>| la00.g00hits | backup | status   | OK       |
>+--------------+--------+----------+----------+
>1 row in set (22.60 sec)
>second time:
>
>mysql> backup table g00hits to '/spylog/test';
>+--------------+--------+----------+----------+
>| Table        | Op     | Msg_type | Msg_text |
>+--------------+--------+----------+----------+
>| la00.g00hits | backup | status   | OK       |
>+--------------+--------+----------+----------+
>1 row in set (21.34 sec)
>
>
>Well. We have 4 times difference here, which is a bit strange as I
>would expect the disk speed should be the limit here, but not
>converting the rows in text format, which is in this case done at
>speed of 45000rows/second, which is not really much I think for this
>type of operation. Hope mysql does not use strcat to form the strings?
>
>
>Well. Let's try to do the restore:
>mysql> restore table g00hits from '/spylog/test';
>+--------------+---------+----------+----------+
>| Table        | Op      | Msg_type | Msg_text |
>+--------------+---------+----------+----------+
>| test.g00hits | restore | status   | OK       |
>+--------------+---------+----------+----------+
>1 row in set (1 min 31.16 sec)
>
>and again:
>
>mysql> restore table g00hits from '/spylog/test';
>+--------------+---------+----------+----------+
>| Table        | Op      | Msg_type | Msg_text |
>+--------------+---------+----------+----------+
>| test.g00hits | restore | status   | OK       |
>+--------------+---------+----------+----------+
>1 row in set (1 min 30.41 sec)
>
>mysql> load data infile '/spylog/test/g00hits.txt' ignore into table g00hits;
>Query OK, 3722867 rows affected (8 min 35.73 sec)
>
>mysql> load data infile '/spylog/test/g00hits.txt'  ignore into table g00hits;
>Query OK, 3722867 rows affected (3 min 35.26 sec)
>
>So it seems like cache really matters here, this would indicate bad
>disk access way as this difference is to huge and is not found with
>backup/restore table.
>
>Now without ignore:
>
>mysql> load data infile '/spylog/test/g00hits.txt'  into table g00hits;
>Query OK, 3722867 rows affected (3 min 34.75 sec)
>
>So it seems like it does not really matter.
>
>
>Well let's drop all indexes and try to load data fastest way ?
>
>mysql> load data infile '/spylog/test/g00hits.txt'  ignore into table g00hits;
>Query OK, 3722867 rows affected (1 min 56.16 sec)
>Records: 3722867  Deleted: 0  Skipped: 0  Warnings: 0
>
>mysql> load data infile '/spylog/test/g00hits.txt'   into table g00hits;
>Query OK, 3722867 rows affected (1 min 56.38 sec)
>Records: 3722867  Deleted: 0  Skipped: 0  Warnings: 0
>
>
>mysql> alter table g00hits add primary key(counter_id,ts,id),add
key(visitor_id);
>Query OK, 3722867 rows affected (1 min 54.38 sec)
>Records: 3722867  Duplicates: 0  Warnings: 0
>
>So it does not give any improvement do add keys after loading date, it
>seems to be allready optimal.
>
>
>So the diference in restore speed is about 3 times, which is not
>really expected, but a bit better then I reported before, but anyway I
>can't understand so huge difference for load data then text file fits
>to cache or does not. Well may be it was because of some other random
>events - I was not able to reproduce so slow speed anymore.
>
>
>
>also look at the difference of vmstat between restore and load data:
>restore:
>
> 0  1  1 181624  42456 196488 206936   0   0  2823  4208 1126  1151   0  25  74
> 0  1  0 181624  24472 222220 199332   0   0  3411  3903 1144  1273   0  25  74
> 1  0  1 181624  30640 104856 268544   0   0  7030   371  682   792  29  24  47
> 1  0  0 181624  30636 104856 268544   0   0     0     0  104   115  50   0  50
> 1  0  0 181624  21108 104848 269308   0   0     0     0  104   114  50   0  49
>
> So the file is copied quickly and then indexes are generated so one
> CPU is completely out:
>
> LOAD DATA:
>
> 2  1  0 181624  75816 166460 180356   0   0   425   223  411  8798  29   6  65
> 0  1  0 181624  68256 168944 185312   0   0   435   407  608  2827  10   5  85
> 1  1  0 181624  63676 171668 189108   0   0   388   363  656  7937  19   7  74
> 1  0  0 181624  23976 104856 245228   0   0     0     0  106   116  50   0  50
> 1  0  0 181624  23976 104856 245228   0   0     0     0  104   113  50   0  49
> 1  0  0 181624  29980 131516 212568   0   0     0     0  104   114  48   3  50
>  
>
> as you see the CPU is not really used, therefore the disk access is
> not so huge, this may indicate the disk is bein accessed in random
> order, therefore it should not i think. This may indicate to small
> buffer used (?) my record_buffer is set to 512K by the way.
>
>
>
>
>
>
>
>-- 
>Best regards,
> Peter                          mailto:[EMAIL PROTECTED]
>
>


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to