Folks,
Just completed an interesting task utilizing mySQL 4.0.16. The
database I'm creating is some summary information from the GenBank
info from the NCBI. I must say that I am quite impressed by the
performance that I am seeing. The data set is pretty substantial,
consisting of almost 30M records. Yet it was loaded from the text
file in less than 6 minutes. The text file itself is about 1.6GB
in size.
An aggregation of the data also took just under a minute as
shown in the information I've included with this message. I was also
impressed by the time to select a particular record without the use
of an index on the column being selected upon. I'm sure that would
change once I create an index on this column.
For information, this test was done on a 1300 MHz RH Linux 7.3
system with 896MB of memory, and WDC ATA drives. Not the top of the
line machine, but pretty respectable.
Brad Eacker ([EMAIL PROTECTED])
Particulars:
mysql> create table gb_locus (
-> gbl_id int primary key,
-> gbl_fileID int,
-> gbl_locus varchar(20),
-> gbl_size int,
-> gbl_date date,
-> gbl_phylum char(3),
-> gbl_foffset int
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> load data infile '/hda3/beacker/gene/genbank/gbl_locus.txt'
-> into table gb_locus fields terminated by ',';
Query OK, 29830869 rows affected (5 min 44.68 sec)
Records: 29830869 Deleted: 0 Skipped: 0 Warnings: 0
Input file information:
[EMAIL PROTECTED] genbank]$ ls -l gbl_locus.txt
-rw-rw-r-- 1 beacker beacker 1583781135 Dec 4 13:41 gbl_locus.txt
Aggregation select:
mysql> select gbl_phylum, count(*) from gb_locus group by gbl_phylum;
+------------+----------+
| gbl_phylum | count(*) |
+------------+----------+
| BCT | 210778 |
| CON | 11472 |
| EST | 18836635 |
| GSS | 7585521 |
| HTC | 148411 |
| HTG | 68390 |
| INV | 186924 |
| MAM | 52858 |
| PAT | 1345394 |
| PHG | 2396 |
| PLN | 368927 |
| PRI | 302997 |
| ROD | 115600 |
| STS | 257403 |
| SYN | 10988 |
| UNA | 1093 |
| VRL | 203738 |
| VRT | 121344 |
+------------+----------+
18 rows in set (59.74 sec)
[EMAIL PROTECTED] gene]# ls -l gb_locus*
-rw-rw---- 1 mysql mysql 8766 Dec 4 13:57 gb_locus.frm
-rw-rw---- 1 mysql mysql 1075530216 Dec 4 14:04 gb_locus.MYD
-rw-rw---- 1 mysql mysql 244406272 Dec 4 14:04 gb_locus.MYI
Single record selections:
mysql> select * from gb_locus where gbl_id = 14000000;
+----------+------------+-----------+----------+------------+------------+-------------+
| gbl_id | gbl_fileID | gbl_locus | gbl_size | gbl_date | gbl_phylum | gbl_foffset
|
+----------+------------+-----------+----------+------------+------------+-------------+
| 14000000 | 310212 | AL556818 | 1027 | 2003-05-31 | EST | 41517291
|
+----------+------------+-----------+----------+------------+------------+-------------+
1 row in set (0.04 sec)
mysql> select * from gb_locus where gbl_locus = 'AL556818';
+----------+------------+-----------+----------+------------+------------+-------------+
| gbl_id | gbl_fileID | gbl_locus | gbl_size | gbl_date | gbl_phylum | gbl_foffset
|
+----------+------------+-----------+----------+------------+------------+-------------+
| 14000000 | 310212 | AL556818 | 1027 | 2003-05-31 | EST | 41517291
|
+----------+------------+-----------+----------+------------+------------+-------------+
1 row in set (45.69 sec)
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]