Folks,
Just completed an interesting task utilizing mySQL 4.0.16. The
database I'm creating is some summary information from the GenBank
info from the NCBI. I must say that I am quite impressed by the
performance that I am seeing. The data set is pretty substantial,
consisting of almost 30M records. Yet it was loaded from the text
file in less than 6 minutes. The text file itself is about 1.6GB
in size.
An aggregation of the data also took just under a minute as
shown in the information I've included with this message. I was also
impressed by the time to select a particular record without the use
of an index on the column being selected upon. I'm sure that would
change once I create an index on this column.
For information, this test was done on a 1300 MHz RH Linux 7.3
system with 896MB of memory, and WDC ATA drives. Not the top of the
line machine, but pretty respectable.
Brad Eacker ([EMAIL PROTECTED])
Particulars:
mysql create table gb_locus (
- gbl_id int primary key,
- gbl_fileID int,
- gbl_locus varchar(20),
- gbl_sizeint,
- gbl_datedate,
- gbl_phylum char(3),
- gbl_foffset int
- );
Query OK, 0 rows affected (0.00 sec)
mysql load data infile '/hda3/beacker/gene/genbank/gbl_locus.txt'
- into table gb_locus fields terminated by ',';
Query OK, 29830869 rows affected (5 min 44.68 sec)
Records: 29830869 Deleted: 0 Skipped: 0 Warnings: 0
Input file information:
[EMAIL PROTECTED] genbank]$ ls -l gbl_locus.txt
-rw-rw-r--1 beacker beacker 1583781135 Dec 4 13:41 gbl_locus.txt
Aggregation select:
mysql select gbl_phylum, count(*) from gb_locus group by gbl_phylum;
++--+
| gbl_phylum | count(*) |
++--+
| BCT| 210778 |
| CON|11472 |
| EST| 18836635 |
| GSS| 7585521 |
| HTC| 148411 |
| HTG|68390 |
| INV| 186924 |
| MAM|52858 |
| PAT| 1345394 |
| PHG| 2396 |
| PLN| 368927 |
| PRI| 302997 |
| ROD| 115600 |
| STS| 257403 |
| SYN|10988 |
| UNA| 1093 |
| VRL| 203738 |
| VRT| 121344 |
++--+
18 rows in set (59.74 sec)
[EMAIL PROTECTED] gene]# ls -l gb_locus*
-rw-rw1 mysqlmysql8766 Dec 4 13:57 gb_locus.frm
-rw-rw1 mysqlmysql1075530216 Dec 4 14:04 gb_locus.MYD
-rw-rw1 mysqlmysql244406272 Dec 4 14:04 gb_locus.MYI
Single record selections:
mysql select * from gb_locus where gbl_id = 1400;
+--++---+--+++-+
| gbl_id | gbl_fileID | gbl_locus | gbl_size | gbl_date | gbl_phylum | gbl_foffset
|
+--++---+--+++-+
| 1400 | 310212 | AL556818 | 1027 | 2003-05-31 | EST|41517291
|
+--++---+--+++-+
1 row in set (0.04 sec)
mysql select * from gb_locus where gbl_locus = 'AL556818';
+--++---+--+++-+
| gbl_id | gbl_fileID | gbl_locus | gbl_size | gbl_date | gbl_phylum | gbl_foffset
|
+--++---+--+++-+
| 1400 | 310212 | AL556818 | 1027 | 2003-05-31 | EST|41517291
|
+--++---+--+++-+
1 row in set (45.69 sec)
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]