Folks,
     Just completed an interesting task utilizing mySQL 4.0.16.  The
database I'm creating is some summary information from the GenBank
info from the NCBI.  I must say that I am quite impressed by the
performance that I am seeing.  The data set is pretty substantial,
consisting of almost 30M records.  Yet it was loaded from the text
file in less than 6 minutes.  The text file itself is about 1.6GB
in size.

     An aggregation of the data also took just under a minute as
shown in the information I've included with this message.  I was also
impressed by the time to select a particular record without the use
of an index on the column being selected upon.  I'm sure that would
change once I create an index on this column.

     For information, this test was done on a 1300 MHz RH Linux 7.3
system with 896MB of memory, and WDC ATA drives.  Not the top of the
line machine, but pretty respectable.
                   Brad Eacker ([EMAIL PROTECTED])

Particulars:
           
mysql> create table gb_locus (
    ->         gbl_id          int primary key,
    ->         gbl_fileID      int,
    ->         gbl_locus       varchar(20),
    ->         gbl_size        int,
    ->         gbl_date        date,
    ->         gbl_phylum      char(3),
    ->         gbl_foffset     int
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> load data infile '/hda3/beacker/gene/genbank/gbl_locus.txt'
    -> into table gb_locus fields terminated by ',';
Query OK, 29830869 rows affected (5 min 44.68 sec)
Records: 29830869  Deleted: 0  Skipped: 0  Warnings: 0


Input file information:
[EMAIL PROTECTED] genbank]$ ls -l gbl_locus.txt
-rw-rw-r--    1 beacker  beacker  1583781135 Dec  4 13:41 gbl_locus.txt


Aggregation select:

mysql> select gbl_phylum, count(*) from gb_locus group by gbl_phylum;
+------------+----------+
| gbl_phylum | count(*) |
+------------+----------+
| BCT        |   210778 |
| CON        |    11472 |
| EST        | 18836635 |
| GSS        |  7585521 |
| HTC        |   148411 |
| HTG        |    68390 |
| INV        |   186924 |
| MAM        |    52858 |
| PAT        |  1345394 |
| PHG        |     2396 |
| PLN        |   368927 |
| PRI        |   302997 |
| ROD        |   115600 |
| STS        |   257403 |
| SYN        |    10988 |
| UNA        |     1093 |
| VRL        |   203738 |
| VRT        |   121344 |
+------------+----------+
18 rows in set (59.74 sec)

[EMAIL PROTECTED] gene]# ls -l gb_locus*  
-rw-rw----    1 mysql    mysql        8766 Dec  4 13:57 gb_locus.frm
-rw-rw----    1 mysql    mysql    1075530216 Dec  4 14:04 gb_locus.MYD
-rw-rw----    1 mysql    mysql    244406272 Dec  4 14:04 gb_locus.MYI


Single record selections:

mysql> select * from gb_locus where gbl_id = 14000000;
+----------+------------+-----------+----------+------------+------------+-------------+
| gbl_id   | gbl_fileID | gbl_locus | gbl_size | gbl_date   | gbl_phylum | gbl_foffset 
|
+----------+------------+-----------+----------+------------+------------+-------------+
| 14000000 |     310212 | AL556818  |     1027 | 2003-05-31 | EST        |    41517291 
|
+----------+------------+-----------+----------+------------+------------+-------------+
1 row in set (0.04 sec)

mysql> select * from gb_locus where gbl_locus = 'AL556818';
+----------+------------+-----------+----------+------------+------------+-------------+
| gbl_id   | gbl_fileID | gbl_locus | gbl_size | gbl_date   | gbl_phylum | gbl_foffset 
|
+----------+------------+-----------+----------+------------+------------+-------------+
| 14000000 |     310212 | AL556818  |     1027 | 2003-05-31 | EST        |    41517291 
|
+----------+------------+-----------+----------+------------+------------+-------------+
1 row in set (45.69 sec)


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to