Folks, Just completed an interesting task utilizing mySQL 4.0.16. The database I'm creating is some summary information from the GenBank info from the NCBI. I must say that I am quite impressed by the performance that I am seeing. The data set is pretty substantial, consisting of almost 30M records. Yet it was loaded from the text file in less than 6 minutes. The text file itself is about 1.6GB in size.
An aggregation of the data also took just under a minute as shown in the information I've included with this message. I was also impressed by the time to select a particular record without the use of an index on the column being selected upon. I'm sure that would change once I create an index on this column. For information, this test was done on a 1300 MHz RH Linux 7.3 system with 896MB of memory, and WDC ATA drives. Not the top of the line machine, but pretty respectable. Brad Eacker ([EMAIL PROTECTED]) Particulars: mysql> create table gb_locus ( -> gbl_id int primary key, -> gbl_fileID int, -> gbl_locus varchar(20), -> gbl_size int, -> gbl_date date, -> gbl_phylum char(3), -> gbl_foffset int -> ); Query OK, 0 rows affected (0.00 sec) mysql> load data infile '/hda3/beacker/gene/genbank/gbl_locus.txt' -> into table gb_locus fields terminated by ','; Query OK, 29830869 rows affected (5 min 44.68 sec) Records: 29830869 Deleted: 0 Skipped: 0 Warnings: 0 Input file information: [EMAIL PROTECTED] genbank]$ ls -l gbl_locus.txt -rw-rw-r-- 1 beacker beacker 1583781135 Dec 4 13:41 gbl_locus.txt Aggregation select: mysql> select gbl_phylum, count(*) from gb_locus group by gbl_phylum; +------------+----------+ | gbl_phylum | count(*) | +------------+----------+ | BCT | 210778 | | CON | 11472 | | EST | 18836635 | | GSS | 7585521 | | HTC | 148411 | | HTG | 68390 | | INV | 186924 | | MAM | 52858 | | PAT | 1345394 | | PHG | 2396 | | PLN | 368927 | | PRI | 302997 | | ROD | 115600 | | STS | 257403 | | SYN | 10988 | | UNA | 1093 | | VRL | 203738 | | VRT | 121344 | +------------+----------+ 18 rows in set (59.74 sec) [EMAIL PROTECTED] gene]# ls -l gb_locus* -rw-rw---- 1 mysql mysql 8766 Dec 4 13:57 gb_locus.frm -rw-rw---- 1 mysql mysql 1075530216 Dec 4 14:04 gb_locus.MYD -rw-rw---- 1 mysql mysql 244406272 Dec 4 14:04 gb_locus.MYI Single record selections: mysql> select * from gb_locus where gbl_id = 14000000; +----------+------------+-----------+----------+------------+------------+-------------+ | gbl_id | gbl_fileID | gbl_locus | gbl_size | gbl_date | gbl_phylum | gbl_foffset | +----------+------------+-----------+----------+------------+------------+-------------+ | 14000000 | 310212 | AL556818 | 1027 | 2003-05-31 | EST | 41517291 | +----------+------------+-----------+----------+------------+------------+-------------+ 1 row in set (0.04 sec) mysql> select * from gb_locus where gbl_locus = 'AL556818'; +----------+------------+-----------+----------+------------+------------+-------------+ | gbl_id | gbl_fileID | gbl_locus | gbl_size | gbl_date | gbl_phylum | gbl_foffset | +----------+------------+-----------+----------+------------+------------+-------------+ | 14000000 | 310212 | AL556818 | 1027 | 2003-05-31 | EST | 41517291 | +----------+------------+-----------+----------+------------+------------+-------------+ 1 row in set (45.69 sec) -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]