Re: Large data set load and access

2003-12-05 Thread Chuck Gadd
[EMAIL PROTECTED] wrote:

 Thanks for the heads up on this.  Unfortunately the only
varchar is the gbl_locus field, so I'm not sure how much this
would by me for the space.
   Thanks again,
   Brad Eacker ([EMAIL PROTECTED])
No, it wouldn't save you any space.  It would make your file a
little bigger on the drive.  varchar is a variable length
field.  As you've got it now, if you store a 10 byte string in
that field, it'll only take up something like 14 bytes.   If
you store 1 character, it'll take up 5 bytes.  So you are
currently saving a little space.
BUT!  Variable length fields slow Mysql down a little bit when
it's reading the table, since Mysql needs to read in each
record completely to get to the next record.  If you have no
variable length fields, then mysql can calculate right away the
starting position of any record based on the fixed record size.






--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Large data set load and access

2003-12-04 Thread beacker
Folks,
 Just completed an interesting task utilizing mySQL 4.0.16.  The
database I'm creating is some summary information from the GenBank
info from the NCBI.  I must say that I am quite impressed by the
performance that I am seeing.  The data set is pretty substantial,
consisting of almost 30M records.  Yet it was loaded from the text
file in less than 6 minutes.  The text file itself is about 1.6GB
in size.

 An aggregation of the data also took just under a minute as
shown in the information I've included with this message.  I was also
impressed by the time to select a particular record without the use
of an index on the column being selected upon.  I'm sure that would
change once I create an index on this column.

 For information, this test was done on a 1300 MHz RH Linux 7.3
system with 896MB of memory, and WDC ATA drives.  Not the top of the
line machine, but pretty respectable.
   Brad Eacker ([EMAIL PROTECTED])

Particulars:
   
mysql create table gb_locus (
- gbl_id  int primary key,
- gbl_fileID  int,
- gbl_locus   varchar(20),
- gbl_sizeint,
- gbl_datedate,
- gbl_phylum  char(3),
- gbl_foffset int
- );
Query OK, 0 rows affected (0.00 sec)

mysql load data infile '/hda3/beacker/gene/genbank/gbl_locus.txt'
- into table gb_locus fields terminated by ',';
Query OK, 29830869 rows affected (5 min 44.68 sec)
Records: 29830869  Deleted: 0  Skipped: 0  Warnings: 0


Input file information:
[EMAIL PROTECTED] genbank]$ ls -l gbl_locus.txt
-rw-rw-r--1 beacker  beacker  1583781135 Dec  4 13:41 gbl_locus.txt


Aggregation select:

mysql select gbl_phylum, count(*) from gb_locus group by gbl_phylum;
++--+
| gbl_phylum | count(*) |
++--+
| BCT|   210778 |
| CON|11472 |
| EST| 18836635 |
| GSS|  7585521 |
| HTC|   148411 |
| HTG|68390 |
| INV|   186924 |
| MAM|52858 |
| PAT|  1345394 |
| PHG| 2396 |
| PLN|   368927 |
| PRI|   302997 |
| ROD|   115600 |
| STS|   257403 |
| SYN|10988 |
| UNA| 1093 |
| VRL|   203738 |
| VRT|   121344 |
++--+
18 rows in set (59.74 sec)

[EMAIL PROTECTED] gene]# ls -l gb_locus*  
-rw-rw1 mysqlmysql8766 Dec  4 13:57 gb_locus.frm
-rw-rw1 mysqlmysql1075530216 Dec  4 14:04 gb_locus.MYD
-rw-rw1 mysqlmysql244406272 Dec  4 14:04 gb_locus.MYI


Single record selections:

mysql select * from gb_locus where gbl_id = 1400;
+--++---+--+++-+
| gbl_id   | gbl_fileID | gbl_locus | gbl_size | gbl_date   | gbl_phylum | gbl_foffset 
|
+--++---+--+++-+
| 1400 | 310212 | AL556818  | 1027 | 2003-05-31 | EST|41517291 
|
+--++---+--+++-+
1 row in set (0.04 sec)

mysql select * from gb_locus where gbl_locus = 'AL556818';
+--++---+--+++-+
| gbl_id   | gbl_fileID | gbl_locus | gbl_size | gbl_date   | gbl_phylum | gbl_foffset 
|
+--++---+--+++-+
| 1400 | 310212 | AL556818  | 1027 | 2003-05-31 | EST|41517291 
|
+--++---+--+++-+
1 row in set (45.69 sec)


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: Large data set load and access

2003-12-04 Thread Chuck Gadd
[EMAIL PROTECTED] wrote:

mysql create table gb_locus (
- gbl_id  int primary key,
- gbl_fileID  int,
- gbl_locus   varchar(20),
- gbl_sizeint,
- gbl_datedate,
- gbl_phylum  char(3),
- gbl_foffset int
- );
At a recent MySql class, I learned that using all fixed
length fields instead of variable length fields can improve
speed.  This is because MySql can skip thru records faster
if the records are all a fixed length.
So, if you were willing to give up a little storage space,
make the gbl_locus field a Char(20) instead of a varchar(20)
and see if it speeds things up.  I found noticable speed
increase in my selects doing this.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Re: Large data set load and access

2003-12-04 Thread beacker
So, if you were willing to give up a little storage space,
make the gbl_locus field a Char(20) instead of a varchar(20)
and see if it speeds things up.  I found noticable speed
increase in my selects doing this.

 Thanks for the heads up on this.  Unfortunately the only
varchar is the gbl_locus field, so I'm not sure how much this
would by me for the space.
   Thanks again,
   Brad Eacker ([EMAIL PROTECTED])



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]