My name is Konrad Lorincz, a graduate student in computer science at Harvard
University.  I am doing a class project in which I am comparing the
performance of queries between MySQL, grep, and FlatSQL (an SQL like
language
that we implemented using awk and perl, that performs queries on flat
files).  I am writing because I discovered a strange behavior in MySQL when
performing joins involving three columns over non-indexed data.

When the number of tuples is lager than a certain size, it take a very long
time (at first I thought it entered an infinite loop) for operation to
complete.  Our implementation FlatSQL on the other time completes in a
reasonable time.

I wonder if MySQL does something funny like try to build an index over the
non-indexed data that it's causing it to be so slow.  Please let me know if
you have any ideas of what's causing this behavior and if there is a way
around it.  Below, are some additional details


Here is a sample of the date.  It's a collection of NFS traces from the file
system.  The second column (time) is indexed in MySQL.  Each line is of the
form:

Op | Time | XID | ClientID | FH | Offset | Size
write 1003809600.416227 29caeba9 8cf7216b.03ce 1 229376 8192
write 1003809600.417535 2bcaeba9 8cf7216b.03ce 1 237568 1142
read  1003809601.072451 22897eaa 8cf7216c.03e8 2 0 8192

and the query is
SELECT a.XID, a.ClientID, b.Offset, c.Size
FROM   2_10__3_4 AS a, 2_10__3_6 AS b, 2_10__3_7 AS c
WHERE a.XID=b.XID AND a.XID=c.XID;

where 2_10__3_4 is a table with 2^10=1024 tuples (lines) of data having
columns 3 and 4


*** MySQL timing ***
NbrLinesInTable | Time to complete Joins (sec)
2_06 lines  0:00.08 sec
2_07 lines  0:00.12 sec
2_08 lines  0:00.36 sec
2_09 lines  2:26.41 sec
2_10 lines  20:16.59 sec

*** FlatSQL ***
2_06 lines  0:01.18 sec
2_07 lines  0:02.20 sec
2_08 lines  0:04.28 sec
2_09 lines  0:09.15 sec
2_10 lines  0:21.94 sec

As we can see, something strange happens when the table is larger than 2^8
lines (tuples) the time to complete jumped from 0.36sec to 2min 26 sec and
then to 20min 16sec.  For FlatSQL, we have a steady increase.


I created the tables in MySQL using the following
    my $TABLE_TO_DESC="CREATE TABLE ${TABLE_TO} (       \
        Op VARCHAR(10) NOT NULL default '',     \
        Time DECIMAL(15, 5),                    \
        XID VARCHAR(10) default NULL,           \
        ClientID VARCHAR(15) default NULL,      \
        FH INT UNSIGNED,                        \
        Offset INT UNSIGNED,                    \
        Size INT UNSIGNED,                      \
        KEY Time (Time)                         \
        ) TYPE=MyISAM";


MySQL version
mysql  Ver 11.18 Distrib 3.23.52, for pc-linux-gnu (i686)

I appreciate your help,
-Konrad


************************************************************
** Konrad Lorincz
** graduate student at
** Harvard University
**
** Tel:    (617) 493-4956
** E-Mail: [EMAIL PROTECTED]
** URL:    http://www.eecs.harvard.edu/~konrad
**
**
** "For here we are not afraid to follow truth
**  wherever it may lead, nor to tolerate any error
**  so long as reason is left free to combat it."
**                               - Thomas Jefferson
**
** ... to infinity and beyond ...
************************************************************



---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to