Hi,
Having trouble with indexing TEXT column types, for the purpose of a basic
search engine (without match/against). I've got the following table
structures, the second being the index table containing an entry for each
word in the TEXT (from the first table), minus stop-list type words. The
column in the second is an int as I'm md5()ing the word, taking the first 8
chars and converting from hex to decimal, (faster?).
CREATE TABLE articles (
article_id smallint(5) unsigned NOT NULL auto_increment,
article_text text NOT NULL,
PRIMARY KEY (article_id)
) TYPE=MyISAM;
CREATE TABLE articles_word_index (
article_id smallint(5) unsigned NOT NULL default '0',
word int(11) NOT NULL default '0',
KEY article_id_idx (article_id),
KEY word_idx (word)
) TYPE=MyISAM;
Problem is, using this table structure and the following sort of query:
SELECT DISTINCT a.article_id,
article_text,
COUNT(a.article_id) AS hits
FROM articles a,
articles_word_index b
WHERE a.article_id = b.article_id
AND b.word = 12345789
GROUP BY a.article_id
Is turning out to be about 3 times slower than a LIKE based query like so:
SELECT article_text
FROM articles
WHERE article_text LIKE '%protocol%'
Am I missing something? Or is it due to my limited dataset (~20 articles in
the TEXT field, indexing ~15,000 words). Ie. if the dataset was scaled up to
1000s would the join still be as slow?
Cheers!
--
(A rather drunken) Richard Heyes
http://www.heyes-computing.net/
---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php