Hi,

Having trouble with indexing TEXT column types, for the purpose of a basic
search engine (without match/against). I've got the following table
structures, the second being the index table containing an entry for each
word in the TEXT (from the first table), minus stop-list type words. The
column in the second is an int as I'm md5()ing the word, taking the first 8
chars and converting from hex to decimal, (faster?).

CREATE TABLE articles (
  article_id smallint(5) unsigned NOT NULL auto_increment,
  article_text text NOT NULL,
  PRIMARY KEY (article_id)
) TYPE=MyISAM;

CREATE TABLE articles_word_index (
  article_id smallint(5) unsigned NOT NULL default '0',
  word int(11) NOT NULL default '0',
  KEY article_id_idx (article_id),
  KEY word_idx (word)
) TYPE=MyISAM;

Problem is, using this table structure and the following sort of query:

SELECT DISTINCT a.article_id,
       article_text,
       COUNT(a.article_id) AS hits

  FROM articles a,
       articles_word_index b

 WHERE a.article_id = b.article_id
   AND b.word = 12345789

 GROUP BY a.article_id

Is turning out to be about 3 times slower than a LIKE based query like so:

SELECT article_text

  FROM articles

 WHERE article_text LIKE '%protocol%'

Am I missing something? Or is it due to my limited dataset (~20 articles in
the TEXT field, indexing ~15,000 words). Ie. if the dataset was scaled up to
1000s would the join still be as slow?

Cheers!
--
(A rather drunken) Richard Heyes
http://www.heyes-computing.net/


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to