Hi,

I am using MySQL for a text database in Brazilian Portuguese.

When I use the full-text search (FTS) index, the search behaves strangely,
and when I dump the index with fts_dump, several words appear in it more
than once.

To my surprise, the performance of FTS boolean search was too low for the
application I am building, so I decided to implement a complete FTS engine
of my own on top of MySQL MyISAM's binary search capabilities.

When I started building the dictionary table and looked inside, I had the
same surprise: the behavior is still erratic. It occurred in both 3.23 and
4.1 (the latest version of each, tested on March 6th).

Let me give an example:

There is a simple dictionary table that holds unique words, each associated
with a numeric id, like:

CREATE TABLE dictionary
( id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  word  CHAR(50) );

First I inserted words that are not equal, such as SÃO, SAO and SAÕ, and
each one got a distinct id:

+--------+------+
| id     | word |
+--------+------+
|     76 | são  |
| 116223 | sao  |
| 222943 | saõ  |
+--------+------+


But when I select one specific word, such as sao, I get all three!

Internally MySQL isn't distinguishing the different words; it treats
à = A and O = Õ.

I read the manual, and my guess is that this is CHARACTER SET / COLLATE
behavior. Is this right? I am using the default CHARACTER SET and
COLLATION.
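
To make clear what I think is happening, here is a small Python sketch. It
is not MySQL's code, only an imitation of an accent- and case-insensitive
comparison; the fold() helper is a hypothetical name of my own, and I am
assuming the default collation compares words in roughly this way.

```python
import unicodedata


def fold(word: str) -> str:
    """Hypothetical stand-in for an accent- and case-insensitive sort key."""
    # NFD splits an accented letter into base letter + combining mark,
    # e.g. "ã" becomes "a" followed by a combining tilde.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore letter case as well.
    return stripped.casefold()


words = ["são", "sao", "saõ"]

# Comparison on the folded key: every word looks like "sao".
matches = [w for w in words if fold(w) == fold("sao")]

# Exact (binary-style) comparison: only the unaccented word matches.
exact = [w for w in words if w == "sao"]

print(matches)
print(exact)
```

With the folded comparison all three words match "sao", which is what I see
in my select; with the exact comparison only "sao" matches, which is what I
expected.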

Thanks
Slepetys



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql