[HACKERS] integrated tsearch has different results than tsearch2

Pavel Stehule Mon, 03 Sep 2007 00:29:21 -0700

Hello

I am testing fulltext.


1. I am not able to use full text search with the latin2 encoding :(
(I am missing a note in the docs that only utf8 dictionaries are supported.)


2. With hspell dictionaries (a fresh copy from OpenOffice) I get
different, and wrong, results.

Original (old) result:

ts=# select * from ts_debug('Příliš žluťoučký kůň se napil žluté vody');
    ts_name    | tok_type | description |   token   |     dict_name      |  tsvector
---------------+----------+-------------+-----------+--------------------+-------------
 default_czech | word     | Word        | Příliš    | {cz_ispell,simple} | 'příliš'
 default_czech | word     | Word        | žluťoučký | {cz_ispell,simple} | 'žluťoučký'
 default_czech | word     | Word        | kůň       | {cz_ispell,simple} | 'kůň'
 default_czech | lword    | Latin word  | se        | {cz_ispell,simple} |
 default_czech | lword    | Latin word  | napil     | {cz_ispell,simple} | 'napít'
 default_czech | word     | Word        | žluté     | {cz_ispell,simple} | 'žlutý'
 default_czech | lword    | Latin word  | vody      | {cz_ispell,simple} | 'voda'
 (7 rows)

New results:
postgres=# CREATE TEXT SEARCH DICTIONARY cspell(template=ispell, dictfile=czech, afffile=czech, stopwords=czech);
CREATE TEXT SEARCH DICTIONARY
postgres=# CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
CREATE TEXT SEARCH CONFIGURATION

postgres=# ALTER TEXT SEARCH CONFIGURATION cs ALTER MAPPING FOR word, lword WITH cspell, simple;
ALTER TEXT SEARCH CONFIGURATION
postgres=# select * from ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody');
 Alias |  Description  |   Token   |  Dictionaries   |    Lexized token
-------+---------------+-----------+-----------------+---------------------
 word  | Word          | Příliš    | {cspell,simple} | cspell: {příliš}
 blank | Space symbols |           | {}              |
 word  | Word          | žluťoučký | {cspell,simple} | cspell: {žluťoučký}
 blank | Space symbols |           | {}              |
 word  | Word          | kůň       | {cspell,simple} | cspell: {kůň}
 blank | Space symbols |           | {}              |
 lword | Latin word    | se        | {cspell,simple} | cspell: {}
 blank | Space symbols |           | {}              |
 lword | Latin word    | napil     | {cspell,simple} | simple: {napil}
 blank | Space symbols |           | {}              |
 word  | Word          | žluté     | {cspell,simple} | simple: {žluté}
 blank | Space symbols |           | {}              |
 lword | Latin word    | vody      | {cspell,simple} | simple: {vody}
(13 rows)

This query returned true in 8.2, but now it returns false:

postgres=# select to_tsvector('cs','Příliš žlutý kůň se napil žluté vody') @@ to_tsquery('cs','napít');
 ?column?
----------
 f
(1 row)

Regards
Pavel Stehule
