Re: [GENERAL] [to_tsvector] German Compound Words

Sven R. Kunze Thu, 28 May 2015 08:35:04 -0700

Sure. Here you are:

=# select ts_debug('public.german_compound', 'wasserkraft');
ts_debug
-----------------------------------------------------------------------------------------------------
(asciiword,"Word, all ASCII",wasserkraft,"{german_hunspell,german_stem}",german_stem,{wasserkraft})


=# select ts_debug('public.german_compound', 'schifffahrt');
ts_debug
---------------------------------------------------------------------------------------------------------
(asciiword,"Word, all ASCII",schifffahrt,"{german_hunspell,german_stem}",german_hunspell,{schifffahrt})


=# select ts_debug('public.german_compound', 'blindflansch');
ts_debug
-------------------------------------------------------------------------------------------------------
(asciiword,"Word, all ASCII",blindflansch,"{german_hunspell,german_stem}",german_stem,{blindflansch})


This is my test configuration:

=# \dF+ german_compound
Text search configuration "public.german_compound"
Parser: "pg_catalog.default"
      Token      |        Dictionaries
-----------------+-----------------------------
 asciihword      | german_hunspell,german_stem
 asciiword       | german_hunspell,german_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | german_hunspell,german_stem
 hword_asciipart | german_hunspell,german_stem
 hword_numpart   | simple
 hword_part      | german_hunspell,german_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | german_hunspell,german_stem
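
A configuration with the mappings listed above can be built along the following lines. This is only a sketch, not necessarily the exact commands used here: it assumes the german_hunspell dictionary already exists and that the configuration is copied from pg_catalog.german, as described in the original message quoted below.

```sql
-- Sketch: copy the built-in German configuration under a new name.
CREATE TEXT SEARCH CONFIGURATION public.german_compound (
    COPY = pg_catalog.german
);

-- Send every token type that used german_stem through german_hunspell first;
-- german_stem is consulted only if german_hunspell does not recognize the token.
ALTER TEXT SEARCH CONFIGURATION public.german_compound
    ALTER MAPPING FOR asciihword, asciiword, hword,
                      hword_asciipart, hword_part, word
    WITH german_hunspell, german_stem;
```

The order of the dictionaries after WITH matters: the first dictionary that recognizes a token determines the resulting lexemes.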

On 28.05.2015 17:24, Oleg Bartunov wrote:

ts_debug() ?

=# select * from ts_debug('english', 'messages');
   alias   |   description   |  token   |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+----------+----------------+--------------+----------
 asciiword | Word, all ASCII | messages | {english_stem} | english_stem | {messag}

On Thu, May 28, 2015 at 2:05 PM, Sven R. Kunze <[email protected]> wrote:


    Hi everybody,

    What do I need to do to enable compound word handling in the
    PostgreSQL tsvector implementation?

    I run an Ubuntu 14.04 machine with PostgreSQL 9.3, have installed the
    package hunspell-de-de, and have already created a new dictionary as
    described here:
    http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

    CREATE TEXT SEARCH DICTIONARY german_hunspell (
        TEMPLATE = ispell,
        DictFile = de_de,
        AffFile = de_de,
        StopWords = german
    );

    Furthermore, I created a new test text search configuration (copied
    from german) and updated every token type mapping that used the
    german_stem dictionary so that it uses german_hunspell first and then
    german_stem.

    However, to_tsvector still does not split compound words. I would
    expect, for example:

    wasserkraft -> wasserkraft, kraft
    schifffahrt -> schifffahrt, fahrt
    blindflansch -> blindflansch, flansch

    etc.


    What have I done wrong here?

    --
    Sven R. Kunze
    TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
    Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
    e-mail: [email protected]
    web: www.tbz-pariv.de

    Geschäftsführer: Dr. Reiner Wohlgemuth
    Sitz der Gesellschaft: Chemnitz
    Registergericht: Chemnitz HRB 8543




