I wanted to minimize the installation effort, so I used the hunspell-de-de package from Debian/Ubuntu.

Give me a moment to try the ispell files.

Below, see the hunspell variant for Produktionsintervall/Produktionintervall:

=# select * from ts_debug('public.german_compound', 'Produktionsintervall');
   alias   |   description   |        token         |         dictionaries          | dictionary  |        lexemes
-----------+-----------------+----------------------+-------------------------------+-------------+------------------------
 asciiword | Word, all ASCII | Produktionsintervall | {german_hunspell,german_stem} | german_stem | {produktionsintervall}
(1 row)

=# select * from ts_debug('public.german_compound', 'Produktionintervall');
   alias   |   description   |        token        |         dictionaries          | dictionary  |        lexemes
-----------+-----------------+---------------------+-------------------------------+-------------+-----------------------
 asciiword | Word, all ASCII | Produktionintervall | {german_hunspell,german_stem} | german_stem | {produktionintervall}
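
To see what german_hunspell itself returns, without the fallback to german_stem, the dictionary can also be queried directly with ts_lexize. This is a minimal sketch, assuming the german_hunspell dictionary defined in the original message quoted below; the output is omitted because it depends on the installed dictionary files.

```sql
-- ts_lexize returns NULL when the dictionary does not recognize the token,
-- so a NULL here means the word falls through to german_stem.
select ts_lexize('german_hunspell', 'Produktionsintervall');
select ts_lexize('german_hunspell', 'Produktionintervall');
```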



PS: I am posting your answer to the list as well.

On 28.05.2015 19:42, Oleg Bartunov wrote:
For readability it's better to use

select * from ts_debug

I remember there is a problem with correct support of hunspell files. Did you try ispell files?
Also, I found this message:
http://www.postgresql.org/message-id/dm1ece$2gb5$1...@news.hub.org

Try this word - Produktionintervall


On Thu, May 28, 2015 at 6:34 PM, Sven R. Kunze <srku...@tbz-pariv.de> wrote:

    Sure. Here you are:

    =# select ts_debug('public.german_compound', 'wasserkraft');
    ts_debug
    -----------------------------------------------------------------------------------------------------
     (asciiword,"Word, all ASCII",wasserkraft,"{german_hunspell,german_stem}",german_stem,{wasserkraft})

    =# select ts_debug('public.german_compound', 'schifffahrt');
    ts_debug
    ---------------------------------------------------------------------------------------------------------
     (asciiword,"Word, all ASCII",schifffahrt,"{german_hunspell,german_stem}",german_hunspell,{schifffahrt})

    =# select ts_debug('public.german_compound', 'blindflansch');
    ts_debug
    -------------------------------------------------------------------------------------------------------
     (asciiword,"Word, all ASCII",blindflansch,"{german_hunspell,german_stem}",german_stem,{blindflansch})

    That is my testing configuration:

    =# \dF+ german_compound
    Text search configuration "public.german_compound"
    Parser: "pg_catalog.default"
          Token      |        Dictionaries
    -----------------+-----------------------------
     asciihword      | german_hunspell,german_stem
     asciiword       | german_hunspell,german_stem
     email           | simple
     file            | simple
     float           | simple
     host            | simple
     hword           | german_hunspell,german_stem
     hword_asciipart | german_hunspell,german_stem
     hword_numpart   | simple
     hword_part      | german_hunspell,german_stem
     int             | simple
     numhword        | simple
     numword         | simple
     sfloat          | simple
     uint            | simple
     url             | simple
     url_path        | simple
     version         | simple
     word            | german_hunspell,german_stem


    On 28.05.2015 17:24, Oleg Bartunov wrote:
    ts_debug() ?

    =# select * from ts_debug('english', 'messages');
       alias   |   description   |  token   |  dictionaries  |  dictionary  | lexemes
    -----------+-----------------+----------+----------------+--------------+----------
     asciiword | Word, all ASCII | messages | {english_stem} | english_stem | {messag}


    On Thu, May 28, 2015 at 2:05 PM, Sven R. Kunze
    <srku...@tbz-pariv.de> wrote:

        Hi everybody,

        what do I need to do to enable compound word handling in
        PostgreSQL's tsvector implementation?

        I run an Ubuntu 14.04 machine with PostgreSQL 9.3, have
        installed the hunspell-de-de package, and have already created
        a new dictionary as described here:

        http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

        CREATE TEXT SEARCH DICTIONARY german_hunspell (
            TEMPLATE = ispell,
            DictFile = de_de,
            AffFile = de_de,
            StopWords = german
        );

        Furthermore, I created a new test text search configuration
        (copied from german) and updated every token type that used
        the german_stem dictionary so that it now uses
        german_hunspell first and then german_stem.
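
        For reference, those configuration steps can be sketched roughly
        as follows. This is a reconstruction, not a transcript: it assumes
        the configuration was copied from pg_catalog.german, and it takes
        the name public.german_compound and the token types from the \dF+
        listing elsewhere in this thread.

        ```sql
        -- Assumption: start from the built-in German configuration.
        CREATE TEXT SEARCH CONFIGURATION public.german_compound (
            COPY = pg_catalog.german
        );

        -- Try german_hunspell first, then fall back to german_stem, for
        -- every token type that previously used german_stem alone.
        ALTER TEXT SEARCH CONFIGURATION public.german_compound
            ALTER MAPPING FOR asciihword, asciiword, hword,
                              hword_asciipart, hword_part, word
            WITH german_hunspell, german_stem;
        ```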

        However, to_tsvector still does not split compound words.
        I would expect, for example:

        wasserkraft -> wasserkraft, kraft
        schifffahrt -> schifffahrt, fahrt
        blindflansch -> blindflansch, flansch

        etc.


        What have I done wrong here?

        --
        Sven R. Kunze
        TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
        Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
        e-mail: srku...@tbz-pariv.de
        web: www.tbz-pariv.de

        Geschäftsführer: Dr. Reiner Wohlgemuth
        Sitz der Gesellschaft: Chemnitz
        Registergericht: Chemnitz HRB 8543











--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srku...@tbz-pariv.de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543
