Alright. I got it running and used ; specifically:

I'm not sure where to find up-to-date, authoritative ispell dictionaries. I figure I need to change this particular dictionary to keep "ion" from being split away from words like "produktION"/"konstruktION", etc.:

=# select * from ts_debug('public.german_compound_ispell', 'konstruktion');
alias | description | token | dictionaries | dictionary | lexemes
asciiword | Word, all ASCII | konstruktion | {german_ispell,german_stem} | german_ispell | {konstruktion,konstrukt,ion}
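If the fix turns out to be editing the dictionary files themselves (for example, dropping or re-flagging a standalone "ion" entry, which is only a guess at where the split comes from), sessions that already loaded the dictionary keep using the old copy. A minimal sketch, assuming the dictionary is named german_ispell as in the output above: the documented no-op ALTER forces sessions to re-read the files, and ts_lexize then checks the dictionary on its own.

```sql
-- Assumption: the .dict/.affix files in $SHAREDIR/tsearch_data were edited.
-- Removing a nonexistent option changes nothing but marks the dictionary
-- as modified, so sessions re-read its configuration files.
ALTER TEXT SEARCH DICTIONARY german_ispell ( dummy );

-- Re-check the word against the dictionary alone, bypassing the configuration.
SELECT ts_lexize('german_ispell', 'konstruktion');
```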

Unfortunately, the splitting of compound words is not consistent: wasserkraft keeps the whole word alongside its parts, while konstruktionsplan yields only its parts:

=# select * from ts_debug('public.german_compound_ispell', 'wasserkraft');
alias | description | token | dictionaries | dictionary | lexemes
asciiword | Word, all ASCII | wasserkraft | {german_ispell,german_stem} | german_ispell | {wasserkraft,wasser,kraft}

=# select * from ts_debug('public.german_compound_ispell', 'konstruktionsplan');
alias | description | token | dictionaries | dictionary | lexemes
asciiword | Word, all ASCII | konstruktionsplan | {german_ispell,german_stem} | german_ispell | {konstruktion,plan}
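To see what this inconsistency means for searching, rather than for individual lexemes, one can build the tsvector and match a single component against it. A sketch, assuming the public.german_compound_ispell configuration used above; no output is shown because it depends on the installed dictionary files.

```sql
-- Lexemes that would actually be stored for each compound.
SELECT to_tsvector('public.german_compound_ispell', 'wasserkraft')       AS wasserkraft,
       to_tsvector('public.german_compound_ispell', 'konstruktionsplan') AS konstruktionsplan;

-- Does a query for one component find the compound?
-- The query word is normalized by the same dictionaries as the document.
SELECT to_tsvector('public.german_compound_ispell', 'wasserkraft')
           @@ to_tsquery('public.german_compound_ispell', 'kraft') AS kraft_found,
       to_tsvector('public.german_compound_ispell', 'konstruktionsplan')
           @@ to_tsquery('public.german_compound_ispell', 'plan')  AS plan_found;
```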

I'm not sure where the 'sch' comes from:

=# select * from ts_debug('public.german_compound_ispell', 'rundflansch');
alias | description | token | dictionaries | dictionary | lexemes
asciiword | Word, all ASCII | rundflansch | {german_ispell,german_stem} | german_ispell | {rund,flansch,rund,flan,sch}
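One way to narrow this down is to feed the dictionary each piece separately and see which input produces the extra 'flan' and 'sch'. A sketch, assuming the dictionary name german_ispell from the ts_debug output; whether 'flansch' is itself being decomposed is exactly what this would confirm or rule out.

```sql
-- ts_lexize applies one dictionary directly: NULL = unknown word,
-- empty array = stop word, otherwise the lexemes it produces.
SELECT w AS word, ts_lexize('german_ispell', w) AS lexemes
FROM unnest(ARRAY['rundflansch', 'rund', 'flansch']) AS w;
```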

This is another funny example:

=# select * from ts_debug('public.german_compound_ispell', 'datenbanken');
alias | description | token | dictionaries | dictionary | lexemes
asciiword | Word, all ASCII | datenbanken | {german_ispell,german_stem} | german_ispell | {datenbank,daten,date,banken,daten,date,bank,daten,date,banken,daten,date,bank}
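The repetition above is ts_debug reporting the dictionary's raw output; a stored tsvector keeps each distinct lexeme only once. A quick check, assuming the same configuration: length() on a tsvector counts its distinct lexemes and is available on 9.3.

```sql
-- The stored vector, with duplicates merged.
SELECT to_tsvector('public.german_compound_ispell', 'datenbanken');

-- Number of distinct lexemes actually indexed for the word.
SELECT length(to_tsvector('public.german_compound_ispell', 'datenbanken')) AS distinct_lexemes;
```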

On 01.06.2015 09:25, Sven R. Kunze wrote:
I actually wanted to minimize the installation effort. Thus, I used the hunspell-de-de package of Debian/Ubuntu.

Give me a second for ispell.

Below, see the hunspell variant for Produktionsintervall/Produktionintervall:

=# select * from ts_debug('public.german_compound', 'Produktionsintervall');
alias | description | token | dictionaries | dictionary | lexemes
asciiword | Word, all ASCII | Produktionsintervall | {german_hunspell,german_stem} | german_stem | {produktionsintervall}
(1 row)

=# select * from ts_debug('public.german_compound', 'Produktionintervall');
alias | description | token | dictionaries | dictionary | lexemes
asciiword | Word, all ASCII | Produktionintervall | {german_hunspell,german_stem} | german_stem | {produktionintervall}

PS: I'm posting your answer to the list as well.

On 28.05.2015 19:42, Oleg Bartunov wrote:
For readability it's better to use

select * from ts_debug
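Going a step further, selecting only the interesting columns keeps each row on one line. A sketch using the configuration from this thread; in psql, \x toggles expanded display, which also helps with wide rows.

```sql
-- Only the token, the dictionary that handled it, and its lexemes.
SELECT token, dictionary, lexemes
FROM ts_debug('public.german_compound', 'wasserkraft');
```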

I remember there is a problem with correct support of hunspell files. Did you try ispell files?
Also, I found this 

Try this word - Produktionintervall

On Thu, May 28, 2015 at 6:34 PM, Sven R. Kunze wrote:

    Sure. Here you are:

    =# select ts_debug('public.german_compound', 'wasserkraft');
     (asciiword,"Word, all

    =# select ts_debug('public.german_compound', 'schifffahrt');
     (asciiword,"Word, all

    =# select ts_debug('public.german_compound', 'blindflansch');
     (asciiword,"Word, all

    That is my testing configuration:

    =# \dF+ german_compound
    Text search configuration "public.german_compound"
    Parser: "pg_catalog.default"
          Token      |        Dictionaries
     asciihword      | german_hunspell,german_stem
     asciiword       | german_hunspell,german_stem
     email           | simple
     file            | simple
     float           | simple
     host            | simple
     hword           | german_hunspell,german_stem
     hword_asciipart | german_hunspell,german_stem
     hword_numpart   | simple
     hword_part      | german_hunspell,german_stem
     int             | simple
     numhword        | simple
     numword         | simple
     sfloat          | simple
     uint            | simple
     url             | simple
     url_path        | simple
     version         | simple
     word            | german_hunspell,german_stem

    On 28.05.2015 17:24, Oleg Bartunov wrote:
    ts_debug()?

    =# select * from ts_debug('english', 'messages');
    alias | description | token | dictionaries | dictionary | lexemes
    asciiword | Word, all ASCII | messages | {english_stem} | english_stem | {messag}

    On Thu, May 28, 2015 at 2:05 PM, Sven R. Kunze wrote:

        Hi everybody,

        what do I need to do to enable compound word
        handling in PostgreSQL's tsvector implementation?

        I run an Ubuntu 14.04 machine, PostgreSQL 9.3, have
        installed package hunspell-de-de and already created a new
        dictionary as described here:

        -- Reads de_de.dict, de_de.affix and german.stop from $SHAREDIR/tsearch_data
        CREATE TEXT SEARCH DICTIONARY german_hunspell (
            TEMPLATE = ispell,
            DictFile = de_de,
            AffFile = de_de,
            StopWords = german
        );

        Furthermore, I created a new test text search configuration
        (copied from german) and changed every token type that used
        the german_stem dictionary so that it uses
        german_hunspell first and then german_stem.
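The configuration step described above might look roughly like this. It is a reconstruction, not the exact commands that were run; the token types are taken from the \dF+ listing earlier in the thread.

```sql
-- Copy the built-in german configuration.
CREATE TEXT SEARCH CONFIGURATION public.german_compound ( COPY = pg_catalog.german );

-- Try german_hunspell first; if it does not recognize a token (returns NULL),
-- the token falls through to german_stem.
ALTER TEXT SEARCH CONFIGURATION public.german_compound
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH german_hunspell, german_stem;
```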

        However, to_tsvector still does not handle compound
        words such as these (expected lexemes after the arrow):

        wasserkraft -> wasserkraft, kraft
        schifffahrt -> schifffahrt, fahrt
        blindflansch -> blindflansch, flansch


        What have I done wrong here?
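A first diagnostic is to ask the hunspell-based dictionary about these words directly, bypassing the configuration. A sketch, assuming the german_hunspell dictionary defined above: ts_lexize returns NULL for a word the dictionary does not know, in which case the configuration hands the token on to german_stem and no compound splitting happens.

```sql
-- NULL: unknown to german_hunspell; empty array: stop word;
-- otherwise the lexemes (including any compound parts) it produces.
SELECT w AS word, ts_lexize('german_hunspell', w) AS lexemes
FROM unnest(ARRAY['wasserkraft', 'schifffahrt', 'blindflansch']) AS w;
```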

-- Sven R. Kunze
        TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
        Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
        e-mail: <>
        web: <>

        Geschäftsführer: Dr. Reiner Wohlgemuth
        Sitz der Gesellschaft: Chemnitz
        Registergericht: Chemnitz HRB 8543
