Josh,

To cover numbers, it looks like you just need to map dictionaries to
the numeric token types (uint, etc.).  I probably wouldn't use a single
dictionary for everything.  Note that you can stack dictionaries.
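
A rough sketch of what that could look like, assuming a separate synonym
dictionary for numbers. The names synonym_numbers and numbers_larl are
made up for illustration; only the synonym_larl configuration and the
uint token type come from your output:

```sql
-- Hypothetical dictionary: assumes a file numbers_larl.syn exists in
-- $SHAREDIR/tsearch_data with lines such as "6 six".
CREATE TEXT SEARCH DICTIONARY synonym_numbers (
    TEMPLATE = synonym,
    SYNONYMS = numbers_larl
);

-- Stack the dictionaries for unsigned-integer tokens: the synonym
-- dictionary is consulted first, and tokens it does not recognize
-- fall through to simple.
ALTER TEXT SEARCH CONFIGURATION synonym_larl
    ALTER MAPPING FOR uint
    WITH synonym_numbers, simple;

-- Re-check how the token is handled after the change.
SELECT * FROM ts_debug('synonym_larl', '6');
```

Other numeric token types (int, float, and so on) could be added to the
FOR list the same way if you want them covered, followed by a reindex of
the affected records.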

As for & (along with |, !, and maybe parens), those are special
characters used by tsearch itself.  It may be best to simply map them in
search_normalize() to some well-known token that's very unlikely to be
used in the real world, perhaps some unicode codepoint, like ☃ and
friends.
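
To illustrate the idea only (this is not the actual search_normalize()
code, and the choice of ☃ is just the example from above), the
substitution itself amounts to something like the following. It is worth
running the placeholder through ts_debug() first, since the parser may
classify a symbol as blank the same way it does '&':

```sql
-- Replace a standalone ampersand with a placeholder token.
-- The sample title and the placeholder are assumptions for illustration.
SELECT regexp_replace('Pride & Prejudice', '\s*&\s*', ' ☃ ', 'g');

-- Check which token type the parser assigns to the placeholder before
-- relying on it; a token classified as blank never reaches a dictionary.
SELECT * FROM ts_debug('synonym_larl', '☃');
```

If the placeholder comes back as blank, pick a different token that the
parser treats as a word, and add it to the synonym file alongside "and".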

HTH,
--
Mike Rylander
 | President
 | Equinox Open Library Initiative
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@equinoxinitiative.org
 | web:  http://equinoxinitiative.org


On Thu, May 25, 2017 at 11:05 AM, Josh Stompro
<stomp...@exchange.larl.org> wrote:
> Hello, I’ve followed the steps in the following wiki pages to enable a
> synonym dictionary but I’m not getting the results I expect.
>
>
>
> https://wiki.evergreen-ils.org/doku.php?id=scratchpad:brush_up_search#synonym_dictionary
>
>
>
> Spelled out numbers do get translated to digits (six -> 6) but digits don’t
> get translated ( 6 -> six).
>
>
>
> When I test the synonym dictionary with something like the following it
> looks like it works:
>
> select ts_lexize('synonym_larl', '6');
>
> ts_lexize
>
> -----------
>
> {six}
>
> (1 row)
>
>
>
> But when I look at the metabib.title_field_entry for a record that has
> been reindexed I see the following.
>
> select * from metabib.title_field_entry where source=102449 limit 100;
>
>    id    | source | field |                          value
> |
> index_vector
>
> ---------+--------+-------+----------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 2402931 | 102449 |     6 | Little house on the prairie Season 6 [disc 2]
> test seven | '2':9A,13C,20C '6':7A,12C,18C '7':14C 'disc':8A,19C 'hous':13C
> 'house':2A 'littl':12C 'little':1A 'on':3A,14C 'prairi':16C 'prairie':5A
> 'season':6A,17C 'seven':11A,22C 'test':10A,21C 'the':4A,15C
>
>
>
> Seven gets added as ‘seven’ and ‘7’, but the ‘2’ and ‘6’ do not.
>
>
>
> So I’m wondering if the search configuration needs to cover numeric tokens
> to make that work?
>
>
>
> select * from ts_debug('synonym_larl', '6');
>
> alias |   description    | token | dictionaries | dictionary | lexemes
>
> -------+------------------+-------+--------------+------------+---------
>
> uint  | Unsigned integer | 6     | {simple}     | simple     | {6}
>
>
>
> \dF+ synonym_larl;
>
> Text search configuration "public.synonym_larl"
>
> Parser: "pg_catalog.default"
>
>       Token      | Dictionaries
>
> -----------------+--------------
>
> asciihword      | synonym_larl
>
> asciiword       | synonym_larl
>
> email           | simple
>
> file            | simple
>
> float           | simple
>
> host            | simple
>
> hword           | simple
>
> hword_asciipart | synonym_larl
>
> hword_numpart   | simple
>
> hword_part      | simple
>
> int             | simple
>
> numhword        | simple
>
> numword         | simple
>
> sfloat          | simple
>
> uint            | simple
>
> url             | simple
>
> url_path        | simple
>
> version         | simple
>
> word            | simple
>
>
>
> Maybe the uint token needs to be set to synonym_larl also? But I’m wondering
> if this has bad side effects?
>
>
>
> Also, another mapping we would like to make is ‘&’ -> ‘and’ , ‘and’ -> ‘&’.
> But it doesn’t look like tsearch knows how to categorize ‘&’ as a token.
>
>
>
> select * from ts_debug('synonym_larl', '&');
>
> alias |  description  | token | dictionaries | dictionary | lexemes
>
> -------+---------------+-------+--------------+------------+---------
>
> blank | Space symbols | &     | {}           |            |
>
>
>
> Works fine going the other way and the ‘&’ ends up in the index.
>
>
>
> select * from ts_debug('synonym_larl', 'and');
>
>    alias   |   description   | token |  dictionaries  |  dictionary  |
> lexemes
>
> -----------+-----------------+-------+----------------+--------------+---------
>
> asciiword | Word, all ASCII | and   | {synonym_larl} | synonym_larl | {&}
>
>
>
> Thanks
>
> Josh
>
>
>
>
>
> Lake Agassiz Regional Library - Moorhead MN larl.org
>
> Josh Stompro     | Office 218.233.3757 EXT-139
>
> LARL IT Director | Cell 218.790.2110
>
>
