Re: [GENERAL] Tsearch vector not stored by update/set

Andrew J. Kopciuch Mon, 21 Mar 2005 15:54:54 -0800

> It seems to be selective of only numbers, words with numbers in them,
> words with '.' or '/' characters.  It completely ignores any other words
> or text in any of the 3 fields.
>


This is a very big hint to your problem.

> You requested the pg_ts_* tables:
> On the Linux-redhat, pg7.3.2
>
> pg_ts_cfgmap(73 rows)
> ts_name  tok_alias dict_name
> "default" "lword" "{en_stem}"
> "default" "nlword" "{simple}"
> "default" "word" "{simple}"
> "default" "email" "{simple}"
> "default" "url" "{simple}"
> "default" "host" "{simple}"
> "default" "sfloat" "{simple}"
> "default" "version" "{simple}"
> "default" "part_hword" "{simple}"
> "default" "nlpart_hword" "{simple}"
> "default" "lpart_hword" "{en_stem}"
> "default" "hword" "{simple}"
> "default" "lhword" "{en_stem}"
> "default" "nlhword" "{simple}"
> "default" "uri" "{simple}"
> "default" "file" "{simple}"
> "default" "float" "{simple}"
> "default" "int" "{simple}"
> "default" "uint" "{simple}"
> "default_russian" "lword"  "{en_stem}"
> "default_russian" "nlword" "{ru_stem}"
> "default_russian" "word" "{ru_stem}"
> "default_russian" "email" "{simple}"
> "default_russian" "url" "{simple}"
> "default_russian" "host" "{simple}"
> "default_russian" "sfloat" "{simple}"
> "default_russian" "version" "{simple}"
> "default_russian" "part_hword" "{simple}"
> "default_russian" "nlpart_hword" "{ru_stem}"
> "default_russian" "lpart_hword" "{en_stem}"
> "default_russian" "hword" "{ru_stem}"
> "default_russian" "lhword" "{en_stem}"
> "default_russian" "nlhword" "{ru_stem}"
> "default_russian" "uri" "{simple}"
> "default_russian" "file" "{simple}"
> "default_russian" "float" "{simple}"
> "default_russian" "int" "{simple}"
> "default_russian" "uint" "{simple}"
> "simple" "lword" "{simple}"
> "simple" "nlword" "{simple}"
> "simple" "word" "{simple}"
> "simple" "email" "{simple}"
> "simple" "url" "{simple}"
> "simple" "host" "{simple}"
> "simple" "sfloat" "{simple}"
> "simple" "version" "{simple}"
> "simple" "part_hword" "{simple}"
> "simple" "nlpart_hword" "{simple}"
> "simple" "lpart_hword" "{simple}"
> "simple" "hword" "{simple}"
> "simple" "lhword" "{simple}"
> "simple" "nlhword" "{simple}"
> "simple" "uri" "{simple}"
> "simple" "file" "{simple}"
> "simple" "float" "{simple}"
> "simple" "int" "{simple}"
> "simple" "uint" "{simple}"
> "default_english" "url" "{simple}"
> "default_english" "host" "{simple}"
> "default_english" "sfloat" "{simple}"
> "default_english" "uri" "{simple}"
> "default_english" "int" "{simple}"
> "default_english" "float" "{simple}"
> "default_english" "email" "{simple}"
> "default_english" "word" "{simple}"
> "default_english" "hword" "{simple}"
> "default_english" "nlword" "{simple}"
> "default_english" "nlpart_hword" "{simple}"
> "default_english" "part_hword" "{simple}"
> "default_english" "nlhword" "{simple}"
> "default_english" "file" "{simple}"
> "default_english" "uint" "{simple}"
> "default_english" "version" "{simple}"
>

I am assuming that your cluster was created with en_US for the locale, 
and that you have set the matching tsearch2 configuration to be your default 
(or set curcfg for each running process).

If you look at your config mappings for "default_english", you will notice 
that you have 16 records, as opposed to 19 records for every other 
configuration.  Looking more closely, I noticed you are missing entries for 
'lword', 'lhword' and 'lpart_hword'.  That means that tokens found to be of 
types 'Latin Words', 'Latin Hyphenated Words' and 'Latin Part Hyphenated 
Words' are just dropped, because you do not have a configuration mapping set 
up for them.
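
As a rough sketch of how to check and repair this (not tested against your 
installation): the queries below count the mappings per configuration, list 
the token types that "default" has but "default_english" lacks, and then add 
the three missing rows.  Using '{en_stem}' for them is my assumption, copied 
from what your "default" configuration does for the same token types; pick 
another dictionary if you prefer.

```sql
-- Each configuration should report 19 rows; default_english reports 16.
SELECT ts_name, count(*) AS mappings
  FROM pg_ts_cfgmap
 GROUP BY ts_name
 ORDER BY ts_name;

-- Token types mapped in 'default' but absent from 'default_english'.
SELECT d.tok_alias
  FROM pg_ts_cfgmap d
 WHERE d.ts_name = 'default'
   AND NOT EXISTS (SELECT 1
                     FROM pg_ts_cfgmap e
                    WHERE e.ts_name = 'default_english'
                      AND e.tok_alias = d.tok_alias);

-- Add the missing mappings (assumption: same dictionary as 'default' uses).
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
     VALUES ('default_english', 'lword', '{en_stem}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
     VALUES ('default_english', 'lhword', '{en_stem}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
     VALUES ('default_english', 'lpart_hword', '{en_stem}');
```

After the inserts, the first query should show the same number of rows for 
every configuration.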

This is why only numbers (or other token types) show up: they are returned 
as token types int, uint, float, url, etc., for which you have mappings.  
Most regular words are simply discarded due to the missing entries.  If you 
fix your configuration, the triggers should work properly.

Your examples worked before simply because you specified the 'default' 
configuration in the insert statement.  That is not the same as the 
'default_english' configuration, which the trigger uses based on your 
server locale (en_US).
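
To see the difference for yourself, something along these lines should do.  
The sample sentence is made up, and I am assuming the one-argument form of 
to_tsvector() falls back to the current configuration on your version in the 
same way the trigger does.

```sql
-- Which configuration is tied to which locale.
SELECT ts_name, locale FROM pg_ts_cfg;

-- Configuration named explicitly, as in the manual inserts that worked.
SELECT to_tsvector('default', 'The quick brown fox costs 42 dollars');

-- No configuration named: the current one is used, as in the trigger.
SELECT to_tsvector('The quick brown fox costs 42 dollars');
```

If the second to_tsvector() call returns little more than the number, the 
current configuration is the one with the missing mappings.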

> I have made a single change to it from its default installation.  When I
> was working with the rank_cd() function on the 8.0.0 machine, it had
> errors due to a non-existant english stop file, so I changed
> pg_ts_dict.dict_initoption = '' where dict_name = 'en_stem'.  The indexing
> system was working fine both before and after the change to the pg_ts_dict
> table.  I also propagated the change to the 7.3.2 machine even though it
> didn't have the error message (the stop file didn't exist on that computer
> either, but it never gave an error message about it).

I would not recommend this.  The stop file is most likely on the system 
somewhere; its location depends on your installation.  Look for 
english.stop on the computer(s).  If it is not there, you can grab the one 
out of the source distribution and put it wherever you want.  Then just 
update the settings to the location you used.
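
Roughly like this, undoing your earlier change.  The path is only a 
placeholder; substitute wherever english.stop actually lives on each machine.

```sql
-- '/path/to/english.stop' is a hypothetical location; use the real one.
UPDATE pg_ts_dict
   SET dict_initoption = '/path/to/english.stop'
 WHERE dict_name = 'en_stem';

-- Confirm the setting.
SELECT dict_name, dict_initoption
  FROM pg_ts_dict
 WHERE dict_name = 'en_stem';
```

Reconnecting before you test again is a reasonable precaution, in case the 
dictionary settings were already loaded by your session.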


good luck,


Andy
