2009/10/16 <scrlhead...@mts.net>:
> We are using version 1.4.0.4. If I do a key word search for Ted Arnold I do
> not pull up any records. The correct spelling is Tedd Arnold which pulls up
> 15 records. My understanding is that Evergreen uses “stemming” so doesn’t
> that mean it shouldn’t matter if I use Ted or Tedd?? Thanks in advance.
>
> Mary Toma

Hi Mary:

Evergreen's full-text search uses the Porter stemming algorithm, defined at
http://snowball.tartarus.org/algorithms/porter/stemmer.html

Long story short, words that end in a double consonant do not automatically
get stemmed to a single consonant. Only when the double consonant is followed
by a suffix that is recognized as causing doubling of a final consonant (for
example, '-ed' or '-ing') does the word get stemmed down to a single consonant
at the end. So 'Tedd' is indexed as 'tedd', a search for 'Ted' looks for
'ted', and the two do not match.

You can see what's happening under the covers by connecting to your Evergreen
database and directly entering some terms into the index tables:

evergreen=# insert INTO metabib.keyword_field_entry (source, value, field) values (1, 'tedd', 15);
INSERT 0 1
evergreen=# select * from metabib.keyword_field_entry where value = 'tedd';
 id | source | field | value | index_vector
----+--------+-------+-------+--------------
  4 |      1 |    15 | tedd  | 'tedd':1
(1 row)

Here, the 'index_vector' field shows you what full-text search will match
against ('tedd'). Compare that to the contrived example of inserting
'tedding':

evergreen=# insert INTO metabib.keyword_field_entry (source, value, field) values (1, 'tedding', 15);
INSERT 0 1
evergreen=# select * from metabib.keyword_field_entry where value = 'tedding';
 id | source | field | value   | index_vector
----+--------+-------+---------+--------------
  5 |      1 |    15 | tedding | 'ted':1
(1 row)

With 'tedding', the short-vowel double-consonant followed by a recognized
consonant-doubling suffix results in the stem of 'ted' being indexed.

Hopefully this helps...
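
P.S. If it helps to see the rule in isolation, here is a rough Python sketch of the part of the Porter algorithm (step 1b) that matters for this case. It is only an illustration: the function names are made up for this message, it is not the code Evergreen or PostgreSQL actually runs, and it deliberately leaves out the rest of the algorithm (the '-eed' rule is reduced to "leave the word alone", and the '-at', '-bl', '-iz' and final '-e' adjustments are omitted).

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: not a, e, i, o, u, and not a 'y' that follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem):
    return (
        len(stem) >= 2
        and stem[-1] == stem[-2]
        and is_consonant(stem, len(stem) - 1)
    )


def step_1b_sketch(word):
    """Simplified, partial version of Porter step 1b (hypothetical helper)."""
    word = word.lower()
    if word.endswith("eed"):
        # The real algorithm may map 'eed' to 'ee'; omitted in this sketch.
        return word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if not contains_vowel(stem):
                # The suffix is only stripped when the remaining stem has a vowel.
                return word
            # Undouble a final double consonant, except l, s and z.
            if ends_double_consonant(stem) and stem[-1] not in "lsz":
                return stem[:-1]
            return stem
    # No recognized suffix, so a trailing double consonant is left as it is.
    return word


for term in ("tedd", "ted", "tedding"):
    print(term, "->", step_1b_sketch(term))
# tedd -> tedd
# ted -> ted
# tedding -> ted
```

'tedd' has no '-ed' or '-ing' suffix (it ends in 'dd'), so nothing is stripped and the double 'd' stays. 'ted' does end in '-ed', but stripping it would leave 't', which has no vowel, so it is also left alone. The two terms therefore end up as different stems.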