In a Solr-based search, stemming is done at indexing time: text is indexed into 
fields whose tokens have been stemmed (and the same analysis is normally applied 
to the query).

It seems typical in library-catalog type applications based on Solr to have the 
default (or even only) searches be over these stemmed fields, thus 
'auto-stemming' to the user. (Search for 'monkey', find 'monkeys' too, and vice 
versa).
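
To make the mechanics concrete, here is a minimal Python sketch of index-time 
stemming. It is not Solr code and not anyone's actual configuration: `toy_stem` 
is a hypothetical, deliberately naive stand-in for a real stemmer, and the 
dictionary stands in for the inverted index behind a stemmed field. The same 
function is applied to the query, which is why 'monkey' and 'monkeys' find each 
other.

```python
from collections import defaultdict


def toy_stem(token: str) -> str:
    """Hypothetical, naive plural stripper (an assumption, not Solr's stemmer)."""
    t = token.lower()
    if t.endswith("ies") and len(t) > 4:
        return t[:-3] + "y"          # "libraries" -> "library"
    if t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        return t[:-1]                # "monkeys" -> "monkey"
    return t


def build_index(docs):
    """Map each stemmed token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Stem the query the same way the indexed text was stemmed."""
    return sorted(index.get(toy_stem(query), set()))


# Made-up titles, for illustration only.
docs = {1: "Monkeys of Borneo", 2: "The monkey wrench"}
index = build_index(docs)

print(search(index, "monkey"))
print(search(index, "monkeys"))
```

Both searches return the same two documents, because both titles and both 
queries reduce to the single indexed token 'monkey'.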

I am curious how many people who have Solr-based catalogs (that is, search 
engines whose content is mostly or entirely derived from MARC) use such stemmed 
fields ('auto-stemming') for their _author_ fields as well.

There are pros and cons to this. Some things in an author field certainly would 
benefit from stemming (mostly various kinds of corporate authors, some of whose 
names end up looking like English-language phrases). But a great many things in 
an author field would not benefit from stemming, so when stemming is applied it 
sometimes (often?) produces false matches, for instance by treating an author's 
last name as a plural and matching it against a different name.
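
A small Python sketch of both sides of that trade-off follows. Everything in it 
is an assumption made for illustration: `toy_stem` is a hypothetical, naive 
plural stripper (real stemmers behave differently in detail), and the names are 
made up rather than taken from any catalog.

```python
from collections import defaultdict


def toy_stem(token: str) -> str:
    """Hypothetical, naive plural stripper (an assumption, not Solr's stemmer)."""
    t = token.lower()
    if t.endswith("ies") and len(t) > 4:
        return t[:-3] + "y"
    if t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        return t[:-1]
    return t


def collisions(names):
    """Group names by stem and keep only stems shared by more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[toy_stem(name)].append(name)
    return {stem: group for stem, group in groups.items() if len(group) > 1}


# Made-up personal surnames: stemming merges distinct people.
surnames = ["Adams", "Adam", "Williams", "William", "Parks", "Park", "Ross"]
print(collisions(surnames))

# Corporate-author vocabulary: here the merge is arguably what a user wants.
print(toy_stem("Libraries") == toy_stem("Library"))
```

With this toy stemmer, 'Adams'/'Adam', 'Williams'/'William' and 'Parks'/'Park' 
each collapse to one token, so a search on one surname would also return the 
other, while 'Libraries' and 'Library' in a corporate name match as one would 
hope.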

So, care to say on the list: if you are using a Solr-based catalog, are you using 
stemmed fields for your author searches? Curious what people end up doing. If 
you've done anything more complicated or clever than just stem-or-not, let us 
know that too!

Jonathan
