On 18.05.09 13:40, Raphael Ritz wrote:
> Hi folks,
>
> sorry if this is considered inappropriate but I'd appreciate input
> from those who know more than I on searching technologies:
>
> In short I need to support full text searches with
>
> 1. plural versus singular forms treated equally
> 2. American versus British English treated equally
> 3. (Obvious?) spelling errors corrected/taken care of.
>
> Now from what I know this translates to
>
> 1. -> stemming support

Honestly speaking: all stemmers based on the Porter stemmer suck in many ways. I don't know of any open-source stemmer software (not based on the Porter stemmer) that would fulfill professional requirements. The only usable solutions available are commercial and expensive.

> 2. -> appropriate normalization? thesaurus based search?
>       (if so, what would be appropriate normalizers or
>       thesauri?)

TXNG3 supports this to some degree (and is extensible). Both normalization (where aspects like the handling of compounds have room for improvement) and thesaurus search are features that are not widely used and are somewhat underdeveloped in TXNG3.

> 3. -> similarity search where similarity is defined
>       according to some algorithm (e.g., Levenstein)

Similarity search can be implemented in different ways. Levenshtein is perhaps the best approach for algorithmic similarity search. Professional search systems also take a thesaurus into account.

>
> First question: did I get the vocabulary right here?
>
> Second question: looking around I (obviously) consider
> TextIndexNG but one thing I found there is that stemming
> support is incompatible with globbing (wildcard) support.

Isn't it? The various combinations of TXNG options will likely lead to confusion and unpredictable results depending on the settings.

> While it seems obvious to me that they are kind of
> mutually exclusive I'd like to know how others are dealing
> with this (have two differently configured text indexes for
> the full text search and query one or the other???).

Some features are mutually exclusive (and TXNG perhaps does not try to catch stupid parameter combinations).

>
> Last but not least I'd appreciate pointers to docs teaching
> the general concepts, constraints, and vocabularies so that
> I know what I'm talking about in the future.

You might look at alternatives based on Lucene (e.g. SOLR, collective.solr by Florian). The last time I checked the Lucene world for the features you need, I came to the conclusion that those solutions have the same problems when it comes to professional stemming and thesaurus support... those solutions suck on the same high level as TextIndexNG3.

Andreas
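
P.S. A minimal sketch of what algorithmic similarity search along the lines of Levenshtein could look like, assuming a small in-memory vocabulary. The names levenshtein and suggest and the max_distance threshold are made up for illustration; they are not part of TextIndexNG3 or Lucene, and a real index would need something smarter than comparing the query against every known word.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        previous = current
    return previous[-1]


def suggest(term, vocabulary, max_distance=2):
    """Hypothetical helper: vocabulary words within max_distance edits
    of term, closest first (case-insensitive comparison)."""
    scored = [(levenshtein(term.lower(), word.lower()), word)
              for word in vocabulary]
    return [word for distance, word in sorted(scored)
            if distance <= max_distance]


if __name__ == "__main__":
    print(suggest("colour", ["color", "colour", "collar", "cooler"]))
```

Note that an edit-distance threshold alone treats "collar" as a candidate for "colour" just like "color", which is one reason a thesaurus is still needed on top of it.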
-- 
Andreas Jung, CEO
ZOPYX Ltd. & Co. KG
Charlottenstr. 37/1, 72070 Tübingen, Germany
www.zopyx.com
