Great - I guess if it shifts away from the "fixed" soundex, we should probably try to find out who is using it, to ensure there are no surprises. I can't imagine it is widely used.

On Mon, Oct 11, 2010 at 2:43 PM, Wolfgang Laun <[email protected]> wrote:

> On 10 October 2010 23:41, Michael Neale <[email protected]> wrote:
> > I think you should clean room implement it (or reuse some old code of yours
> > if it is safe to do so). From what I have seen of the algorithm - it isn't
> > huge - and it would make sense to have it re-implemented. As an alternative
> > - consider taking a look at the MVEL soundex code and rewriting that - and
> > we will see if we can make it upstream.
>
> I just re-implemented this according to the algorithm I found in
> http://en.wikipedia.org/wiki/Soundex
> I've also consulted a CPAN module, to learn what was intended by the
> MVEL implementation, but it's undecidable (possibly due to omissions or
> bugs).
>
> > I would say it is just slightly
> > neglected - its not well known that it lives there. Using the MVEL one was
> > just opportunistic for drools.
> > I didn't know that it could return null, that is bad. I guess if it is null
> > - that would mean that you just do a literal case insensitive compare?
>
> A correct implementation never returns null. An empty word might, but for
> our purpose "" would be preferable.
>
> > Also - AFAIK - soundex is only for english right?
> Certainly.
>
> > Is there an equivalent for other languages?
> Soundex is coarse even for English. I've found the atrocious example that
> the Soundex for "Britney Spears" is the same as for
> "bewährten Superzicke" (~ "proven super-b*").
> NYSIIS <http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System>
> is supposed to be better.
>
> For German, there is an equivalent: "Kölner Phonetik". It might
> make sense to provide this for an operator "soundex[de]". (All of
> /M[ae][iy]e?r/ sound alike in German, and all exist as proper names.)
>
> I have also found one link to an implementation adapted for French.
>
> Soundex is aimed at the pronunciation of proper names. There might be some
> leeway for that even in a language like Hungarian, which is pronounced exactly
> as written.
>
> I think Drools should drop the MVEL version and go for a flexible approach,
> possibly even s.th. better than Soundex/NARA for English. I'll research this
> some more, and report back before I commit anything ;-)
>
> -W
>
> > If so, perhaps having it in the drools codebase makes sense
> > and opens the way for people to plug in their own soundex.
> > On Mon, Oct 11, 2010 at 2:54 AM, Wolfgang Laun <[email protected]>
> > wrote:
> >>
> >> The implementation of "soundslilke" is broken in more than one respect.
> >> The conversion of a word to a Soundex string is provided by
> >> org.mvel2.util.Soundex.
> >> (.) There are words where Soundex.soundex returns null, so that the
> >> calling code, in Drools, crashes with a NPE.
> >> (.) The algorithm implemented in Soundex is erroneous. I'm not sure which
> >> Soundex algorithm it is supposed to implement, but it just doesn't meet the
> >> basic requirements.
> >>
> >> I have implemented, correctly, the version for the National Archives and
> >> Records Administration (NARA) rule set for the official implementation of
> >> Soundex used by the U.S. Government.
> >>
> >> Do we wait for MVEL to correct this bug, or do we just replace it with a
> >> correct implementation?
> >>
> >> Regards
> >> Wolfgang
>
> _______________________________________________
> rules-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/rules-dev

--
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com
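
For readers following the thread, the two requirements discussed above (the NARA rule set, and never returning null) can be illustrated with a minimal Java sketch. This is not Wolfgang's implementation and not the MVEL code: the class name SoundexSketch and the method encode are hypothetical, and skipping every character outside A-Z (including umlauts) is an assumption made only to keep the example short.

```java
/**
 * Illustrative sketch of American Soundex with the NARA H/W rule.
 * Hypothetical helper, not the Drools or MVEL implementation.
 */
final class SoundexSketch {

    // One entry per letter A..Z:
    //   '1'..'6' = Soundex digit
    //   '0'      = A, E, I, O, U, Y: not coded, but separates equal digits
    //   '-'      = H, W: not coded and does NOT separate equal digits (NARA rule)
    private static final String CODES = "0123012-02245501262301-202";

    private SoundexSketch() { }

    /** Returns a 4-character code, or "" (never null) if the input has no A-Z letter. */
    static String encode(String word) {
        if (word == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(4);
        char last = '0'; // code of the most recent letter that was not H or W
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // assumption: ignore anything outside A-Z
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);  // the first letter is kept as a letter
                last = code;    // but its digit still suppresses an equal neighbour
            } else if (code != '-') {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code;    // a vowel resets 'last', so a repeated digit is coded again
            }
            // H and W after the first letter fall through: 'last' stays unchanged
        }
        if (out.length() == 0) {
            return "";          // empty or letter-free input: "" rather than null
        }
        while (out.length() < 4) {
            out.append('0');    // pad short codes with zeros
        }
        return out.toString();
    }
}
```

With this sketch, a "soundslike" style comparison would reduce to SoundexSketch.encode(a).equals(SoundexSketch.encode(b)), which cannot throw a NullPointerException because encode always returns a String. Whether two empty codes should count as a match is a separate decision that the thread leaves open.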
