Re: regex, 1 off...
This seems related to a more general problem, stated as `Given two sequences, find their longest common subsequence'. Many algorithm books cover it.

-Todd

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/
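A minimal sketch of the longest-common-subsequence idea mentioned above, in plain Perl. The function name lcs_length is made up for illustration, and using the LCS length as a typo measure is an assumption rather than something the thread prescribes: a candidate whose LCS with the target is only one character short of the target is a plausible near-miss, but this test is looser than a strict "one edit" check.

```perl
use strict;
use warnings;

# Length of the longest common subsequence of two strings,
# computed with the standard dynamic-programming table,
# keeping only the previous and the current row.
sub lcs_length {
    my ($s, $t) = @_;
    my ($m, $n) = (length $s, length $t);
    my @prev = (0) x ($n + 1);
    for my $i (1 .. $m) {
        my @curr = (0);
        for my $j (1 .. $n) {
            if (substr($s, $i - 1, 1) eq substr($t, $j - 1, 1)) {
                # Matching characters extend the common subsequence.
                $curr[$j] = $prev[$j - 1] + 1;
            }
            else {
                # Otherwise carry over the better of the two neighbours.
                $curr[$j] = $prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1];
            }
        }
        @prev = @curr;
    }
    return $prev[$n];
}

my $target = 'abc123';
for my $word (qw(abc123 avc123 bc123 xyz)) {
    printf "%-8s shares %d of %d characters in order\n",
        $word, lcs_length($target, $word), length $target;
}
```

For the strings in the original question, avc123 and bc123 both share five of the six target characters in order, while an unrelated string such as xyz shares none.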
Re: regex, 1 off...
> However much depends on the actual data and the variations that you are
> expecting. If you are searching for words like those used in the English
> language then you may want to look at how spell checking software works.

This seems related to the problem `find the longest common subsequence of two given sequences'. Many algorithm books cover it.

-Todd
Re: regex, 1 off...
On Dec 16, 2007 2:21 PM, namotco [EMAIL PROTECTED] wrote:
> Let's say I want to search some text for abc123. However, we know people
> can make typos and so they could have entered avc123 or abc223 or sbc123
> or bc123 or many other combinations... So I want to search for those
> possibilities as well. So how would I go about creating the proper regex?
> Thanks!

How do you define a typo? How do you know whether it's a typo, or a different string? Do you know, for instance, that only 'abc\d\d\d' is valid, and 'avc\d\d\d' is never valid? If so, you could do something like:

    if (/^abc\d\d\d$/ or s/^a.c(\d\d\d)$/abc$1/) {
        # match! (a near-miss in $_ has been corrected to abcNNN)
    }
    else {
        # no match!
    }

If you can't predict the input, though, you'll need some heavy duty algorithmic logic. Take a look through CPAN and see if there isn't something that meets your needs. String::Approx and String::KeyboardDistance might be places to start. There are also a number of things in the Text::* tree.

HTH,

-- jay
--
This email and attachment(s): [ ] blogable; [ x ] ask first; [ ] private and confidential
daggerquill [at] gmail [dot] com
http://www.tuaw.com http://www.downloadsquad.com http://www.engatiki.org
Re: regex, 1 off...
namotco wrote:
> Let's say I want to search some text for abc123. However, we know people
> can make typos and so they could have entered avc123 or abc223 or sbc123
> or bc123 or many other combinations... So I want to search for those
> possibilities as well. So how would I go about creating the proper regex?

I don't think a regex is appropriate in this case, but if you want to write something that guesses at what a misspelled string should have been, then search the Web for Damerau-Levenshtein distance, which is very effective, and the algorithm codes up fairly simply.

Rob
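As a rough illustration of the distance suggested above, here is a sketch of the restricted form of Damerau-Levenshtein, usually called optimal string alignment. It counts an insertion, a deletion, a substitution, or a swap of two adjacent characters as one edit each. The name osa_distance is made up for this example, and the unrestricted Damerau-Levenshtein algorithm needs a little more bookkeeping than is shown here.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Optimal string alignment distance (restricted Damerau-Levenshtein).
# $d[$i][$j] holds the distance between the first $i characters of $s
# and the first $j characters of $t.
sub osa_distance {
    my ($s, $t) = @_;
    my ($m, $n) = (length $s, length $t);
    my @d;
    $d[$_][0] = $_ for 0 .. $m;    # delete everything from $s
    $d[0][$_] = $_ for 0 .. $n;    # insert everything from $t
    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            my $a = substr($s, $i - 1, 1);
            my $b = substr($t, $j - 1, 1);
            my $cost = $a eq $b ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i - 1][$j] + 1,            # deletion
                $d[$i][$j - 1] + 1,            # insertion
                $d[$i - 1][$j - 1] + $cost,    # match or substitution
            );
            # Two adjacent characters swapped count as a single edit.
            if (   $i > 1 && $j > 1
                && $a eq substr($t, $j - 2, 1)
                && substr($s, $i - 2, 1) eq $b)
            {
                $d[$i][$j] = min($d[$i][$j], $d[$i - 2][$j - 2] + 1);
            }
        }
    }
    return $d[$m][$n];
}

for my $word (qw(abc123 avc123 bc123 acb123 avc223)) {
    printf "%-8s distance %d\n", $word, osa_distance('abc123', $word);
}
```

With a function like this, "one off" becomes a simple threshold: accept any candidate whose distance from abc123 is at most 1.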
regex, 1 off...
Let's say I want to search some text for abc123. However, we know people can make typos and so they could have entered avc123 or abc223 or sbc123 or bc123 or many other combinations... So I want to search for those possibilities as well. So how would I go about creating the proper regex?

Thanks!
Re: regex, 1 off...
On Sunday 16 December 2007 11:21, namotco wrote:
> Let's say I want to search some text for abc123. However, we know people
> can make typos and so they could have entered avc123 or abc223 or sbc123
> or bc123 or many other combinations... So I want to search for those
> possibilities as well. So how would I go about creating the proper regex?

Regular expressions are about matching patterns, so you have to define what kind of pattern you are searching for. From your example you may want something like:

    / \b (?: .?bc | a.?c | ab.? ) (?: .23 | 1.3 | 12. ) \b /x

However, much depends on the actual data and the variations that you are expecting. If you are searching for words like those used in the English language then you may want to look at how spell checking software works.

John
--
use Perl;
program
fulfillment
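The hand-written pattern above can also be generated for an arbitrary target string. The following is only a sketch under one assumption about what "1 off" means: the candidate differs from the target by exactly one replaced, dropped, or extra character (or not at all). The helper name one_off_pattern is hypothetical, and the result is anchored with \A and \z here rather than \b, so it tests a whole string instead of searching inside longer text.

```perl
use strict;
use warnings;

# Build a regex matching $target itself or any string that is one
# substitution, deletion, or insertion away from it.
sub one_off_pattern {
    my ($target) = @_;
    my @alts;
    for my $i (0 .. length($target) - 1) {
        my $head = quotemeta substr($target, 0, $i);
        my $char = quotemeta substr($target, $i, 1);
        my $tail = quotemeta substr($target, $i + 1);
        push @alts, "$head.?$tail";         # character $i replaced or dropped
        push @alts, "$head.$char$tail";     # extra character before character $i
    }
    push @alts, quotemeta($target) . '.';   # extra character at the end
    my $body = join '|', @alts;
    return qr/(?:$body)/;
}

my $re = one_off_pattern('abc123');
for my $word (qw(abc123 avc123 abc223 sbc123 bc123 abcd123 avc223 xyz123)) {
    printf "%-8s %s\n", $word, $word =~ /\A$re\z/ ? 'match' : 'no match';
}
```

Unlike the two-group pattern, this version rejects a string with one typo in each half (avc223), at the cost of a longer alternation that grows with the length of the target.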