Re: There has to be a way to do this
On Jun 23, [EMAIL PROTECTED] said:

>SRED.  SREDNE
>SEV.   SEVERN

># Match it at beginning of line
>$cgname =~ s/^SRED\.(?=[\W\s\-\d]+)/SREDNE:/g ;

Three things -- the + modifier on the [...] isn't needed, you don't need to
put \s and - in a character class you've already put \W in, and the /g
modifier is totally worthless here... there's only ONE beginning of the
line!

  $cgname =~ s/^SRED\.(?=[\W\d])/SREDNE:/;

># Match it within the line
>$cgname =~ s/[\W\s\-]+SRED\.(?=[\W\s\-\d]+)/:SREDNE:/g ;

I have a feeling you want to use \b instead of [\W\s-]. It's cleaner and
doesn't actually absorb a character.

  $cgname =~ s/\bSRED\.(?=[\W\d])/:SREDNE:/g;

># Match it at end of line
>$cgname =~ s/[\W\s\-]+SRED\.$/:SREDNE:/g ;

Again, use \b, but there's no need for /g here.

  $cgname =~ s/\bSRED\.$/:SREDNE:/;

># Match if it begins & ends line
>$cgname =~ s/^SRED\.$/:SREDNE:/g ;

Ah, here's an interesting case. This is actually already handled by my
modifications. The problem is that you were using /[\W\s\-]+SRED\.$/ but if
the string is "SRED.", then [\W\s\-] can't match anything. So that's why
using a word boundary (\b) is smarter.

Also, we can change the look-aheads to go from positive to negative.
Instead of saying "and I am followed by a non-letter", why not say "and I
am NOT followed by a letter"?

  $cgname =~ s/^SRED\.(?![A-Za-z])/SREDNE:/;      # front
  $cgname =~ s/\bSRED\.(?![A-Za-z])/:SREDNE:/g;   # middle
  $cgname =~ s/\bSRED\.$/:SREDNE:/;               # end

If you're worried about hardcoding the letter set (A-Za-z), then you can
use this character class instead: [^\W\d_]. It means "match anything
that's not: a non-word character, a digit, or an underscore". It's a sneaky
way of matching anything that would be matched by \w WITHOUT matching \d
or _.

  $cgname =~ s/^SRED\.(?![^\W\d_])/SREDNE:/;      # front
  $cgname =~ s/\bSRED\.(?![^\W\d_])/:SREDNE:/g;   # middle
  $cgname =~ s/\bSRED\.$/:SREDNE:/;               # end

>Right now I'm generating the regexes in a standalone script, then inserting
>the output code into the subroutine that processes names into a "matchable"
>form.
>
>What I'd like to be able to do is take a *set* of abbreviation
>"dictionaries," concatenate them together and dynamically generate the
>regex code in the routine that is going to execute it.

So you want to take the dictionary files, and use them to create a function
that does all the regexes on its input?

--
Jeff "japhy" Pinyan   [EMAIL PROTECTED]   http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
what does y/// stand for?  why, yansliterate of course.
[ I'm looking for programming work.  If you like my work, let me know. ]

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
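
The three [^\W\d_] substitutions from the reply above can be tried out with a
small wrapper. This is only a sketch: expand_sred is a made-up name for
illustration and the sample strings are invented, not taken from the real
well-name data. One thing it suggests is worth checking: with the negative
look-ahead, a bare "SRED." is already caught by the "front" substitution, so
it comes out as "SREDNE:" rather than the ":SREDNE:" that the original
begins-and-ends-line regex produced.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply the three
# substitutions suggested in the reply to a single name.
sub expand_sred {
    my ($cgname) = @_;
    $cgname =~ s/^SRED\.(?![^\W\d_])/SREDNE:/;       # front
    $cgname =~ s/\bSRED\.(?![^\W\d_])/:SREDNE:/g;    # middle
    $cgname =~ s/\bSRED\.$/:SREDNE:/;                # end
    return $cgname;
}

# Invented sample inputs covering each position.
for my $name ('SRED. 12', 'BIG SRED. 12', 'BIG SRED.', 'SRED.', 'SRED.X') {
    printf "%-14s => %s\n", $name, expand_sred($name);
}
# SRED. 12       => SREDNE: 12
# BIG SRED. 12   => BIG :SREDNE: 12
# BIG SRED.      => BIG :SREDNE:
# SRED.          => SREDNE:
# SRED.X         => SRED.X
```

Note that because \b does not consume the preceding space, "BIG SRED. 12"
keeps its space before the inserted colon, which the original [\W\s\-]+
version would have swallowed.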
Re: There has to be a way to do this
I don't have code to do what I want, but here are the pieces I'm trying to
string together:

Abbreviation dictionary consists of a file like this:

SRED.  SREDNE
SEV.   SEVERN
etc.

Each abbreviation is turned into four regexes, like this (doubtless they
could be made more efficient, but they work well enough at present):

# Sred. = SREDNE
$cgname =~ s/^SRED\.(?=[\W\s\-\d]+)/SREDNE:/g ;           # Match it at beginning of line
$cgname =~ s/[\W\s\-]+SRED\.(?=[\W\s\-\d]+)/:SREDNE:/g ;  # Match it within the line
$cgname =~ s/[\W\s\-]+SRED\.$/:SREDNE:/g ;                # Match it at end of line
$cgname =~ s/^SRED\.$/:SREDNE:/g ;                        # Match if it begins & ends line
# Sev. = SEVERN
$cgname =~ s/^SEV\.(?=[\W\s\-\d]+)/SEVERN:/g ;            # Match it at beginning of line
$cgname =~ s/[\W\s\-]+SEV\.(?=[\W\s\-\d]+)/:SEVERN:/g ;   # Match it within the line
$cgname =~ s/[\W\s\-]+SEV\.$/:SEVERN:/g ;                 # Match it at end of line
$cgname =~ s/^SEV\.$/:SEVERN:/g ;                         # Match if it begins & ends line
etc.

Right now I'm generating the regexes in a standalone script, then inserting
the output code into the subroutine that processes names into a "matchable"
form.

What I'd like to be able to do is take a *set* of abbreviation
"dictionaries," concatenate them together and dynamically generate the
regex code in the routine that is going to execute it.

Thanks,

Scott

Scott E. Robinson
SWAT Team
UTC Onsite User Support
RR-690 -- 281-654-5169
EMB-2813N -- 713-656-3629


"David Kirol" <[EMAIL PROTECTED]>
06/20/03 08:38 PM
To: <[EMAIL PROTECTED]>
cc:
Subject: Re: There has to be a way to do this


Scott,
Sounds like a fun problem. Can you post some code and an (abbreviated) set
of example data?
David

"Scott E Robinson" <[EMAIL PROTECTED]> wrote in message
news:<[EMAIL PROTECTED]>...
> I'm still working on the well-name matching program that I've brought up
> here before. I've received invaluable help to solve the toughest questions
> in its development, for which I'm very grateful.
>
> Now I'm trying to automate some steps which were previously manual in the
> process, to make it more end-user-friendly. There has to be a way to do
> this with Perl.
>
> The script uses a "dictionary" of abbreviations to aid its matching. The
> abbreviations are implemented as a series of substitutions with the "s"
> operator. I have a Perl script which builds the substitution statements
> from a tab-delimited list of abbreviations and their equivalent long forms.
> I then manually insert these statements into the subroutine that uses them.
>
> I kept the abbreviation translation hardcoded into the subroutine for
> performance reasons (this thing compares 14,000 unknown well names against
> 680,000 match candidates). Is there a way in Perl to read the abbreviation
> dictionary (the tab-delimited list), generate the code, insert it into the
> right subroutine, and start executing the program, all in one script?
> (Maybe you can tell me that the performance hit from using variables in the
> substitution statements is negligible, and if so, I'd be happy to go that
> route.)
>
> Thanks in advance,
>
> Scott
>
> Scott E. Robinson
> Data SWAT Team
> UTC Onsite User Support
> RR-690 -- 281-654-5169
> EMB-2813N -- 713-656-3629
>
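
One possible way to string those pieces together without pasting generated
code into the subroutine is to compile each pattern once with qr// and wrap
the whole rule list in a closure. The following is a minimal sketch under
stated assumptions: build_expander is an invented name, the dictionary lines
are assumed to be "ABBREVIATION<TAB>LONG FORM", and the four patterns are the
ones from the message above with the redundant character-class members
dropped ([\W\s\-\d]+ inside a look-ahead behaves like [\W\d], and [\W\s\-]+
like \W+).

```perl
use strict;
use warnings;

# Hypothetical helper: turn dictionary lines into one closure that
# applies all four substitutions per abbreviation.
sub build_expander {
    my @rules;
    for my $line (@_) {
        chomp(my $entry = $line);
        my ($abbr, $long) = split /\t/, $entry, 2;
        next unless defined $abbr && length $abbr
                 && defined $long && length $long;
        my $q = quotemeta $abbr;    # escapes the "." in "SRED."
        push @rules,
            [ qr/^${q}(?=[\W\d])/,   "${long}:"  ],   # beginning of line
            [ qr/\W+${q}(?=[\W\d])/, ":${long}:" ],   # within the line
            [ qr/\W+${q}$/,          ":${long}:" ],   # end of line
            [ qr/^${q}$/,            ":${long}:" ];   # begins & ends line
    }
    return sub {
        my ($name) = @_;
        for my $rule (@rules) {
            my ($re, $with) = @$rule;
            $name =~ s/$re/$with/g;
        }
        return $name;
    };
}

# Dictionaries can be concatenated simply by passing more lines.
my $expand = build_expander("SRED.\tSREDNE\n", "SEV.\tSEVERN\n");
print $expand->('SRED. 12'), "\n";      # SREDNE: 12
print $expand->('BIG SEV. 3'), "\n";    # BIG:SEVERN: 3
print $expand->('SEV.'), "\n";          # :SEVERN:
```

In real use the lines would come from reading the dictionary files; whether
the loop over compiled patterns is fast enough for the full data set would
need to be measured.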
Re: There has to be a way to do this
In article <[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (Scott E Robinson) writes:
>Is there a way in Perl to read the abbreviation
>dictionary (the tab-delimited list), generate the code, insert it into the
>right subroutine, and start executing the program, all in one script?

perldoc -f eval

Also there is a good discussion on dynamically generating regex matching
code in "Effective Perl Programming" by Joseph Hall (Addison-Wesley).
Doubtless there are free on-line equivalents but references escape me for
the moment.

--
Peter Scott
http://www.perldebugged.com
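
As a rough illustration of the string-eval route, the sketch below builds the
source text of an anonymous subroutine from abbreviation pairs and compiles
it once with eval, so the substitutions end up hard-coded just as in the
hand-pasted version. The name compile_expander and the in-line pairs are
assumptions made for the example; the generated statements follow the
positive look-ahead forms suggested elsewhere in this thread. quotemeta is
applied to both sides so that dictionary text cannot inject code or regex
metacharacters into the generated source.

```perl
use strict;
use warnings;

# Hypothetical helper: generate Perl source for the substitutions and
# compile it once with string eval.
sub compile_expander {
    my (@pairs) = @_;    # each element is [abbreviation, long form]
    my $code = "sub {\n    my (\$cgname) = \@_;\n";
    for my $pair (@pairs) {
        my ($abbr, $long) = map { quotemeta } @$pair;
        # Single-quoted templates keep \b, \W and $ literal in the output.
        $code .= sprintf '    $cgname =~ s/^%s(?=[\W\d])/%s:/;' . "\n",
                         $abbr, $long;
        $code .= sprintf '    $cgname =~ s/\b%s(?=[\W\d])/:%s:/g;' . "\n",
                         $abbr, $long;
        $code .= sprintf '    $cgname =~ s/\b%s$/:%s:/;' . "\n",
                         $abbr, $long;
    }
    $code .= "    return \$cgname;\n}\n";

    my $sub = eval $code;
    die "generated code failed to compile: $@" unless $sub;
    return $sub;
}

my $expand = compile_expander([ 'SRED.', 'SREDNE' ], [ 'SEV.', 'SEVERN' ]);
print $expand->('SRED. 12'), "\n";      # SREDNE: 12
print $expand->('BIG SEV. 3'), "\n";    # BIG :SEVERN: 3
print $expand->('SEV.'), "\n";          # :SEVERN:
```

The eval happens once at start-up, so its cost is paid a single time rather
than per name compared.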
There has to be a way to do this
I'm still working on the well-name matching program that I've brought up
here before. I've received invaluable help to solve the toughest questions
in its development, for which I'm very grateful.

Now I'm trying to automate some steps which were previously manual in the
process, to make it more end-user-friendly. There has to be a way to do
this with Perl.

The script uses a "dictionary" of abbreviations to aid its matching. The
abbreviations are implemented as a series of substitutions with the "s"
operator. I have a Perl script which builds the substitution statements
from a tab-delimited list of abbreviations and their equivalent long forms.
I then manually insert these statements into the subroutine that uses them.

I kept the abbreviation translation hardcoded into the subroutine for
performance reasons (this thing compares 14,000 unknown well names against
680,000 match candidates). Is there a way in Perl to read the abbreviation
dictionary (the tab-delimited list), generate the code, insert it into the
right subroutine, and start executing the program, all in one script?
(Maybe you can tell me that the performance hit from using variables in the
substitution statements is negligible, and if so, I'd be happy to go that
route.)

Thanks in advance,

Scott

Scott E. Robinson
Data SWAT Team
UTC Onsite User Support
RR-690 -- 281-654-5169
EMB-2813N -- 713-656-3629
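
Whether variables in the substitution cost anything noticeable is easiest to
settle by measuring. The sketch below uses the core Benchmark module to
compare a hard-coded substitution with the same pattern built from variables
and compiled once with qr//. The sub names, sample names, and iteration
count are invented for illustration, and no particular result is implied;
the numbers will depend on the Perl version and the real data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample names; the real input would be the well names.
my @names = ('SRED. 12', 'BIG SRED. 12', 'BIG SRED.', 'NO MATCH HERE');

# Pattern written out literally, as in the generated statements.
sub hardcoded {
    my ($name) = @_;
    $name =~ s/\bSRED\.(?=[\W\d])/:SREDNE:/g;
    return $name;
}

# Same pattern assembled from variables and compiled once.
my ($abbr, $long) = ('SRED.', 'SREDNE');
my $re = qr/\b\Q$abbr\E(?=[\W\d])/;

sub precompiled {
    my ($name) = @_;
    $name =~ s/$re/:${long}:/g;
    return $name;
}

# Prints a comparison table of iterations per second.
cmpthese(100_000, {
    hardcoded   => sub { hardcoded($_)   for @names },
    precompiled => sub { precompiled($_) for @names },
});
```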