Re: Re: Perl Regex
Does this have to be hard-coded into the script? Just wondering, since I have been kinda following this thread.

Feb 26, 2010 01:56:02 PM, perl-win32-users-boun...@listserv.activestate.com wrote:

> It looks like what you want to do is attribute folding. That's when you take a nested XML tag and make it an attribute of an enclosing tag. You're doing something slightly different, which is merging equal-depth tags. The right way to do this is with an XML parser. Look into XML::Simple to get started. You would read the XML into a hash, manipulate the data in the hash, and then write out a new XML file.
>
> Regex can do this in a degenerate case, but it becomes unmanageable fast. But since you asked:
>
> $xml =~ s{(\s*)([^<]*)\s*([^<]*)eId>(\s*)}{$1pages="$3">$2$4}sg;
>
> HTH

___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
Re: Perl Regex
Yes, you could use an XML parser to do the job described below, but this case is pretty simple. Here's my offering, leaving out the reading/writing of the files.

--
my $s = <<"EOF";
APOE e4 variant 18
arousal disorders disorders of arousal
arterial blood gas tests 32
asthma 28--9, 295
EOF

$s =~ s{(.*?)\s*(.*?)}
{$1}g;
print $s;
--

You could replace the two .*? with [^>]* if you wanted to be more precise, but it looks more confusing.

Jon

== original query

Hi All

What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.

When I use below mentioned Regex
]+)?>((?!<\/index-entry>).*?) \s*([0-9]+)

And replaces wrongly
arousal disordersdisorders of arousal
.

Search Text:
APOE e4 variant 18
arousal disorders disorders of arousal
arterial blood gas tests 32
asthma 28--9, 295

Correct Replacement Text should be:
APOE e4 variant
arousal disorders disorders of arousal
arterial blood gas tests
asthma

Kanhaiya
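The markup in Jon's sample did not survive in this copy of the thread, so here is a minimal sketch of the same kind of substitution with the tags filled in by guesswork. The element names index-entry and pageId and the attribute name pages show up in fragments elsewhere in the thread; the <see> element and the exact layout of the data are assumptions made purely for illustration, not the original sample.

```perl
use strict;
use warnings;

# ASSUMPTION: hypothetical markup. index-entry, pageId and pages="..." appear
# in fragments of the thread; <see> and the line layout are invented here.
my $s = <<'EOF';
<index-entry>APOE e4 variant</index-entry> <pageId>18</pageId>
<index-entry>arousal disorders <see label="see">disorders of arousal</see></index-entry>
<index-entry>arterial blood gas tests</index-entry> <pageId>32</pageId>
<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>
EOF

# No /s modifier: "." cannot cross a newline, so a match never runs from one
# line's entry into the page number that belongs to a later line.
(my $folded = $s) =~ s{<index-entry>(.*?)</index-entry>\s*<pageId>(.*?)</pageId>}
                      {<index-entry pages="$2">$1</index-entry>}g;

print $folded;
```

Under these assumptions the entry without a page number is left alone, because the pattern only matches when a pageId element follows the closing tag on the same line.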
Re: Perl Regex
Here is the chunk of code which I used to perform this task:

open(XML, "<$ARGV[0]") or die "Can not open $ARGV[0]: $!";
my $xmltext;
{
    local $/ = undef;    # slurp mode: read the whole file in one go
    $xmltext = <XML>;
}
close(XML);

while ($xmltext =~ /]+)?>(?:.*?)<\/index-entry>(?:[^\n]*?)([^<]+)<\/pageId>/is) {
    $page = $2;
    $page =~ s/ *\n+\t+/ /g;
    $page =~ s/, /,/g;
    $xmltext =~ s|]+)?>(.*?)(?:[^\n]*?)[^<]+|$2|s
}
$xmltext=~ s/$localpath/$xmlfile\_final.xml") or die "Can not open $localpath/$xmlfile\_final.xml: $!";
print XMLOUT $xmltext;
close(XMLOUT);

Thanks
Kanhaiya

----- Original Message -----
From: "Brian Raven"
Sent: Friday, February 26, 2010 10:22 PM
Subject: RE: Perl Regex
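The loop in Kanhaiya's code matches one entry at a time, tidies the page text, and then substitutes. As a rough sketch, the same idea can be written as a single pass using the /g and /e modifiers, with the tidying moved into a helper. The markup, the pages attribute, and the helper name fold_entry are assumptions for illustration; the two tidy-up substitutions are the ones from the code above.

```perl
use strict;
use warnings;

# ASSUMPTION: hypothetical markup; the real file may look different.
my $xmltext = "<index-entry>asthma</index-entry>\n<pageId>28--9,\n\t295</pageId>\n"
            . "<index-entry>arousal disorders</index-entry>\n";

# fold_entry is a hypothetical helper: it tidies the page list and builds
# the replacement text for one entry.
sub fold_entry {
    my ($text, $page) = @_;      # copy the captures before running other regexes
    $page =~ s/ *\n+\t+/ /g;     # join a page list that was wrapped onto a new line
    $page =~ s/, /,/g;           # drop the space after each comma
    return qq{<index-entry pages="$page">$text</index-entry>};
}

# /g visits every entry in one pass; /e evaluates the replacement as Perl code.
# [^<]* keeps the entry text from running past its own closing tag.
$xmltext =~ s{<index-entry>([^<]*)</index-entry>\s*<pageId>([^<]+)</pageId>}
             { fold_entry($1, $2) }ge;

print $xmltext;
```

Because [^<]* stops at the first "<", this sketch skips entries that contain nested elements; those would need a looser pattern or a parser.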
Re: Perl Regex
It looks like what you want to do is attribute folding. That's when you take a nested XML tag and make it an attribute of an enclosing tag. You're doing something slightly different, which is merging equal-depth tags. The right way to do this is with an XML parser. Look into XML::Simple to get started. You would read the XML into a hash, manipulate the data in the hash, and then write out a new XML file.

Regex can do this in a degenerate case, but it becomes unmanageable fast. But since you asked:

$xml =~ s{(\s*)([^<]*)\s*([^<]*)(\s*)}{$1$2$4}sg;

HTH

At 09:25 PM 2/26/2010 +0530, Kprasad wrote:
>Hi All
>
>What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
>
>When I use below mentioned Regex
>]+)?>((?!<\/index-entry>).*?)\s*([0-9]+)
>
>And replaces wrongly
>
>arousal disordersdisorders of arousal
>
>.
>
>Search Text:
>
>APOE e4 variant 18
>
>arousal disorders disorders of arousal
>
>arterial blood gas tests 32
>
>asthma 28--9, 295
>
>Correct Replacement Text should be:
>
>APOE e4 variant
>
>arousal disorders disorders of arousal
>
>arterial blood gas tests
>
>asthma

--
REMEMBER THE WORLD TRADE CENTER ---=< WTC 911 >=--
"...ne cede malis"

0100
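To make the read, manipulate, write-out idea a little more concrete, here is a small sketch using XML::Simple's object interface (new, XMLin, XMLout, and the RootName option). The input element names entry, term and pages are hypothetical, and the real index file would almost certainly need more care (option settings, element order, entries with no page number), so treat this as an illustration of attribute folding only.

```perl
use strict;
use warnings;

# XML::Simple is a CPAN module, not part of core Perl, so load it at run time
# and skip the demonstration if it is not installed.
my $have_xs = eval { require XML::Simple; 1 };

# ASSUMPTION: a hypothetical <entry> element with two nested child elements.
my $in = '<entry><term>APOE e4 variant</term><pages>18</pages></entry>';

my $out;
if ($have_xs) {
    my $xs   = XML::Simple->new;
    my $data = $xs->XMLin($in);                  # nested elements become hash keys

    # Move the term under the 'content' key, which XML::Simple writes as
    # text content by default; 'pages' stays a plain scalar value.
    $data->{content} = delete $data->{term};

    # By default XMLout writes scalar hash values as attributes.
    $out = $xs->XMLout($data, RootName => 'entry');
    print $out;
}
else {
    print "XML::Simple is not installed\n";
}
```

The folding here falls out of how XMLout treats scalar values, which is presumably why the module was suggested as a starting point.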
RE: Perl Regex
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Kprasad
Sent: 26 February 2010 15:56
To: perl-win32-users@listserv.ActiveState.com
Subject: Perl Regex

> Hi All
>
> What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement
> Text' while 'Single Line' option is ON.
>
> When I use below mentioned Regex
> ]+)?>((?!<\/index-entry>).*?)\s* ([0-9]+)
>
> And replaces wrongly

I think it is going to be hard to be of much help, mostly because you don't show us any Perl.

First, a regular expression can't change anything; it can only match.

Second, I find it easier to work out what is going on with non-trivial regular expressions if I use the 'x' switch, which allows me to break the RE over multiple lines and include comments. It is particularly useful with the 'qr' quoting operator. Your RE, for example, might look like this.

my $re=qr{]+)?>
((?!<\/index-entry>).*?)
\s*
([0-9]+)
}x;

However, as you don't provide any information on how that RE is used, it's going to be difficult to say what might be going wrong. If you could provide a small example script that we could cut & paste & run, it would make it much easier.

Finally, your data looks a lot like XML. A dedicated parser will generally do a more reliable job of parsing XML than regular expressions, even Perl regular expressions.

HTH

--
Brian Raven
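As a sketch of the 'x' layout with the missing pieces filled in, here is a commented qr version run against some made-up data. The markup is an assumption (only the names index-entry and pageId are visible in the thread). One deliberate change from the RE quoted above: the negative lookahead is tested before every character rather than once before .*?, so the entry text cannot run past its own closing tag even with the 's' switch on. That may or may not be the cause of the wrong replacement reported, since the full data and script are not shown.

```perl
use strict;
use warnings;

# ASSUMPTION: hypothetical markup; the real element names may differ.
my $xml = join '',
    qq{<index-entry>arousal disorders</index-entry>\n},
    qq{<index-entry>arterial blood gas tests</index-entry>\n<pageId>32</pageId>\n};

# Note: avoid dollar signs in comments inside qr, because variables are
# interpolated before the comments are stripped.
my $re = qr{
    <index-entry ( \s [^>]+ )? >          # capture 1: optional attributes
    ( (?: (?! </index-entry> ) . )*? )    # capture 2: text, never crossing a closing tag
    </index-entry>
    \s*
    <pageId> ( [0-9]+ ) </pageId>         # capture 3: page number
}xs;

# Collect every entry that is directly followed by a page number.
my @found;
while ($xml =~ /$re/g) {
    push @found, "$2 => $3";
}
print "$_\n" for @found;
```

With this data only the second entry is reported, because the first has no page number after its closing tag and the lookahead stops the match from borrowing the next entry's.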