
Re: Re: Perl Regex

2010-03-26 Thread Mark Bergeron


Does this have to be hard-coded into the script? Just wondering, since I have been kind of following this thread.

On Feb 26, 2010 01:56:02 PM, perl-win32-users-boun...@listserv.activestate.com wrote:

> It looks like what u want to do is attribute folding. That's when u take a
> nested XML tag and make it an attribute of an enclosing tag. Ur doing
> something slightly different which is merging equal depth tags. The right
> way to do this is with an XML parser. Look into XML::Simple to get started.
> U would read in the XML to a hash, manipulate the data in the hash, and then
> write out a new XML file.
>
> Regex can do this in a degenerate case but it becomes unmanageable fast.
> But since u asked
>
> $xml =~s{(\s*)([^<]*)\s*([^<]*)eId>(\s*)}{$1pages="$3">$2$4}sg;
>
> HTH
___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Re: Perl Regex

2010-02-28 Thread Jon Bjornstad

Yes, you could use an XML parser to do the job described below,
but this case is pretty simple. Here's my offering,
leaving out the reading and writing of the files.

--
my $s = <<"EOF";

APOE e4 variant 18


arousal disorders disorders of arousal


arterial blood gas tests 32


asthma 28--9, 295

EOF

$s =~ s{(.*?)\s*(.*?)}
{$1}g;

print $s;
--

You could replace the two .*? with [^>]* if you wanted to be more
precise, but it looks more confusing.

Jon
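The archive stripped the angle-bracket tags from the code above, so here is a minimal sketch of the same substitution on assumed markup: the `<index-entry>`, `<pageId>` and `<xref label="see">` elements below are hypothetical reconstructions from the tag names that survived in the thread. It uses `[^<]` rather than the `[^>]` Jon mentions; either one keeps a capture from running across a complete tag, which `.*?` does not guarantee once `/s` lets `.` match newlines and `<`.

```perl
use strict;
use warnings;

# Hypothetical markup: the archive dropped the real tags, so these
# element names are assumptions based on the names that survived.
my $s = <<'EOF';
<index-entry>APOE e4 variant <pageId>18</pageId></index-entry>
<index-entry>arousal disorders <xref label="see">disorders of arousal</xref></index-entry>
<index-entry>arterial blood gas tests <pageId>32</pageId></index-entry>
<index-entry>asthma <pageId>28--9, 295</pageId></index-entry>
EOF

# Drop a trailing <pageId> element (and the whitespace before it) from
# each entry. [^<] stops at the next tag, so a capture can never extend
# past the end of the entry it started in.
$s =~ s{<index-entry>([^<]*?)\s*<pageId>[^<]*</pageId></index-entry>}
       {<index-entry>$1</index-entry>}g;

print $s;
```

The "arousal disorders" entry has no `<pageId>`, so the pattern simply fails there and the entry is left untouched.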

==  original query 

Hi All

What would be the right regular expression to convert the 'Search Text'
below into the 'Replacement Text', with the 'Single Line' option turned on?

When I use the regex below
]+)?>((?!<\/index-entry>).*?)\s*([0-9]+)

it replaces wrongly, giving

arousal disordersdisorders of arousal

.

Search Text:


APOE e4 variant 18


arousal disorders disorders of arousal


arterial blood gas tests 32


asthma 28--9, 295


The correct 'Replacement Text' should be:


APOE e4 variant


arousal disorders disorders of arousal


arterial blood gas tests


asthma


Kanhaiya


Re: Perl Regex

2010-02-26 Thread Kprasad

Here is the chunk of code which I used to perform this task:

 open(XML, "<$ARGV[0]") or die "Can not open $ARGV[0]: $!";
 my $xmltext;
 {
  local $/ = undef;        # slurp mode: read the whole file at once
  $xmltext = <XML>;
 }
 close(XML);
 while($xmltext=~ /]+)?>(?:.*?)<\/index-entry>(?:[^\n]*?)([^<]+)<\/pageId>/is)
 {
  $page=$2;
  $page=~ s/ *\n+\t+/ /g;  # join a wrapped page list onto one line
  $page=~ s/, /,/g;        # drop the space after each comma
  $xmltext=~ s|]+)?>(.*?)(?:[^\n]*?)[^<]+|$2|s
 }
 $xmltext=~ s/$localpath/$xmlfile\_final.xml") or die "Can not open 
$localpath/$xmlfile\_final.xml: $!";
 print XMLOUT $xmltext;
 close(XMLOUT);

Thanks
Kanhaiya
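A minimal sketch of the same read/modify/write steps using three-argument `open` and lexical filehandles instead of the global `XML`/`XMLOUT` handles. The helper names `slurp`, `spew` and `normalize_pages`, the file names, and the `<pageId>` markup are all hypothetical; the substitution only applies Kanhaiya's two page-list clean-ups to an assumed `<pageId>` element rather than reproducing his garbled patterns.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read a whole file into one string (slurp mode: $/ undefined).
sub slurp {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Can not open $path: $!";
    local $/;                  # undef => read to end of file
    my $text = <$in>;
    close($in);
    return $text;
}

# Write a string to a file, replacing any existing content.
sub spew {
    my ($path, $text) = @_;
    open(my $out, '>', $path) or die "Can not open $path: $!";
    print {$out} $text;
    close($out) or die "Can not close $path: $!";
}

# The two clean-ups from the loop above: collapse a wrapped page list
# into one line, then drop the space after each comma.
sub normalize_pages {
    my ($page) = @_;
    $page =~ s/ *\n+\t+/ /g;
    $page =~ s/, /,/g;
    return $page;
}

# Example run on hypothetical files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
spew("$dir/index.xml", "<pageId>28--9,\n\t295</pageId>\n");
my $xmltext = slurp("$dir/index.xml");
$xmltext =~ s{<pageId>([^<]+)</pageId>}{'<pageId>' . normalize_pages($1) . '</pageId>'}ge;
spew("$dir/index_final.xml", $xmltext);
```

The `/e` modifier evaluates the replacement as Perl code, so each captured page list passes through `normalize_pages` before it is written back.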

- Original Message - 
From: "Brian Raven" 
To: 
Sent: Friday, February 26, 2010 10:22 PM
Subject: RE: Perl Regex



Re: Perl Regex

2010-02-26 Thread Chris Wagner

It looks like what u want to do is attribute folding.  That's when u take a
nested XML tag and make it an attribute of an enclosing tag.  Ur doing
something slightly different which is merging equal depth tags.  The right
way to do this is with an XML parser.  Look into XML::Simple to get started.
U would read in the XML to a hash, manipulate the data in the hash, and then
write out a new XML file.

Regex can do this in a degenerate case but it becomes unmanageable fast.
But since u asked

$xml =~
s{(\s*)([^<]*)\s*([^<]*)(\s*)}{$1$2$4}sg;

HTH
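The archive also stripped the tags from the one-liner above, so this is a minimal sketch of the attribute-folding idea on assumed markup (hypothetical `<index-entry>`, `<pageId>` and `<xref>` elements, and a `pages` attribute taken from the `pages="$3"` fragment that survives in the quoted copy at the top of the thread). It also shows why the "degenerate case" caveat matters: with `.*?` under `/s`, an entry that has no page number swallows the next entry and ends up carrying its page.

```perl
use strict;
use warnings;

# Hypothetical markup (the archive stripped the real tags).
my $xml = <<'EOF';
<index-entry>arousal disorders <xref label="see">disorders of arousal</xref></index-entry>
<index-entry>arterial blood gas tests <pageId>32</pageId></index-entry>
EOF

# Fold <pageId> into a pages="..." attribute on the enclosing entry.
# [^<] keeps the entry text from containing (or crossing) another tag.
(my $folded = $xml) =~
  s{<index-entry>([^<]*?)\s*<pageId>([^<]*)</pageId></index-entry>}
   {<index-entry pages="$2">$1</index-entry>}g;

# The same substitution written with .*? under /s: the first entry has
# no <pageId>, so the match runs on into the second entry.
(my $runaway = $xml) =~
  s{<index-entry>(.*?)\s*<pageId>(.*?)</pageId></index-entry>}
   {<index-entry pages="$2">$1</index-entry>}gs;

print $folded, $runaway;
```

In `$runaway`, page 32 is attached to "arousal disorders", which is the kind of cross-entry merge that makes regex-only XML editing hard to get right.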

At 09:25 PM 2/26/2010 +0530, Kprasad wrote:
>Hi All
>
>What would be the right regular expression to convert the 'Search Text'
>below into the 'Replacement Text', with the 'Single Line' option turned on?
>
>When I use the regex below
>]+)?>((?!<\/index-entry>).*?)\s*([0-9]+)
>
>it replaces wrongly, giving
>
>arousal disordersdisorders of arousal
>
>.
>
>Search Text:
>
>
>APOE e4 variant 18
>
>
>arousal disorders disorders of arousal
>
>
>arterial blood gas tests 32
>
>
>asthma 28--9, 295
>
>
>The correct 'Replacement Text' should be:
>
>
>APOE e4 variant
>
>
>arousal disorders disorders of arousal
>
>
>arterial blood gas tests
>
>
>asthma
>
>

--
REMEMBER THE WORLD TRADE CENTER ---=< WTC 911 >=--
"...ne cede malis"

0100


RE: Perl Regex

2010-02-26 Thread Brian Raven

From: perl-win32-users-boun...@listserv.activestate.com
[mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of
Kprasad
Sent: 26 February 2010 15:56
To: perl-win32-users@listserv.ActiveState.com
Subject: Perl Regex

> Hi All
>  
> What would be the right regular expression to convert the 'Search Text'
> below into the 'Replacement Text', with the 'Single Line' option turned on?
>
> When I use the regex below
>
> ]+)?>((?!<\/index-entry>).*?)\s*([0-9]+)
>
> it replaces wrongly

I think it is going to be hard to be of much help, mostly because you
don't show us any Perl.

First, a regular expression can't change anything, it can only match.
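A minimal sketch of that distinction, using one line of the sample data as plain text (an assumption, since the real input is XML): binding a string to `m//` only tests it and returns captures, while `s///` is the operator that actually rewrites the bound variable.

```perl
use strict;
use warnings;

my $text = 'APOE e4 variant 18';

# m// only tests (and captures); $text is left exactly as it was.
my ($page) = $text =~ m/\s*(\d+)$/;

# s/// performs the change, in place on the variable it is bound to
# (here a copy, so $text itself stays intact).
(my $trimmed = $text) =~ s/\s*\d+$//;

print "$text | $page | $trimmed\n";
```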

Second, I find it easier to work out what is going on with non-trivial
regular expressions if I use the 'x' switch, which allows me to break
the RE over multiple lines, and include comments. Particularly useful
with the 'qr' quoting operator. Your RE, for example, might look like
this.

my $re=qr{]+)?>
  ((?!<\/index-entry>).*?)

  \s*

  ([0-9]+)

  }x;
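Because the tags in the pattern above did not survive the archive, here is a sketch of the same `/x` layout filled in for hypothetical `<index-entry>`/`<pageId>` markup. Note that `qr{}` interpolates variables before the regex engine sees the pattern, so the comments inside it avoid writing `$1`-style names.

```perl
use strict;
use warnings;

# Hypothetical element names; the real tags did not survive the archive.
my $entry_re = qr{
    <index-entry (?:\s[^>]*)? >     # opening tag, optional attributes
    ( [^<]*? )                      # capture 1: entry text, no nested tags
    \s*
    <pageId> ( [^<]* ) </pageId>    # capture 2: the page list
    \s*
    </index-entry>
}x;

my $xml = qq{<index-entry id="e1">APOE e4 variant <pageId>18</pageId></index-entry>\n};

# In list context the match returns the captures.
my ($text, $pages) = $xml =~ $entry_re
    or die "no match";
print "text=[$text] pages=[$pages]\n";
```

Under `/x`, whitespace between tokens is ignored (though not inside character classes), which is what lets each piece sit on its own commented line.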

However, as you don't provide any information on how that RE is used,
it's going to be difficult to say what might be going wrong. If you could
provide a small example script that we could cut & paste & run, it
would make it much easier.

Finally, your data looks a lot like XML. A dedicated parser will
generally do a more reliable job of parsing XML than regular
expressions, even Perl regular expressions.

HTH

-- 
Brian Raven 


