Re: Perl Regex

2010-02-26 Thread Kprasad
Here is the chunk of code which I used to perform this task:

 open(XML, "<$ARGV[0]") or die "Can not open $ARGV[0]: $!";
 my $xmltext;
 {
  local $/ = undef;
  $xmltext=<XML>;
 }
 close(XML);
 while($xmltext=~ 
/]+)?>(?:.*?)<\/index-entry>(?:[^\n]*?)([^<]+)<\/pageId>/is)
 {
  $page=$2;
  $page=~ s/ *\n+\t+/ /g;
  $page=~ s/, /,/g;
  $xmltext=~ 
s|]+)?>(.*?)(?:[^\n]*?)[^<]+|$2|s
 }
 $xmltext=~ s/$localpath/$xmlfile\_final.xml") or die "Can not open 
$localpath/$xmlfile\_final.xml: $!";
 print XMLOUT $xmltext;
 close(XMLOUT);

Thanks
Kanhaiya

- Original Message - 
From: "Brian Raven" 
To: 
Sent: Friday, February 26, 2010 10:22 PM
Subject: RE: Perl Regex


>
> From: perl-win32-users-boun...@listserv.activestate.com
> [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of
> Kprasad
> Sent: 26 February 2010 15:56
> To: perl-win32-users@listserv.ActiveState.com
> Subject: Perl Regex
>
>> Hi All
>>
>> What will be the perfect Regular Expression to convert below mentioned
>> 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
>>
>> When I use below mentioned Regex
>>
>> ]+)?>((?!<\/index-entry>).*?)\s*([0-9]+)
>>
>> And replaces wrongly
>
> I think it is going to be hard to be of much help. Mostly because you
> don't show us any Perl.
>
> First, a regular expression can't change anything, it can only match.
>
> Second, I find it easier to work out what is going on with non-trivial
> regular expressions if I use the 'x' switch, which allows me to break
> the RE over multiple lines, and include comments. Particularly useful
> with the 'qr' quoting operator. Your RE, for example, might look like
> this.
>
> my $re=qr{]+)?>
>   ((?!<\/index-entry>).*?)
>   
>   \s*
>   
>   ([0-9]+)
>   
>  }x;
>
> However, as you don't provide any information on how that RE is used,
> it's going to be difficult to say what might be going wrong. If you could
> provide a small example script, that we could cut & paste & run, it
> would make it much easier.
>
> Finally, your data looks a lot like XML. A dedicated parser will
> generally do a more reliable job of parsing XML than regular
> expressions, even Perl regular expressions.
>
> HTH
>
> -- 
> Brian Raven
>
> Please consider the environment before printing this email.
>
> This e-mail may contain confidential and/or privileged information. If you 
> are not the intended recipient or have received this e-mail in error, 
> please advise the sender immediately by reply e-mail and delete this 
> message and any attachments without retaining a copy.
>
> Any unauthorised copying, disclosure or distribution of the material in 
> this e-mail is strictly forbidden.
>
> ___
> Perl-Win32-Users mailing list
> Perl-Win32-Users@listserv.ActiveState.com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>
> 



Re: Perl Regex

2010-02-26 Thread Chris Wagner
It looks like what you want to do is attribute folding.  That's when you
take a nested XML tag and make it an attribute of an enclosing tag.  You're
doing something slightly different, which is merging equal-depth tags.  The
right way to do this is with an XML parser.  Look into XML::Simple to get
started.  You would read the XML into a hash, manipulate the data in the
hash, and then write out a new XML file.
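
A minimal sketch of the parser approach, under stated assumptions: it uses
XML::LibXML (a CPAN module, swapped in here for XML::Simple because it keeps
document order and mixed content intact), and it assumes each <index-entry>
may be followed by a sibling <pageId> whose text should become a pageId
attribute.  The sample document and that target layout are guesses, since the
real markup is not shown intact in this thread.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; element layout is an assumption.
my $xml = <<'XML';
<index>
<index-entry>APOE e4 variant</index-entry> <pageId>18</pageId>
<index-entry>arousal disorders</index-entry>
<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>
</index>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

for my $entry ($doc->findnodes('//index-entry')) {
    # Look at the next sibling that is not whitespace-only text.
    my $next = $entry->nextNonBlankSibling;
    next unless $next && $next->nodeName eq 'pageId';

    # Fold the page reference into the entry, then drop the old element.
    $entry->setAttribute(pageId => $next->textContent);
    $next->unbindNode;
}

my $result = $doc->toString;
print $result;
```

An entry with no following <pageId> is simply skipped, so it cannot pick up
the page number of a later entry.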

Regex can do this in a degenerate case, but it becomes unmanageable fast.
But since you asked:

$xml =~
s{(\s*)([^<]*)\s*([^<]*)(\s*)}{$1$2$4}sg;
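
For illustration only, here is a sketch of a single-pass substitution of this
kind.  The tag names <index-entry> and <pageId> are inferred from the closing
tags visible in the thread; the sample data, the <see> element, and the
pageId="..." attribute as the target format are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample data; the second entry has no page reference.
my $xml = <<'XML';
<index>
<index-entry>APOE e4 variant</index-entry> <pageId>18</pageId>
<index-entry>arousal disorders <see>disorders of arousal</see></index-entry>
<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>
</index>
XML

$xml =~ s{
    <index-entry ( \s [^>]* )? >          # opening tag, optional attributes
    ( (?: (?! </?index-entry ) . )*? )    # entry content, confined to one entry
    </index-entry>
    [^\S\n]*                              # spaces or tabs, but no newline
    <pageId> ( [^<]+ ) </pageId>
}{
    my ($attrs, $body, $page) = (defined $1 ? $1 : '', $2, $3);
    $page =~ s/,\s+/,/g;                  # "28--9, 295" becomes "28--9,295"
    qq{<index-entry$attrs pageId="$page">$body</index-entry>};
}xsge;

print $xml;
```

The negative lookahead stops the entry content from running past the end of
its own entry, so an entry without a page reference is left alone instead of
swallowing the next entry's <pageId>.  With /g the whole file is handled in
one pass, without a surrounding while loop.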

HTH


At 09:25 PM 2/26/2010 +0530, Kprasad wrote:
>Hi All
>
>What will be the perfect Regular Expression to convert below mentioned
>'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
>
>When I use below mentioned Regex
>]+)?>((?!<\/index-entry>).*?)\s*([0-9]+)
>
>And replaces wrongly
>
>arousal disordersdisorders of arousal
>
>.
>
>Search Text:
>
>
>APOE e4 variant 18
>
>
>arousal disorders disorders of arousal
>
>
>arterial blood gas tests 32
>
>
>asthma 28--9, 295
>
>
>Correct Replacement Text should be:
>
>
>APOE e4 variant
>
>
>arousal disorders disorders of arousal
>
>
>arterial blood gas tests
>
>
>asthma
>
>


--
REMEMBER THE WORLD TRADE CENTER ---=< WTC 911 >=--
"...ne cede malis"

0100



RE: Perl Regex

2010-02-26 Thread Brian Raven

From: perl-win32-users-boun...@listserv.activestate.com
[mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of
Kprasad
Sent: 26 February 2010 15:56
To: perl-win32-users@listserv.ActiveState.com
Subject: Perl Regex

> Hi All
>  
> What will be the perfect Regular Expression to convert below mentioned
> 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
>  
> When I use below mentioned Regex
> ]+)?>((?!<\/index-entry>).*?)\s*([0-9]+)
>  
> And replaces wrongly

I think it is going to be hard to be of much help. Mostly because you
don't show us any Perl.

First, a regular expression can't change anything, it can only match.
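
To make the distinction concrete, here is a small sketch (the sample string
is made up): the match operator only tests and captures, while the
substitution operator s/// is what actually modifies a string.

```perl
use strict;
use warnings;

my $text = "asthma 28";

# Matching only tests the string and fills the capture variables.
my $matched     = $text =~ m/([0-9]+)$/;
my $captured    = $1;
my $after_match = $text;    # still "asthma 28"

# Substitution works on a copy here, so the original stays available.
(my $stripped = $text) =~ s/\s*[0-9]+$//;

print "after match: '$after_match', captured: $captured\n";
print "after s///:  '$stripped'\n";
```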

Second, I find it easier to work out what is going on with non-trivial
regular expressions if I use the 'x' switch, which allows me to break
the RE over multiple lines, and include comments. Particularly useful
with the 'qr' quoting operator. Your RE, for example, might look like
this.

my $re=qr{]+)?>
  ((?!<\/index-entry>).*?)
  
  \s*
  
  ([0-9]+)
  
  }x;
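
As a sketch of what a commented /x pattern can look like when complete: the
tag names below are inferred from the closing tags visible in this thread,
and the sample string is made up.  Note that comments inside the pattern must
avoid the delimiter characters as well as the dollar and at signs, because
interpolation happens before the comments are stripped.

```perl
use strict;
use warnings;

# Hypothetical sample; the real markup is not shown intact in this thread.
my $sample = qq{<index-entry id="e1">asthma</index-entry>\n<pageId>28</pageId>};

my $re = qr{
    <index-entry ( \s [^>]+ )? >         # opening tag, attributes in group 1
    ( (?: (?! </index-entry> ) . )*? )   # entry text in group 2
    </index-entry>
    \s*                                  # whitespace between the elements
    <pageId> ( [0-9]+ ) </pageId>        # page number in group 3
}xs;

my ($attrs, $text, $page);
if ($sample =~ $re) {
    ($attrs, $text, $page) = ($1, $2, $3);
    print "attrs=[$attrs] text=[$text] page=[$page]\n";
}
```

The /s flag lets the dot match newlines, which is what the 'Single Line'
option in the original question refers to.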

However, as you don't provide any information on how that RE is used,
it's going to be difficult to say what might be going wrong. If you could
provide a small example script, that we could cut & paste & run, it
would make it much easier.

Finally, your data looks a lot like XML. A dedicated parser will
generally do a more reliable job of parsing XML than regular
expressions, even Perl regular expressions.

HTH

-- 
Brian Raven 



Perl Regex

2010-02-26 Thread Kprasad
Hi All

What would be the right regular expression to convert the 'Search Text' below
to the 'Replacement Text' while the 'Single Line' option is ON?

When I use the regex below:
]+)?>((?!<\/index-entry>).*?)\s*([0-9]+)

it replaces wrongly:

arousal disordersdisorders of arousal

.

Search Text:


APOE e4 variant 18


arousal disorders disorders of arousal


arterial blood gas tests 32


asthma 28--9, 295


Correct Replacement Text should be:


APOE e4 variant


arousal disorders disorders of arousal


arterial blood gas tests


asthma


Kanhaiya