Here is the chunk of code which I used to perform this task:

 open(XML, "<$ARGV[0]") or die "Can not open $ARGV[0]: $!";
 my $xmltext;
 {
  local $/ = undef;
  $xmltext=<XML>;
 }
 close(XML);
 while($xmltext=~ 
/<index-entry(?:[^>]+)?>(?:.*?)<\/index-entry>(?:[^\n]*?)<pageId>([^<]+)<\/pageId>/is)
 {
  $page=$2;
  $page=~ s/ *\n+\t+/ /g;
  $page=~ s/, /,/g;
  $xmltext=~ 
s|<index-entry(?:[^>]+)?>(.*?)</index-entry>(?:[^\n]*?)<pageId>[^<]+</pageId>|<index-entry
 
chid="$1" pages="$page">$2</index-entry>|s
 }
 $xmltext=~ s/<index-entry chid="/<index-entry id="/;
 open(XMLOUT, ">$localpath/$xmlfile\_final.xml") or die "Can not open 
$localpath/$xmlfile\_final.xml: $!";
 print XMLOUT $xmltext;
 close(XMLOUT);

Thanks
Kanhaiya

----- Original Message ----- 
From: "Brian Raven" <bra...@nyx.com>
To: <perl-win32-users@listserv.ActiveState.com>
Sent: Friday, February 26, 2010 10:22 PM
Subject: RE: Perl Regex


>
> From: perl-win32-users-boun...@listserv.activestate.com
> [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of
> Kprasad
> Sent: 26 February 2010 15:56
> To: perl-win32-users@listserv.ActiveState.com
> Subject: Perl Regex
>
>> Hi All
>>
>> What will be the perfect Regular Expression to convert below mentioned
> 'Search Text' to 'Replacement
>> Text' while 'Single Line' option is ON.
>>
>> When I use below mentioned Regex
>>
> <index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>
> ([0-9]+)</pageId>
>>
>> And replaces wrongly
>
> I think it is going to be hard to be of much help. Mostly because you
> don't show us any Perl.
>
> First, a regular expression can't change anything, it can only match.
>
> Second, I find it easier to work out what is going on with non-trivial
> regular expressions if I use the 'x' switch, which allows me to break
> the RE over multiple lines, and include comments. Particularly useful
> with the 'qr' quoting operator. Your RE, for example, might look like
> this.
>
> my $re=qr{<index-entry(?:[^>]+)?>
>   ((?!<\/index-entry>).*?)
>   </index-entry>
>   \s*
>   <pageId>
>   ([0-9]+)
>   </pageId>
>      }x;
>
> However, as you don't provide any information on how that RE is used,
> its going to be difficult to say what might be going wrong. If you could
> provide a small example script, that we could cut & paste & run, it
> would make it much easier.
>
> Finally, your data looks a lot like XML. A dedicated parser will
> generally do a more reliable job of parsing XML that regular
> expressions, even Perl regular expressions.
>
> HTH
>
> -- 
> Brian Raven
>
> Please consider the environment before printing this email.
>
> This e-mail may contain confidential and/or privileged information. If you 
> are not the intended recipient or have received this e-mail in error, 
> please advise the sender immediately by reply e-mail and delete this 
> message and any attachments without retaining a copy.
>
> Any unauthorised copying, disclosure or distribution of the material in 
> this e-mail is strictly forbidden.
>
> _______________________________________________
> Perl-Win32-Users mailing list
> Perl-Win32-Users@listserv.ActiveState.com
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>
> 

_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Reply via email to