Here is the chunk of code which I used to perform this task:

open(XML, "<", $ARGV[0]) or die "Can not open $ARGV[0]: $!";
my $xmltext;
{
    local $/ = undef;    # slurp the whole file in one read
    $xmltext = <XML>;
}
close(XML);

while ($xmltext =~ /<index-entry(?:[^>]+)?>(?:.*?)<\/index-entry>(?:[^\n]*?)<pageId>([^<]+)<\/pageId>/is) {
    $page = $1;    # the pattern above has one capture group: the page list
    $page =~ s/ *\n+\t+/ /g;
    $page =~ s/, /,/g;
    # NOTE: this pattern captures only the entry content as $1; $2 is not set by it
    $xmltext =~ s|<index-entry(?:[^>]+)?>(.*?)</index-entry>(?:[^\n]*?)<pageId>[^<]+</pageId>|<index-entry chid="$1" pages="$page">$2</index-entry>|s;
}
# rename chid to id on every rewritten entry
$xmltext =~ s/<index-entry chid="/<index-entry id="/g;

open(XMLOUT, ">", "$localpath/$xmlfile\_final.xml") or die "Can not open $localpath/$xmlfile\_final.xml: $!";
print XMLOUT $xmltext;
close(XMLOUT);

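For comparison, a single-pass variant might look like the sketch below. It follows the qr//x layout suggested in the quoted reply, and it is only a sketch under assumptions: the input shape (an index-entry element followed by optional whitespace and a pageId element) is guessed from the regex in the original question, and the names $ENTRY_RE, tidy_pages and merge_page_ids are made up for illustration. The negative lookahead is checked at every character of the entry content, so a match cannot run past the first closing tag into a later entry.

```perl
use strict;
use warnings;

# Assumed input shape: <index-entry ...>text</index-entry> <pageId>12, 14</pageId>
my $ENTRY_RE = qr{
    <index-entry ([^>]*) >                 # group 1: existing attributes, may be empty
    ( (?: (?! </index-entry> ) . )* )      # group 2: content, stopping at the first closing tag
    </index-entry>
    \s*
    <pageId> ([^<]+) </pageId>             # group 3: the page list
}xis;

# Hypothetical helper: normalise a page list such as "12, 14" to "12,14".
sub tidy_pages {
    my ($pages) = @_;
    $pages =~ s/\s+/ /g;         # collapse line breaks and tabs
    $pages =~ s/^\s+|\s+$//g;    # trim both ends
    $pages =~ s/\s*,\s*/,/g;     # no spaces around commas
    return $pages;
}

# Hypothetical helper: fold each pageId into the preceding index-entry
# as a pages attribute, in one pass over the text.
sub merge_page_ids {
    my ($xml) = @_;
    $xml =~ s{$ENTRY_RE}{
        my ($attrs, $content, $pages) = ($1, $2, $3);
        sprintf '<index-entry%s pages="%s">%s</index-entry>',
            $attrs, tidy_pages($pages), $content;
    }ge;
    return $xml;
}

my $sample = qq{<index-entry id="ch1">Apples</index-entry> <pageId>12, 14</pageId>\n}
           . qq{<index-entry id="ch2">Pears</index-entry>\n};
print merge_page_ids($sample);
```

Entries that have no pageId after them are left unchanged, and existing attributes are kept as they are, so no temporary attribute name is needed. For anything beyond simple, regular input, a real XML parser remains the safer choice.
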
Thanks
Kanhaiya

----- Original Message -----
From: "Brian Raven" <bra...@nyx.com>
To: <perl-win32-users@listserv.ActiveState.com>
Sent: Friday, February 26, 2010 10:22 PM
Subject: RE: Perl Regex

> From: perl-win32-users-boun...@listserv.activestate.com
> [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Kprasad
> Sent: 26 February 2010 15:56
> To: perl-win32-users@listserv.ActiveState.com
> Subject: Perl Regex
>
>> Hi All
>>
>> What will be the perfect Regular Expression to convert below mentioned
>> 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
>>
>> When I use below mentioned Regex
>>
>> <index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>
>>
>> And replaces wrongly
>
> I think it is going to be hard to be of much help. Mostly because you
> don't show us any Perl.
>
> First, a regular expression can't change anything, it can only match.
>
> Second, I find it easier to work out what is going on with non-trivial
> regular expressions if I use the 'x' switch, which allows me to break
> the RE over multiple lines, and include comments. Particularly useful
> with the 'qr' quoting operator. Your RE, for example, might look like
> this.
>
> my $re=qr{<index-entry(?:[^>]+)?>
>           ((?!<\/index-entry>).*?)
>           </index-entry>
>           \s*
>           <pageId>
>           ([0-9]+)
>           </pageId>
>          }x;
>
> However, as you don't provide any information on how that RE is used,
> it's going to be difficult to say what might be going wrong. If you could
> provide a small example script, that we could cut & paste & run, it
> would make it much easier.
>
> Finally, your data looks a lot like XML. A dedicated parser will
> generally do a more reliable job of parsing XML than regular
> expressions, even Perl regular expressions.
>
> HTH
>
> --
> Brian Raven

_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs