Re: Re: Perl Regex

2010-03-26 Thread Mark Bergeron

Does this have to be hard coded in to the script? Just wondering since I have been kinda following this thread.

Feb 26, 2010 01:56:02 PM, perl-win32-users-boun...@listserv.activestate.com wrote:
> It looks like what u want to do is attribute folding. That's when u take a
> nested XML tag and make it an attribute of an enclosing tag. Ur doing
> something slightly different which is merging equal depth tags. The right
> way to do this is with an XML parser. Look into XML::Simple to get started.
> U would read in the XML to a hash, manipulate the data in the hash, and then
> write out a new XML file.
>
> Regex can do this in a degenerate case but it becomes unmanageable fast.
> But since u asked
>
> $xml =~
> s{<index-item>(\s*)<index-entry>([^<]*)</index-entry>\s*<pageId>([^<]*)</pageId>(\s*)</index-item>}{<index-item>$1<index-entry pages="$3">$2</index-entry>$4</index-item>}sg;

Re: Perl Regex

2010-02-28 Thread Jon Bjornstad
Yes, you could use an XML parser to do the job described below
but this case is pretty simple.   Here's my offering
leaving out the reading/writing of the files.

--
my $s = <<EOF;
<index-item>
<index-entry>APOE e4 variant</index-entry> <pageId>18</pageId>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry>arterial blood gas tests</index-entry> <pageId>32</pageId>
</index-item>
<index-item>
<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>
</index-item>
EOF

$s =~ s{<index-entry>(.*?)</index-entry>\s*<pageId>(.*?)</pageId>}
{<index-entry pages="$2">$1</index-entry>}g;

print $s;
--

You could replace the two .*? with [^<]* if you wanted to be more
precise
but it looks more confusing.
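
For comparison, here is a minimal sketch of the character-class variant mentioned above. The two-item sample and the variable names are assumptions made up for illustration; only the shape of the pattern comes from the thread. Because [^<]* stops at the next '<', neither capture can swallow a tag, so an entry followed by a see reference instead of a pageId is left alone.

```perl
use strict;
use warnings;

# Hypothetical two-item sample modelled on the data in this thread.
my $s = <<'EOF';
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry>arterial blood gas tests</index-entry> <pageId>32</pageId>
</index-item>
EOF

# [^<]* cannot run past the next '<', so the captures never cross a tag,
# whether or not the /s modifier is in effect.
(my $out = $s) =~ s{<index-entry>([^<]*)</index-entry>\s*<pageId>([^<]*)</pageId>}
                   {<index-entry pages="$2">$1</index-entry>}g;

print $out;
```

The trade-off is that [^<]* rejects entries containing nested markup, which is arguably the safer failure for this data.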

Jon

==  original query 

Hi All

What will be the perfect Regular Expression to convert below mentioned
'Search Text' to 'Replacement Text' while 'Single Line' option is ON.

When I use below mentioned Regex
<index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>

And replaces wrongly

<index-entry pages="32">arousal disorders</index-entry><see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
.

Search Text:

<index-item>
<index-entry>APOE e4 variant</index-entry> <pageId>18</pageId>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry>arterial blood gas tests</index-entry> <pageId>32</pageId>
</index-item>
<index-item>
<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>
</index-item>

Correct Replacement Text should be:

<index-item>
<index-entry pages="18">APOE e4 variant</index-entry>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry pages="32">arterial blood gas tests</index-entry>
</index-item>
<index-item>
<index-entry pages="28--29,295">asthma</index-entry>
</index-item>

Kanhaiya

___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


RE: Perl Regex

2010-02-26 Thread Brian Raven

From: perl-win32-users-boun...@listserv.activestate.com
[mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of
Kprasad
Sent: 26 February 2010 15:56
To: perl-win32-users@listserv.ActiveState.com
Subject: Perl Regex

> Hi All
>
> What will be the perfect Regular Expression to convert below mentioned
> 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
>
> When I use below mentioned Regex
>
> <index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>
>
> And replaces wrongly

I think it is going to be hard to be of much help. Mostly because you
don't show us any Perl.

First, a regular expression can't change anything, it can only match.

Second, I find it easier to work out what is going on with non-trivial
regular expressions if I use the 'x' switch, which allows me to break
the RE over multiple lines, and include comments. Particularly useful
with the 'qr' quoting operator. Your RE, for example, might look like
this.

my $re=qr{<index-entry(?:[^>]+)?>
  ((?!<\/index-entry>).*?)
  </index-entry>
  \s*
  <pageId>
  ([0-9]+)
  </pageId>
  }x;

However, as you don't provide any information on how that RE is used,
it's going to be difficult to say what might be going wrong. If you could
provide a small example script that we could cut, paste & run, it
would make it much easier.
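
A small cut-and-paste script of that kind might look like the sketch below. It assumes the regex from the question was used in a global substitution with the /s modifier ('Single Line' on); the sample data, the replacement text and the variable names $posted, $tempered, $wrong and $right are made up for illustration. It shows one plausible cause of the wrong output: the negative lookahead is tested only once, at the start of the capture, so the lazy .*? is still free to run across a closing tag into the next item. Repeating the lookahead before every character is one way to stop that.

```perl
use strict;
use warnings;

# Hypothetical sample: one entry with a see reference, one with a pageId.
my $xml = <<'EOF';
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry>arterial blood gas tests</index-entry> <pageId>32</pageId>
</index-item>
EOF

# Pattern as in the question: the lookahead is checked once, before .*? starts.
my $posted = qr{<index-entry(?:[^>]+)?>((?!</index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>}s;

# Variant: the lookahead is checked before every character that is consumed.
my $tempered = qr{<index-entry(?:[^>]+)?>((?:(?!</index-entry>).)*?)</index-entry>\s*<pageId>([0-9]+)</pageId>}s;

(my $wrong = $xml) =~ s{$posted}{<index-entry pages="$2">$1</index-entry>}g;
(my $right = $xml) =~ s{$tempered}{<index-entry pages="$2">$1</index-entry>}g;

print "posted pattern:\n$wrong\ntempered pattern:\n$right";
```

With the posted pattern the page number 32 ends up on the "arousal disorders" entry; with the tempered one that entry is untouched and only the entry that really has a pageId is rewritten.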

Finally, your data looks a lot like XML. A dedicated parser will
generally do a more reliable job of parsing XML than regular
expressions, even Perl regular expressions.

HTH

-- 
Brian Raven 



Re: Perl Regex

2010-02-26 Thread Chris Wagner
It looks like what u want to do is attribute folding.  That's when u take a
nested XML tag and make it an attribute of an enclosing tag.  Ur doing
something slightly different which is merging equal depth tags.  The right
way to do this is with an XML parser.  Look into XML::Simple to get started.
U would read in the XML to a hash, manipulate the data in the hash, and then
write out a new XML file.

Regex can do this in a degenerate case but it becomes unmanageable fast.
But since u asked

$xml =~
s{<index-item>(\s*)<index-entry>([^<]*)</index-entry>\s*<pageId>([^<]*)</pageId>(\s*)</index-item>}{<index-item>$1<index-entry pages="$3">$2</index-entry>$4</index-item>}sg;
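
The same substitution can be spread over several lines with the /x modifier, which may make the four captures easier to follow. This is only a sketch: the sample data and the variable name $folded are assumptions, and the comments avoid dollar signs because variables are interpolated even inside /x comments.

```perl
use strict;
use warnings;

# Hypothetical sample: one item with a pageId, one with a see reference.
my $xml = <<'EOF';
<index-item>
<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
EOF

(my $folded = $xml) =~ s{
    <index-item> (\s*)                     # capture 1: whitespace after the opening tag
    <index-entry> ([^<]*) </index-entry>   # capture 2: entry text
    \s*
    <pageId> ([^<]*) </pageId>             # capture 3: page list
    (\s*) </index-item>                    # capture 4: whitespace before the closing tag
}{<index-item>$1<index-entry pages="$3">$2</index-entry>$4</index-item>}xsg;

print $folded;
```

Anchoring on both index-item tags means an item is rewritten only when it holds exactly one entry followed by one pageId, so the see item passes through unchanged.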

HTH



--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis



Re: Perl Regex

2010-02-26 Thread Kprasad
Here is the chunk of code which I used to perform this task:

 open(XML, "<$ARGV[0]") or die "Can not open $ARGV[0]: $!";
 my $xmltext;
 {
  local $/ = undef;
  $xmltext=<XML>;
 }
 close(XML);
 while($xmltext=~ /<index-entry(?:[^>]+)?>(?:.*?)<\/index-entry>(?:[^\n]*?)<pageId>([^<]+)<\/pageId>/is)
 {
  $page=$2;
  $page=~ s/ *\n+\t+/ /g;
  $page=~ s/, /,/g;
  $xmltext=~ s|<index-entry(?:[^>]+)?>(.*?)</index-entry>(?:[^\n]*?)<pageId>[^<]+</pageId>|<index-entry chid="$1" pages="$page">$2</index-entry>|s
 }
 $xmltext=~ s/<index-entry chid=/<index-entry id=/;
 open(XMLOUT, ">$localpath/$xmlfile\_final.xml") or die "Can not open $localpath/$xmlfile\_final.xml: $!";
 print XMLOUT $xmltext;
 close(XMLOUT);
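
For reference, the match-then-substitute loop above could be collapsed into a single global substitution that computes the replacement with /e. This is a sketch under assumptions, not the code that was actually run: the helper names normalize_pages, fold_one and fold_page_ids are hypothetical, existing attributes on index-entry are simply kept as they are, and file reading and writing are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: join wrapped lines and drop whitespace after commas.
sub normalize_pages {
    my ($pages) = @_;
    $pages =~ s/\s*\n\s*/ /g;
    $pages =~ s/,\s+/,/g;
    return $pages;
}

# Hypothetical helper: build the replacement tag from the three captures.
# The arguments are copied first, so later matches cannot disturb them.
sub fold_one {
    my ($attrs, $text, $pages) = @_;
    return '<index-entry' . $attrs . ' pages="' . normalize_pages($pages) . '">'
         . $text . '</index-entry>';
}

# One pass over the whole document; /e evaluates the replacement as code.
sub fold_page_ids {
    my ($xml) = @_;
    $xml =~ s{<index-entry([^>]*)>([^<]*)</index-entry>[^\n<]*<pageId>([^<]+)</pageId>}
             {fold_one($1, $2, $3)}ge;
    return $xml;
}

my $sample = qq{<index-entry id="c1">asthma</index-entry> <pageId>28--9, 295</pageId>\n};
print fold_page_ids($sample);
```

Doing the work in one s///g pass avoids rescanning the document from the start on every loop iteration and keeps each page list paired with the entry it was matched with.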

Thanks
Kanhaiya
