Try this.

*Warning*: this assumes all your input files look like your example, i.e.
a block of text to copy straight to the output file, followed by a run of
<LNK> lines, followed by more text to copy.

__start code__

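         use strict;
         use warnings;

         open(my $in,  '<', 'in.txt')  || die("Could not open in.txt: $!");
         open(my $out, '>', 'out.txt') || die("Could not open out.txt: $!");  # '>' = write
         my $sBefore = "";   # text before the <LNK> block
         my $sAfter  = "";   # text after the <LNK> block
         my $rhLnks  = {};   # link text => { by => ..., param => ... }
         my @aOrder  = ();   # link text in order of first appearance
         while(<$in>)
                 {
                 chomp($_);
                 if($_ !~ /([^<]+)<LNK:;([^>]+)>([^<]+)/)
                         {
                         # Not a link line: it goes before the block if no
                         # links have been seen yet, otherwise after it.
                         if(@aOrder) { $sAfter  .= "$_\n"; }
                         else        { $sBefore .= "$_\n"; }
                         next;
                         }
                 my $by      = $1;   # e.g. "Modified by "
                 my $param   = $2;   # e.g. "rev. rul. 2000-4"
                 my $content = $3;   # e.g. "Rev. Rul. 2000-4"
                 $by =~ s/\s+by\s+//;
                 if(exists $rhLnks->{$content})
                         {
                         # Same link seen before: join the verbs with "and".
                         $rhLnks->{$content}->{by} .= " and $by";
                         }
                 else
                         {
                         push(@aOrder, $content);
                         $rhLnks->{$content} =
                                 {
                                 by    => $by,
                                 param => $param,
                                 };
                         }
                 }
         print $out $sBefore;
         # Walk @aOrder rather than keys(): hash order is not input order.
         print $out "$rhLnks->{$_}->{by} by <LNK:;$rhLnks->{$_}->{param}>$_</LNK>\n" foreach (@aOrder);
         print $out $sAfter;
         close($in);
         close($out);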

__end code__
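If your files can contain more than one run of <LNK> lines, a streaming variant avoids the single-block assumption in the warning above. This is a minimal sketch, not a drop-in replacement: `merge_links` is a hypothetical helper name, and it assumes each link line sits on one physical line in the form `Verb by <LNK:;param>Text</LNK>`. Duplicates are merged only within a contiguous run of link lines; any other line ends the run and is copied through unchanged, in its original position.

```perl
use strict;
use warnings;

# Hypothetical helper: merge duplicate <LNK> lines within each contiguous
# run of link lines, keeping every other line where it was.
sub merge_links {
    my @lines = @_;
    my @out;
    my @order;   # link text in order of first appearance within the run
    my %links;   # link text => { by => ..., param => ... }

    # Emit the current run of merged link lines, then reset it.
    my $flush = sub {
        foreach my $content (@order) {
            push @out, "$links{$content}{by} by "
                     . "<LNK:;$links{$content}{param}>$content</LNK>";
        }
        @order = ();
        %links = ();
    };

    foreach my $line (@lines) {
        # Assumed shape: "<verb(s)> by <LNK:;param>Text</LNK>"
        if ($line =~ /^(.+?)\s+by\s+<LNK:;([^>]+)>([^<]+)<\/LNK>\s*$/) {
            my ($by, $param, $content) = ($1, $2, $3);
            if (exists $links{$content}) {
                $links{$content}{by} .= " and $by";   # e.g. "Modified and Amplified"
            } else {
                push @order, $content;
                $links{$content} = { by => $by, param => $param };
            }
        } else {
            $flush->();        # a non-link line ends the current run
            push @out, $line;
        }
    }
    $flush->();                # the input may end with a run of link lines
    return @out;
}
```

To use it on a file, read and chomp the lines, then print the result, e.g. `chomp(my @lines = <STDIN>); print "$_\n" foreach merge_links(@lines);`. Keeping an `@order` array next to the hash is what preserves the input order, since Perl's `keys` returns hash entries in no particular order.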

At 08:40 18.06.2001 -0400, [EMAIL PROTECTED] wrote:
>Input example:
>
>!EC
>1999 TNT 230-4
>!CU
>Administrative Rulings
>!CU
>Administrative Rulings
>!CS
>IRS Revenue Rulings
>!DN
>Doc 1999-37669 (3 original original pages)
>!TS  #each of the following pairs should be combined because the <LNK ... strings match
>Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
>Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
>Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
>Supplemented by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
>!GI
>United States
>!IA
>Internal Revenue Service
>!DP
>30 Nov 1999
>!PD
>01 Dec 1999
>. . .
>=========================================
>
>Output example:
>
>!EC
>1999 TNT 230-4
>!CU
>Administrative Rulings
>!CU
>Administrative Rulings
>!CS
>IRS Revenue Rulings
>!DN
>Doc 1999-37669 (3 original original pages)
>!TS  #notice that the first of the two lines remain, and the second is properly removed
>      #the lines are also combined as I want them. The 'old' and 'new' tags are for my viewing
>old Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
>new Modified and Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
>old Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
>new Superseded and Supplemented by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
>!GI
>United States
>!IA
>Internal Revenue Service
>!DP
>30 Nov 1999
>!PD
>01 Dec 1999
>. . .

Aaron Craig
Programming
iSoftitler.com
