Try this.
*Warning*: this assumes all your input files look like your example, i.e. a
block of text to copy straight to the output file, followed by a run of
<LNK> lines, followed by more text to copy.
__start code__
use strict;
use warnings;

open(my $in,  '<', 'in.txt')  or die "Could not open in.txt: $!";
open(my $out, '>', 'out.txt') or die "Could not open out.txt: $!";

my $sBefore = "";   # lines before the first <LNK> line
my $sAfter  = "";   # lines after the <LNK> lines have started
my @order;          # link titles in the order they first appear
my %lnks;           # title => { by => "Modified and Amplified", param => "..." }

while (my $line = <$in>)
{
    chomp($line);
    if ($line !~ /^(.*?)\s+by\s+<LNK:;([^>]+)>([^<]+)/)
    {
        # Not a link line: it goes before or after the link block.
        if (@order) { $sAfter  .= "$line\n"; }
        else        { $sBefore .= "$line\n"; }
        next;
    }
    my ($by, $param, $content) = ($1, $2, $3);

    if (exists $lnks{$content})
    {
        # Same link seen before: combine, e.g. "Modified and Amplified".
        $lnks{$content}{by} .= " and $by";
    }
    else
    {
        push @order, $content;
        $lnks{$content} = { by => $by, param => $param };
    }
}

print $out $sBefore;
# Walk @order rather than keys %lnks so the links keep their original order.
print $out "$lnks{$_}{by} by <LNK:;$lnks{$_}{param}>$_</LNK>\n" foreach @order;
print $out $sAfter;
close $in;
close $out;
__end code__
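If a file can have more than one run of <LNK> lines, the single-block assumption in the script breaks. One way around it, sketched below and not tested against real data, is to merge each run of link lines in place and pass every other line through untouched. The function name `merge_lnk_lines` is made up for this example, and it assumes every link line looks like `<Verb> by <LNK:;param>Title</LNK>` on one line, as in your sample. Lines are matched on the link title, and the `param` from the first line of each group is kept.

```perl
use strict;
use warnings;

# Hypothetical helper: merge consecutive "<Verb> by <LNK:;param>Title</LNK>"
# lines that share a title, e.g. "Modified by X" + "Amplified by X" becomes
# "Modified and Amplified by X". Each run of link lines is merged separately
# and written back where it was; all other lines pass through unchanged.
sub merge_lnk_lines {
    my @lines = @_;
    my @out;
    my @order;    # titles in first-seen order within the current run
    my %links;    # title => { by => "...", param => "..." }

    # Emit the merged lines for the current run, then start a new run.
    my $flush = sub {
        for my $title (@order) {
            my $l = $links{$title};
            push @out, "$l->{by} by <LNK:;$l->{param}>$title</LNK>";
        }
        @order = ();
        %links = ();
    };

    for my $line (@lines) {
        if ($line =~ /^(.*?)\s+by\s+<LNK:;([^>]+)>([^<]+)<\/LNK>\s*$/) {
            my ($by, $param, $title) = ($1, $2, $3);
            if (exists $links{$title}) {
                $links{$title}{by} .= " and $by";
            }
            else {
                push @order, $title;
                $links{$title} = { by => $by, param => $param };
            }
        }
        else {
            $flush->();          # a non-link line ends the current run
            push @out, $line;
        }
    }
    $flush->();                  # the input may end with a run of link lines
    return @out;
}
```

To filter a file through it: `chomp(my @lines = <STDIN>); print "$_\n" for merge_lnk_lines(@lines);`. Because runs are merged separately, two identical links separated by ordinary text are left as two lines.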
At 08:40 18.06.2001 -0400, [EMAIL PROTECTED] wrote:
>Input example:
>
>!EC
>1999 TNT 230-4
>!CU
>Administrative Rulings
>!CU
>Administrative Rulings
>!CS
>IRS Revenue Rulings
>!DN
>Doc 1999-37669 (3 original original pages)
>!TS #each of the following pairs should be combined because the <LNK ... strings match
>Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
>Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
>Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
>Supplemented by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
>!GI
>United States
>!IA
>Internal Revenue Service
>!DP
>30 Nov 1999
>!PD
>01 Dec 1999
>. . .
>=========================================
>
>Output example:
>
>!EC
>1999 TNT 230-4
>!CU
>Administrative Rulings
>!CU
>Administrative Rulings
>!CS
>IRS Revenue Rulings
>!DN
>Doc 1999-37669 (3 original original pages)
>!TS #notice that the first of the two lines remains, and the second is properly removed
> #the lines are also combined as I want them. The 'old' and 'new' tags are for my viewing
>old Modified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
>new Modified and Amplified by <LNK:;rev. rul. 2000-4>Rev. Rul. 2000-4</LNK>
>old Superseded by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
>new Superseded and Supplemented by <LNK:;rev. rul. 2000-56>Rev. Rul. 2000-56</LNK>
>!GI
>United States
>!IA
>Internal Revenue Service
>!DP
>30 Nov 1999
>!PD
>01 Dec 1999
>. . .
Aaron Craig
Programming
iSoftitler.com