Re: Please help... struggling beginner.

2003-06-23 Thread Paul Johnson

Denham Eva said:
> Hello,
>
> I am very much a novice at perl and probably bitten off more than I can chew
> here.
> I have a file, which is a dump of a database - so it is a fixed file format.
> The problem is that I am struggling to manipulate it correctly. I have been
> trying for two days now to get a program to work. The idea is to remove the
> duplicate records, ie a record begins with Name and ends with Values End.
> The program that I have thus far, is  pathetic in the sense I have opened
> three files, the file below, a data file for cleaned data, and a file for
> capturing the usernames already processed. But I have got stuck on how to
> compare and work through the file line for line and then only to capture the
> lines that are not duplicated.
> Please help - I am running out of time.

You forgot to attach your code.  If you let us see what you've done it
is usually easier to provide relevant help.

I'm not sure I completely understand your problem, but here is a script
which will remove records with duplicate names.


#!/bin/perl -w

use strict;

$/ = "#". "-" x 77 . "\n";

my %seen;

while (<>)
{
    if (my ($name) = /^Name  : +(\S+)/)
    {
        next if $seen{$name};
        $seen{$name}++;
    }
    print;
}

__END__


The trick is to set $/ to the line which is separating your records so
that each record is read in as a whole.  Then I simply extract the name
from the record and don't print it if it has been seen already.  I
suspect that your actual requirements will differ here.
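
For trying the idea out without touching the real dump, here is a minimal
sketch of the same $/ trick run against a string instead of a file. The
helper name dedupe_records and the tiny sample dump are made up for
illustration, and the separator is assumed to be "#" followed by 77 dashes
as in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text of $dump with repeated records removed.
# $sep is the complete separator line, including its trailing newline.
sub dedupe_records {
    my ($dump, $sep) = @_;
    local $/ = $sep;    # each read now returns one whole record
    open my $in, '<', \$dump or die "Cannot read in-memory dump: $!";
    my %seen;
    my $out = '';
    while (my $record = <$in>) {
        # /m lets ^ match at the start of any line inside the record
        if (my ($name) = $record =~ /^Name\s*:\s*(\S+)/m) {
            next if $seen{$name}++;    # skip second and later copies
        }
        $out .= $record;    # header and trailer have no Name line, so they pass through
    }
    close $in;
    return $out;
}

# Made-up sample; real records have many more lines each.
my $sep  = "#" . "-" x 77 . "\n";
my $dump = join $sep,
    "#DB dumped\n",
    "Name  : system\nUser Count: 0\n",
    "Name  : vermaakm\nUser Count: 0\n",
    "Name  : system\nUser Count: 0\n",
    "#End Of Dump\n";

print dedupe_records($dump, $sep);
```

Because the separator stays attached to the end of each record that is
read, dropping a duplicate drops its trailing separator too, so the output
keeps the same layout as the input.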

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Please help... struggling beginner.

2003-06-23 Thread Tassilo von Parseval
On Mon, Jun 23, 2003 at 10:43:07AM +0200 Denham Eva wrote:

> I am very much a novice at perl and probably bitten off more than I can chew
> here. 
> I have a file, which is a dump of a database - so it is a fixed file format.
> The problem is that I am struggling to manipulate it correctly. I have been
> trying for two days now to get a program to work. The idea is to remove the
> duplicate records, ie a record begins with Name and ends with Values End.
> The program that I have thus far, is  pathetic in the sense I have opened
> three files, the file below, a data file for cleaned data, and a file for
> capturing the usernames already processed. But I have got stuck on how to
> compare and work through the file line for line and then only to capture the
> lines that are not duplicated.

Keeping a couple of files around is not necessarily pathetic. I think
you don't need a file for the processed usernames. But the original file
and one for the processed data is a totally common pattern.

> Here is the file format
> 
> 
> #DB dumped
> #DB version 8.0
> #SW version 2.6(1.10)
> #---
> --
> Name  :   system
> Some stuff here... 
> many lines
> Of different format... 
> such as line below...
> User Count:   0
> ##--- User End
> Lots of text here...
> Until...
> We get line below...
> ##--- Values End
> #---
> --

So, "#-..." is essentially the record separator? A fixed separator
is good because it makes processing rather easy. It might be handy to
set the input record separator to this value:

#! /usr/bin/perl -w

use strict;

local $/ = 
"#-\n";

open IN, "old_database" or die $!;
open OUT, ">new_database" or die $!;

# keep track of what records have already been seen
my %records_seen;

# this is the 'header', that is: what is before the first record
print OUT scalar <IN>;

while (<IN>) {
    if (/Name\s+:\s+(\S+)/) {
        #                ^^^
        # $1 is record name
        next if $records_seen{ $1 }++;
        print OUT $_;
    }
}

print OUT "#End Of Dump\n";

close IN;
close OUT;
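
The "next if $records_seen{ $1 }++" line relies on post-increment handing
back the value from before the increment: 0 (false) the first time a name
turns up, 1 or more (true) on every later occurrence. A small sketch of that
idiom on its own, with a made-up helper name and a made-up list of names:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first occurrence of each name, in order.
sub first_occurrences {
    my @names = @_;
    my %seen;
    # $seen{$_}++ evaluates to the count *before* incrementing,
    # so the negation is true only for a name not met before.
    return grep { !$seen{$_}++ } @names;
}

my @unique = first_occurrences(qw(system vermaakm system TFMC\vanzylm vermaakm));
print "$_\n" for @unique;
```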

Tassilo
-- 
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~;eval





Please help... struggling beginner.

2003-06-23 Thread Denham Eva
Hello,

I am very much a novice at perl and probably bitten off more than I can chew
here. 
I have a file, which is a dump of a database - so it is a fixed file format.
The problem is that I am struggling to manipulate it correctly. I have been
trying for two days now to get a program to work. The idea is to remove the
duplicate records, ie a record begins with Name and ends with Values End.
The program that I have thus far, is  pathetic in the sense I have opened
three files, the file below, a data file for cleaned data, and a file for
capturing the usernames already processed. But I have got stuck on how to
compare and work through the file line for line and then only to capture the
lines that are not duplicated.
Please help - I am running out of time.

Here is the file format


#DB dumped
#DB version 8.0
#SW version 2.6(1.10)
#---
--
Name  : system
Some stuff here... 
many lines
Of different format... 
such as line below...
User Count: 0
##--- User End
Lots of text here...
Until...
We get line below...
##--- Values End
#---
--
Name  : ###profile0
Some stuff here... 
many lines
Of different format... 
such as line below...
User Count: 188
##--- User End
Lots of text here...
Until...
We get line below...
##--- Values End
#---
--
Name  : vermaakm
Some stuff here... 
many lines
Of different format... 
such as line below...
User Count: 0
##--- User End
Lots of text here...
Until...
We get line below...
##--- Values End
#---
--
Name  : TFMC\vanzylm
Some stuff here... 
many lines
Of different format... 
such as line below...
CounterRst_01 : 2acac9101c335c8
##--- User End
Lots of text here...
Until...
We get line below...
##--- Values End
#---
--
#End Of Dump




Denham Eva
Oracle DBA
Linux like TeePee... No Windows, No Gates and Apache inside!


