Re: Please help... struggling beginner.
Denham Eva said:

> Hello,
>
> I am very much a novice at perl and probably bitten off more than I can
> chew here.
> I have a file, which is a dump of a database - so it is a fixed file
> format.
> The problem is that I am struggling to manipulate it correctly. I have
> been trying for two days now to get a program to work. The idea is to
> remove the duplicate records, ie a record begins with Name and ends with
> Values End.
> The program that I have thus far, is pathetic in the sense I have opened
> three files, the file below, a data file for cleaned data, and a file for
> capturing the usernames already processed. But I have got stuck on how to
> compare and work through the file line for line and then only to capture
> the lines that are not duplicated.
> Please help - I am running out of time.

You forgot to attach your code. If you let us see what you've done it is
usually easier to provide relevant help.

I'm not sure I completely understand your problem, but here is a script
which will remove records with duplicate names.

    #!/bin/perl -w

    use strict;

    $/ = "#". "-" x 77 . "\n";

    my %seen;

    while (<>)
    {
        if (my ($name) = /^Name : +(\S+)/)
        {
            next if $seen{$name};
            $seen{$name}++;
        }
        print;
    }

    __END__

The trick is to set $/ to the line which is separating your records so
that each record is read in as a whole. Then I simply extract the name
from the record and don't print it if it has been seen already. I suspect
that your actual requirements will differ here.

--
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
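To see what setting $/ does to the way the file is read, here is a minimal sketch. It assumes the separator really is "#" followed by 77 dashes, as in the script above. The read_records helper and the sample data are made up for illustration, and the dump is read from a string through an in-memory filehandle so the example needs no files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the separator is "#" followed by 77 dashes, as in the script
# above; adjust it to whatever the real dump contains.
my $SEP = "#" . "-" x 77 . "\n";

# Hypothetical helper: split a dump held in a string into chunks by
# reading it through an in-memory filehandle with $/ set to the separator.
sub read_records {
    my ($dump) = @_;
    local $/ = $SEP;                      # restored when the sub returns
    open my $fh, '<', \$dump or die "Cannot open in-memory file: $!";
    my @records = <$fh>;                  # one element per chunk
    close $fh;
    return @records;
}

# Made-up sample data in the shape of the posted format.
my $dump = join $SEP,
    "#DB dumped\n",
    "Name : system\nUser Count: 0\n##--- Values End\n",
    "Name : vermaakm\nUser Count: 0\n##--- Values End\n",
    "#End Of Dump\n";

my @records = read_records($dump);
printf "%d chunks read\n", scalar @records;

for my $rec (@records) {
    # Each record chunk begins with its Name line and ends with the separator.
    my ($name) = $rec =~ /^Name : +(\S+)/;
    print defined $name
        ? "record for $name\n"
        : "no Name line (header or trailer)\n";
}
```

The first and last chunks carry no Name line, which is why the script above prints them unconditionally and only filters chunks where the match succeeds.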
Re: Please help... struggling beginner.
On Mon, Jun 23, 2003 at 10:43:07AM +0200 Denham Eva wrote:

> I am very much a novice at perl and probably bitten off more than I can chew
> here.
> I have a file, which is a dump of a database - so it is a fixed file format.
> The problem is that I am struggling to manipulate it correctly. I have been
> trying for two days now to get a program to work. The idea is to remove the
> duplicate records, ie a record begins with Name and ends with Values End.
> The program that I have thus far, is pathetic in the sense I have opened
> three files, the file below, a data file for cleaned data, and a file for
> capturing the usernames already processed. But I have got stuck on how to
> compare and work through the file line for line and then only to capture the
> lines that are not duplicated.

Keeping a couple of files around is not necessarily pathetic. I don't
think you need a file for the processed usernames, but having the
original file plus one for the processed data is a totally common
pattern.

> Here is the file format
>
> #DB dumped
> #DB version 8.0
> #SW version 2.6(1.10)
> #---
> --
> Name : system
> Some stuff here...
> many lines
> Of different format...
> such as line below...
> User Count: 0
> ##--- User End
> Lots of text here...
> Until...
> We get line below...
> ##--- Values End
> #---
> --

So, "#-..." is essentially the record separator? A fixed separator is
good because it makes processing rather easy. It might be handy to set
the input record separator to this value:

    #! /usr/bin/perl -w

    use strict;

    local $/ = "#-\n";

    open IN, "old_database" or die $!;
    open OUT, ">new_database" or die $!;

    # keep track of what records have already been seen
    my %records_seen;

    # this is the 'header', that is: what is before the first record
    print OUT scalar <IN>;

    while (<IN>) {
        if (/Name\s+:\s+(\S+)/) {
            # $1 is the record name captured by (\S+)
            next if $records_seen{ $1 }++;
            print OUT $_;
        }
    }

    print OUT "#End Of Dump\n";

    close IN;
    close OUT;

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~;eval
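The same header / records / trailer handling can also be sketched as a subroutine that works on any pair of filehandles. This is only an illustration under assumptions: dedupe_dump is a hypothetical name, the separator passed in is a guess at the real one, and the sample data is made up. Unlike the script above, it copies chunks without a Name line (such as the "#End Of Dump" trailer) straight through instead of printing the trailer by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy a dump from $in to $out, keeping only the first
# record for each name. $sep is the record separator line (an assumption;
# pass whatever the real dump uses).
sub dedupe_dump {
    my ($in, $out, $sep) = @_;
    local $/ = $sep;
    my %seen;

    # Everything up to and including the first separator is the header.
    my $header = <$in>;
    print {$out} $header if defined $header;

    while (my $record = <$in>) {
        if ($record =~ /^Name\s*:\s*(\S+)/) {
            next if $seen{$1}++;      # skip second and later copies
        }
        print {$out} $record;         # chunks without a Name pass through
    }
    return scalar keys %seen;         # number of distinct names
}

# Demo on made-up data, using in-memory filehandles instead of real files.
my $sep = "#" . "-" x 77 . "\n";
my $input = join $sep,
    "#DB dumped\n",
    "Name : system\nUser Count: 0\n##--- Values End\n",
    "Name : vermaakm\nUser Count: 0\n##--- Values End\n",
    "Name : system\nUser Count: 5\n##--- Values End\n",
    "#End Of Dump\n";
my $output = '';

open my $in,  '<', \$input  or die "Cannot read input: $!";
open my $out, '>', \$output or die "Cannot write output: $!";
my $distinct = dedupe_dump($in, $out, $sep);
close $in;
close $out;

print "$distinct distinct names\n";
print $output;
```

For real files, the two in-memory opens would be replaced by three-argument opens on the actual file names, for example open my $in, '<', 'old_database' or die $!;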
Please help... struggling beginner.
Hello,

I am very much a novice at perl and probably bitten off more than I can chew
here.
I have a file, which is a dump of a database - so it is a fixed file format.
The problem is that I am struggling to manipulate it correctly. I have been
trying for two days now to get a program to work. The idea is to remove the
duplicate records, ie a record begins with Name and ends with Values End.
The program that I have thus far, is pathetic in the sense I have opened
three files, the file below, a data file for cleaned data, and a file for
capturing the usernames already processed. But I have got stuck on how to
compare and work through the file line for line and then only to capture the
lines that are not duplicated.
Please help - I am running out of time.

Here is the file format

#DB dumped
#DB version 8.0
#SW version 2.6(1.10)
#--- --
Name : system
Some stuff here...
many lines
Of different format...
such as line below...
User Count: 0
##--- User End
Lots of text here...
Until...
We get line below...
##--- Values End
#--- --
Name : ###profile0
Some stuff here...
many lines
Of different format...
such as line below...
User Count: 188
##--- User End
Lots of text here...
Until...
We get line below...
##--- Values End
#--- --
Name : vermaakm
Some stuff here...
many lines
Of different format...
such as line below...
User Count: 0
##--- User End
Lots of text here...
Until...
We get line below...
##--- Values End
#--- --
Name : TFMC\vanzylm
Some stuff here...
many lines
Of different format...
such as line below...
CounterRst_01 : 2acac9101c335c8
##--- User End
Lots of text here...
Until...
We get line below...
##--- Values End
#--- --
#End Of Dump

Denham Eva
Oracle DBA
Linux like TeePee... No Windows, No Gates and Apache inside!