Jack Daniels (Butch) wrote on Thursday, 23 February 2006 at 10:30:
> It's driving me bonkers and I can't afford any more psychiatric bills. The
> data is a saved .txt file when viewing from a website. The vendor will not
> give us an actual file even though we paid a monthly fee for use of the
> database. I have around 5000 records that need to be converted to MARC
> cataloging records. I need to either have the data from each heading on 1
> line or have the script extract each heading and all the subsequent lines.
>
> The script is only extracting the first line of the heading..
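Yes: on every pass through the loop over @lines you overwrite the variables
$title through $dewey, and you print only the line that carries the heading.
The continuation lines match none of the headings, so they are dropped.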
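One way to keep the continuation lines is to remember which heading is
currently open and append every following line to it. The following is only
a minimal sketch under assumptions: the sub name join_heading_lines is made
up for illustration, the heading list is copied from your script, and a
heading is assumed to be the first word on its line. A heading that is not in
the list (Holdings, for instance) is simply appended to the field before it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Headings copied from the quoted script; extend the list as needed.
my $heading = qr/^\s*(Title|Physical|Copyrighted|Producer|Dewey|Synopsis|Subjects|Series)\b\s*(.*)/;

# Collect the lines of ONE record into a hash: heading => text on one line.
# (join_heading_lines is a hypothetical helper, not part of the original script.)
sub join_heading_lines {
    my @lines = @_;
    my (%field, $current);
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ $heading) {
            $current = $1;              # a new heading opens here
            $field{$current} = $2;      # the rest of the line starts its value
        }
        elsif (defined $current && $text =~ /\S/) {
            $text =~ s/^\s+//;          # continuation line: strip the indent
            $field{$current} .= " $text";   # and append to the open heading
        }
    }
    return %field;
}

my %rec = join_heading_lines(
    "Title 10 fastest growing careers: jobs for the future part four\n",
    "business\n",
    "and computer technology (03616)\n",
    "Dewey 371.425\n",
);
print "$_: $rec{$_}\n" for sort keys %rec;
```

With the values joined like this, each print statement in your loop could use
$rec{Title}, $rec{Dewey} and so on instead of the single matching $line.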
> I can only
> have 1 blank line between each record which works in the script. If I right
> click then import to Excel when viewing the records at the website, each
> heading is a continuous string, which is what I need. I can then save as a
> tab delimited file and the lines for each heading remain continuous, which
> works. But we have ceased our subscription and I now only have saved .txt
> files of the 5000 records. I can't figure out how and where to modify the
> script to work on the files. I suppose I could spend a couple months
> manually joining lines, but that really cuts into naptime.
I don't know what MARC cataloging records are, and my English is not good
enough to understand exactly what you mean. I also don't know whether the
leading spaces on every line below are part of the sample data. Still, it may
help you to produce a CSV file from the data.
So you can adjust my script below, or wait for the better solutions that will
undoubtedly arrive:
[...]
> HERE IS THE SCRIPT
>
> open(MYINPUTFILE, "<1000chomp.txt"); # open for input
>
> my(@lines) = <MYINPUTFILE>; # read file into list
>
>
> my $title;
> my $series;
> my $subjects;
> my $physical;
> my $synopsis;
> my $producer;
> my $copyrighted;
> my $dewey;
> for my $line (@lines)
> {
>
> $line =~ /Title/ and $title = $line;
> $line =~ /Title/ and print "=LDR 00000nam 2200000Ia 45e0\n","=245
> 00\$a",$line;
>
> $line =~ /Dewey/ and $dewey = $line;
> $line =~ /Dewey/ and print "=082 \\\\\$a",$line;
>
> $line =~ /Producer/ and $producer = $line;
> $line =~ /Producer/ and print "=040 \\\\\$aCaSRRI\n","=260
> \\\\\$a",$line;
>
> $line =~ /Copyrighted/ and $copyrighted = $line;
> $line =~ /Copyrighted/ and print "=261 \\\\\$c",$line;
>
> $line =~ /Physical/ and $physical = $line;
> $line =~ /Physical/ and print "=300 \\\\\$a1 videocassette ( min.)
>
> :\$bsd., col. ;\$c13 mm.",$line;
>
> $line =~ /Series/ and $series = $line;
> $line =~ /Series/ and print "=440 0\\\$a",$line;
>
> $line =~ /Synopsis/ and $synopsis = $line;
> $line =~ /Synopsis/ and print "=520 \\\\\$a",$line;
>
> $line =~ /Subjects/ and $subjects = $line;
> $line =~ /Subjects/ and print "=550 \\\\\$a",$line,"\n";
========================
#!/usr/bin/perl
use strict;
use warnings;
local $/=""; # split data at 1..n empty lines
# btw: Series does not occur in the sample data
my $stops=qr/(?:Title)|(?:Physical)|(?:Copyrighted)|(?:Producer)|(?:Dewey)|(?:Synopsis)|(?:Subjects)|(?:Series)/;
for my $record (<DATA>) {
my @pairs=split (/($stops)/, $record);
shift @pairs; # remove the leading field (whatever precedes the first heading)
my %keyed=@pairs; # heading => text, taken pairwise from the split result
$keyed{$_}=~s/\s+/ /gs for keys %keyed;
# now you have one record as key/one-line-value pairs
# for further processing, see:
print join "\n", map {"$_=>$keyed{$_}"} keys %keyed;
print "\n\n";
# you could sort it, produce a CSV-file, ...
}
__DATA__
Title 10 fastest growing careers: jobs for the future part four
business
and computer technology (03616)
Physical Color; Sound; 15 minutes
Copyrighted 1990
Producer GUIDANCE ASSOCIATES (GUID)
Dewey 371.425
Synopsis Contents: The business community depends on up-to-the minute
technology - technology that is changing rapidly. As a result, careers
in
technology, especially computers and specialized areas such as
accounting
are much in demand. Takes a look at three business and computer
careers:
software engineering, computer programming and accounting.
Subjects CAREER GUIDANCE; CAREER SERVICES
Holdings
1/2 VHS video: Head Office, 1 copy

Title 10 fastest growing careers: jobs for the future part one legal
and
health (03613)
Physical Color; Sound; 15 minutes
Copyrighted 1990
Producer GUIDANCE ASSOCIATES (GUID)
Dewey 371.425
Synopsis Contents: Takes a look at the fast growing health and legal
fields. Talks to a registered nurse about her changing role in a major
hospital, a physician's assistant who works with two doctors in a busy
family practice, and a paralegal who works with an attorney.
Subjects CAREER GUIDANCE; CAREER SERVICES
Holdings
1/2 VHS video: Head Office, 1 copy
=============
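As for the CSV idea mentioned in the script comments, here is a rough sketch
of how one record hash could be written as a CSV line. Everything in it is an
assumption for illustration: the helper name csv_row, the column order, and
the choice to quote every cell by hand instead of using a CSV module. The
%keyed hash below is filled with sample values only so the sketch runs on its
own; in the script above it would come from the split.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order is an assumption; rearrange or drop columns as needed.
my @columns = qw(Title Series Physical Copyrighted Producer Dewey Synopsis Subjects);

# Turn one record (hash reference) into a single CSV line.
# csv_row is a hypothetical helper, not part of the script above.
sub csv_row {
    my ($rec, @cols) = @_;
    return join ',', map {
        my $v = defined $rec->{$_} ? $rec->{$_} : '';  # missing heading => empty cell
        $v =~ s/^\s+|\s+$//g;                           # trim surrounding blanks
        $v =~ s/"/""/g;                                 # double any embedded quotes
        sprintf '"%s"', $v;                             # quote every cell
    } @cols;
}

# Sample values in the shape the script above produces (note the stray blanks).
my %keyed = (
    Title       => ' 10 fastest growing careers: jobs for the future part one legal and health (03613) ',
    Copyrighted => ' 1990 ',
    Dewey       => ' 371.425 ',
);

print join(',', map { sprintf '"%s"', $_ } @columns), "\n";   # header row
print csv_row(\%keyed, @columns), "\n";                        # one data row
```

Inside the record loop you would call csv_row(\%keyed, @columns) once per
record, so that 5000 records give 5000 lines which Excel can open directly.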