Re: Extract multiple lines

Hans Meier (John Doe) Thu, 23 Feb 2006 04:05:54 -0800

Jack Daniels (Butch) on Thursday, 23 February 2006, 10:30:
> It's driving me bonkers and I can't afford any more psychiatric bills. The
> data is a saved .txt file when viewing from a website. The vendor will not
> give us an actual file even though we paid a monthly fee for use of the
> database. I have around 5000 records that need to be converted to MARC
> cataloging records. I need to either have the data from each heading on 1
> line or have the script extract each heading and all the subsequent lines.
>
> The script is only extracting the first line of the heading..


Yes, with every loop through @lines, you overwrite your variables $title to 
$dewey, so the continuation lines of a heading are never kept.
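
To illustrate one way around that while still reading line by line: remember
which heading was seen last and append every following line to it instead of
overwriting. This is only a sketch for the lines of a single record. It
assumes a heading word always opens its line (after optional spaces) and that
no continuation line starts with one of those words; the name collect_fields
and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: gather each heading together with its continuation lines.
# Assumes a heading word opens the line (after optional spaces).
sub collect_fields {
    my @lines = @_;
    my $heads = qr/Title|Physical|Copyrighted|Producer|Dewey|Synopsis|Subjects|Series/;
    my (%field, $current);
    for my $line (@lines) {
        chomp(my $text = $line);
        $text =~ s/^\s+|\s+$//g;             # trim both ends
        next unless length $text;            # skip blank lines
        if ($text =~ /^($heads)\b\s*(.*)$/) {
            $current = $1;                   # a new heading starts here
            $field{$current} = $2;
        }
        elsif (defined $current) {
            $field{$current} .= " $text";    # continuation: append, don't overwrite
        }
    }
    return %field;
}

my %f = collect_fields(
    "      Title 10 fastest growing careers: jobs for the future part one legal\n",
    "and\n",
    "      health (03613)\n",
    "      Dewey 371.425\n",
);
print "$_ => $f{$_}\n" for sort keys %f;
```

Each value then holds the whole heading text on one line, which is what the
print statements in the quoted script would need.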

> I can only 
> have 1 blank line between each record which works in the script. If I right
> click then import to excel when viewing the records at the website, each
> heading is a continuous string, which is what I need. I can then save as a
> tab delimited file and the lines for each heading remain continuous, which
> works. But we have ceased our subscription and I now only have saved .txt
> files of the 5000 records.   I can't figure out how and where to modify the
> script to work on the files. I suppose I could spend a couple months
> manually joining lines, but that really cuts into naptime.

I don't know what MARC cataloging records are, my English is not good enough 
to understand exactly what you mean, and I don't know whether the leading 
spaces on every line below are in the sample data, but it may help you to 
produce a CSV file from the data.

So you can adjust my script below, or wait for the better solutions that will 
undoubtedly arrive:

[...]

> HERE IS THE SCRIPT
>
> open(MYINPUTFILE, "<1000chomp.txt"); # open for input
>
> my(@lines) = <MYINPUTFILE>; # read file into list
>
>
> my $title;
>         my $series;
>         my $subjects;
>         my $physical;
>         my $synopsis;
>         my $producer;
>         my $copyrighted;
>         my $dewey;
> for my $line (@lines)
> {
>
> $line =~ /Title/ and $title = $line;
>    $line =~ /Title/ and print "=LDR  00000nam  2200000Ia 45e0\n","=245  00\$a",$line;
>
> $line =~ /Dewey/ and $dewey = $line;
>    $line =~ /Dewey/ and print "=082  \\\\\$a",$line;
>
> $line =~ /Producer/ and $producer = $line;
>    $line =~ /Producer/ and print "=040  \\\\\$aCaSRRI\n","=260  \\\\\$a",$line;
>
> $line =~ /Copyrighted/ and $copyrighted = $line;
>    $line =~ /Copyrighted/ and print "=261  \\\\\$c",$line;
>
> $line =~ /Physical/ and $physical = $line;
>    $line =~ /Physical/ and print "=300  \\\\\$a1 videocassette ( min.) :\$bsd., col. ;\$c13 mm.",$line;
>
> $line =~ /Series/ and $series = $line;
>    $line =~ /Series/ and print "=440  0\\\$a",$line;
>
> $line =~ /Synopsis/ and $synopsis = $line;
>    $line =~ /Synopsis/ and print "=520  \\\\\$a",$line;
>
> $line =~ /Subjects/ and $subjects = $line;
>    $line =~ /Subjects/ and print "=550  \\\\\$a",$line,"\n";

========================
#!/usr/bin/perl
use strict;
use warnings;


local $/=""; # split data at 1..n empty lines

# btw: Series does not occur in the sample data
my $stops=qr/(?:Title)|(?:Physical)|(?:Copyrighted)|(?:Producer)|(?:Dewey)|(?:Synopsis)|(?:Subjects)|(?:Series)/;

for my $record (<DATA>) {

  my @pairs=split (/($stops)/, $record);
  shift @pairs; # remove the 1st entry (the text before the first heading)

  my %keyed=@pairs;

  $keyed{$_}=~s/\s+/ /gs for keys %keyed;

  # now you have one record as key/one-line-value pairs
  # for further processing, see:

  print join "\n", map {"$_=>$keyed{$_}"} keys %keyed;
  print "\n\n";

  # you could sort it, produce a CSV-file, ...
}


__DATA__
      Title 10 fastest growing careers: jobs for the future part four
business
      and computer technology (03616)
      Physical Color; Sound; 15 minutes
      Copyrighted 1990
      Producer GUIDANCE ASSOCIATES (GUID)
      Dewey 371.425
      Synopsis Contents: The business community depends on up-to-the minute
      technology - technology that is changing rapidly. As a result, careers
in
      technology, especially computers and specialized areas such as
accounting
      are much in demand. Takes a look at three business and computer
careers:
      software engineering, computer programming and accounting.
      Subjects CAREER GUIDANCE; CAREER SERVICES
      Holdings
         1/2 VHS video: Head Office, 1 copy



      Title 10 fastest growing careers: jobs for the future part one legal
and
      health (03613)
      Physical Color; Sound; 15 minutes
      Copyrighted 1990
      Producer GUIDANCE ASSOCIATES (GUID)
      Dewey 371.425
      Synopsis Contents: Takes a look at the fast growing health and legal
      fields. Talks to a registered nurse about her changing role in a major
      hospital, a physician's assistant who works with two doctors in a busy
      family practice, and a paralegal who works with an attorney.
      Subjects CAREER GUIDANCE; CAREER SERVICES
      Holdings
         1/2 VHS video: Head Office, 1 copy
=============
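
As a possible follow-up to the "produce a CSV-file" comment in the script:
keys %keyed returns the headings in no particular order, so a CSV export
probably wants a fixed column list. The sketch below reads records in
paragraph mode from an in-memory filehandle and builds one CSV row per
record. The column order, the helper names csv_quote and record_to_csv, and
the two-record sample are assumptions for illustration only; for real data
the Text::CSV module from CPAN is likely the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed column order; adjust to whatever the target format needs.
my @columns = qw(Title Physical Copyrighted Producer Dewey Synopsis Subjects Series);
my $stops   = join '|', @columns;

# Hypothetical helper: quote one CSV field, doubling embedded quotes.
sub csv_quote {
    my $v = defined $_[0] ? $_[0] : '';
    $v =~ s/"/""/g;
    return qq{"$v"};
}

# Hypothetical helper: turn one record (a block of text) into a CSV row.
sub record_to_csv {
    my ($record) = @_;
    my @pairs = split /\b($stops)\b/, $record;
    shift @pairs;                        # drop the text before the first heading
    my %keyed = @pairs;
    for (values %keyed) {                # values are aliased, so this edits in place
        s/\s+/ /g;                       # join continuation lines
        s/^ | $//g;                      # trim the remaining edge spaces
    }
    return join ',', map { csv_quote($keyed{$_}) } @columns;
}

# Made-up sample data, two records separated by a blank line.
my $sample = <<'END';
      Title First sample
      title line two (00001)
      Dewey 371.425

      Title Second "quoted" sample (00002)
      Dewey 001.64
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @rows;
{
    local $/ = "";                       # paragraph mode: records end at blank lines
    @rows = map { record_to_csv($_) } <$fh>;
}
print join(',', @columns), "\n";
print "$_\n" for @rows;
```

Headings missing from a record simply become empty fields, so every row has
the same number of columns.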

