Removing duplicate fields with MARC::Record

2007-07-30 Thread Michael Bowden
Hi Folks:
 
It is me again.  I have another question...
 
I am helping someone clean up her database.  Somehow, the 856 field in her MARC 
records has duplicated itself several times.  She has some records with 15+ 
duplicate 856 fields.  So I am trying, unsuccessfully, to modify one of my 
scripts to delete the duplicate fields.  The problem I am having is that out of 
the 15+ 856 fields, 3 of the 856 fields are unique to the record and need to 
stay.  Here is my code so far:
 
## create a MARC::File::USMARC object
 
use MARC::Batch;
 
my $infile = shift;
my $otfile = shift;
my $batch = MARC::Batch->new('USMARC',$infile);
$batch->strict_off();
 
open (DBOUT, "> $otfile");
## if $file isn't defined we had trouble with the file
## so exit
 
if (not($infile)) {
    print $MARC::File::ERROR,"\n";
    exit(0);
}
 
while (my $record = $batch->next()) {
    my @m856 = $record->field('856');
    @m856 = sort {$a cmp $b} @m856;
    my %seen = ();
    my @new856 = ();

    foreach ( $record->fields() ) {
        if (@m856) {
            foreach $f (@m856) {
                next if ($seen{ $f }++);
                push @new856, $f;
                $record->delete_field($f);
            }
        }
    }
    $record->insert_fields_ordered( @new856 );
    print DBOUT $record->as_usmarc();
}
 
When I run this script, it puts ALL the 856 fields back in the record, and they 
are not sorted. What am I doing wrong?
 
TIA!
 
Michael

Michael L. Bowden
Coordinator of Automation and Access Services
Associate Professor, Information Science
Harrisburg Area Community College
One HACC Drive
Harrisburg, PA 17110-2999

E: [EMAIL PROTECTED]
T: 717.780.1936
F: 717.780.2462

Harrisburg Area Community College



Removing duplicate fields with MARC::Record

2007-07-30 Thread Michael Bowden
Hi Folks:
 
Hmmm...  I am not sure how my address got in the middle of the code in that 
last message.  Here is a corrected version!  
 
Michael
 
 
 
It is me again.  I have another question...
 
I am helping someone clean up her database.  Somehow, the 856 field in her MARC 
records has duplicated itself several times.  She has some records with 15+ 
duplicate 856 fields.  So I am trying, unsuccessfully, to modify one of my 
scripts to delete the duplicate fields.  The problem I am having is that out of 
the 15+ 856 fields, 3 of the 856 fields are unique to the record and need to 
stay.  Here is my code so far:
 
## create a MARC::File::USMARC object
 
use MARC::Batch;
 
my $infile = shift;
my $otfile = shift;
my $batch = MARC::Batch->new('USMARC',$infile);
$batch->strict_off();
 
open (DBOUT, "> $otfile");
## if $file isn't defined we had trouble with the file
## so exit
 
if (not($infile)) {
print $MARC::File::ERROR,"\n";
exit(0);
}
 
while (my $record = $batch->next()) {
    my @m856 = $record->field('856');
    @m856 = sort {$a cmp $b} @m856;
    my %seen = ();
    my @new856 = ();

    foreach ( $record->fields() ) {
        if (@m856) {
            foreach $f (@m856) {
                next if ($seen{ $f }++);
                push @new856, $f;
                $record->delete_field($f);
            }
        }
    }
    $record->insert_fields_ordered( @new856 );
    print DBOUT $record->as_usmarc();
}
 
When I run this script, it puts ALL the 856 fields back in the record, and they 
are not sorted. What am I doing wrong?
 
TIA!
 
Michael
 
 




RE: Removing duplicate fields with MARC::Record

2007-07-30 Thread Bryan Baldus
Note: my comments are untested and may not work without modification. Some
parts left to the reader to complete.

On Monday, July 30, 2007 2:16 PM, Michael Bowden wrote:
>   @m856 = sort {$a cmp $b} @m856;

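@m856 holds MARC::Field objects. Comparing them directly with cmp compares
their stringified references (MARC::Field=HASH(0x...)), which is unlikely to
produce the desired results. A better comparison would be
@m856 = sort {$a->as_usmarc() cmp $b->as_usmarc()} @m856, which still sorts
the field objects themselves, ordered by their data. Better still might be to
leave out that step here, deduplicate first, and go on to: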

>   my %seen = ();
>   my @new856 = ();

Instead of going through all fields in the record, you could go through only
the 856s you have gathered, add each one to the %seen hash keyed by its
as_usmarc string (to facilitate comparisons), and delete any later field whose
string has already been seen. After that, you could sort the remaining fields,
delete them, and then add the sorted fields back.
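The reason the original %seen check never matches can be shown directly. The short sketch below assumes MARC::Field's constructor and its as_usmarc() method (the example.com URL is a placeholder). It builds two identical 856 fields and compares them first as plain strings, which is what happens when a field object is used as a hash key, and then by their USMARC data:

```perl
use strict;
use warnings;
use MARC::Field;

# Two 856 fields carrying exactly the same indicators and subfield data.
my $f1 = MARC::Field->new( '856', '4', '0', u => 'http://example.com/item' );
my $f2 = MARC::Field->new( '856', '4', '0', u => 'http://example.com/item' );

# Used as a string (or as a hash key), a field object becomes its
# reference address, e.g. "MARC::Field=HASH(0x...)", so the two differ.
my $same_ref_string = ( "$f1" eq "$f2" ) ? 1 : 0;

# as_usmarc() serializes indicators and subfields, so the two match.
my $same_usmarc = ( $f1->as_usmarc() eq $f2->as_usmarc() ) ? 1 : 0;

print "equal as hash keys:   $same_ref_string\n";    # prints 0
print "equal as USMARC data: $same_usmarc\n";        # prints 1
```

Since every object key is unique, $seen{ $f }++ is always 0 the first time, so every 856 is kept, and sorting by reference address gives an essentially arbitrary order.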

if (@m856) {
    foreach my $f (@m856) {
        #add this field to seen fields if not seen
        unless ($seen{$f->as_usmarc}) {
            $seen{$f->as_usmarc} = $f;
        } #unless seen this field's exact data
        else {
            #seen it, so delete current
            $record->delete_field($f);
        } #else seen this field
    } #foreach 856
} #if 856s

my @new856 = (); #add values of %seen, sorted according to keys of %seen
###sort remaining/deduplicated 856 fields, delete existing fields, and then
###add sorted fields back.
###where @new856 contains the values of %seen, sorted according to the keys
###of %seen

$record->insert_fields_ordered( @new856 );

#
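Bryan leaves the sort-and-reinsert step to the reader. One way to complete it is sketched below. It assumes the MARC::Record methods already used in this thread (field, delete_field, insert_fields_ordered), plus append_fields and MARC::Field->new to build a small test record. The sub name dedupe_and_sort_856 and the sample URLs are hypothetical. Sorting by the %seen keys orders the survivors by their USMARC string, which means by indicators first and then by subfield content:

```perl
use strict;
use warnings;
use MARC::Record;
use MARC::Field;

# Hypothetical helper: drop duplicate 856s in place, then re-add the
# survivors sorted by their USMARC data. Returns the number kept.
sub dedupe_and_sort_856 {
    my ($record) = @_;
    my %seen;

    foreach my $f ( $record->field('856') ) {
        my $key = $f->as_usmarc();
        if ( exists $seen{$key} ) {
            # an identical field was already kept, so drop this copy
            $record->delete_field($f);
        }
        else {
            $seen{$key} = $f;
        }
    }

    # Values of %seen, sorted according to the keys of %seen.
    my @new856 = map { $seen{$_} } sort keys %seen;

    # Delete the survivors and add them back in sorted order.
    $record->delete_field($_) for @new856;
    $record->insert_fields_ordered(@new856);
    return scalar @new856;
}

# Demo: three distinct 856s, one of them repeated.
my $record = MARC::Record->new();
$record->append_fields(
    MARC::Field->new( '245', '0', '0', a => 'Example title' ),
    MARC::Field->new( '856', '4', '0', u => 'http://example.com/c' ),
    MARC::Field->new( '856', '4', '0', u => 'http://example.com/a' ),
    MARC::Field->new( '856', '4', '0', u => 'http://example.com/c' ),
    MARC::Field->new( '856', '4', '0', u => 'http://example.com/b' ),
    MARC::Field->new( '856', '4', '0', u => 'http://example.com/c' ),
);

my $kept = dedupe_and_sort_856($record);
print "kept $kept 856 fields\n";
print $_->subfield('u'), "\n" for $record->field('856');
```

insert_fields_ordered() positions each field by its tag, so fields sharing a tag end up in the order they were passed in; that is why the list is sorted before it is re-inserted.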

I hope this helps,

Bryan Baldus
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://home.inwave.com/eija
 



Re: Removing duplicate fields with MARC::Record

2007-07-30 Thread Mike Rylander
MUST ... RESIST ... URGE ... FOR ... ONEUPMANSHIP!!!

AAAH!!!  ;)

...

while (my $record = $batch->next) {
  # find 'em
  my @m856 = $record->field('856');

  # get rid of 'em
  $record->delete_field( $_ ) for @m856;

  # map to a hash for direct uniqueness
  my %u856 = (map { ($_->as_usmarc() => $_) } @m856);

  # then add 'em back (values() returns them in no particular order)
  $record->insert_fields_ordered( values( %u856 ) );

  # and write the record out, as in the original script
  print DBOUT $record->as_usmarc();
}



# sorry for the top-posting... and it's untested :)
__END__

--miker
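Mike's loop can also be dropped into a complete version of Michael's script. The sketch below is one way to do it, assuming MARC::Batch's new(), strict_off() and next() as used in the original post; the sub names dedupe_856 and dedupe_file and the script name are hypothetical. It checks the command-line arguments before creating the batch (the original tested $infile only afterwards), opens the output with an error check, and sorts the hash keys, because values() returns the unique fields in no particular order:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MARC::Batch;

# Hypothetical helper: delete every 856, then add back one copy of each
# distinct field, sorted by its USMARC data. Returns the number kept.
sub dedupe_856 {
    my ($record) = @_;
    my @m856 = $record->field('856');
    return 0 unless @m856;

    $record->delete_field($_) for @m856;

    # identical fields share an as_usmarc() string, so they collapse here
    my %u856 = map { ( $_->as_usmarc() => $_ ) } @m856;
    $record->insert_fields_ordered( map { $u856{$_} } sort keys %u856 );
    return scalar keys %u856;
}

# Hypothetical helper: read every record from $infile, dedupe its 856s,
# and write it to $outfile.
sub dedupe_file {
    my ( $infile, $outfile ) = @_;
    my $batch = MARC::Batch->new( 'USMARC', $infile );
    $batch->strict_off();

    open( my $out, '>', $outfile ) or die "Cannot open $outfile: $!\n";
    while ( my $record = $batch->next() ) {
        dedupe_856($record);
        print {$out} $record->as_usmarc();
    }
    close($out) or die "Cannot close $outfile: $!\n";
    return;
}

# Run as: perl dedupe856.pl input.mrc output.mrc
if (@ARGV) {
    my ( $infile, $outfile ) = @ARGV;
    die "usage: $0 infile outfile\n" unless defined $outfile && -e $infile;
    dedupe_file( $infile, $outfile );
}
```

Splitting the work into a per-record sub keeps the deduplication testable on an in-memory record, separately from the file handling.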

