Re: MARC-perl: different versions yield different results

Leif Andersson Tue, 12 Oct 2010 09:05:48 -0700

Hi Ed,

Yes, I meant that the drawback is in modifying a CPAN module locally.
Actually, I don't know if there are any undesirable side effects.
None that I know of - I have myself used this technique for almost three years 
now.

The idea is that the MARC::Record object per se should be just binary.
The effort made in the leap from 1.38 to 2.0.0 to treat this blob as an 
(always well-formed!) utf8 string was a mistake in my eyes.

It has resulted in at least two common problems.
1. when writing records: the leader length / corrupted utf8 problem I responded 
to in my post.
2. when reading bad utf8 records: special care has to be taken so that your 
whole application does not just die at that record
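
To illustrate the first problem, here is a minimal sketch using only core Perl 
(no MARC modules). The field value and variable names are made up for the 
example. The record lengths have to be counted in bytes, but length() on a 
decoded string counts characters:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical field content: "Ström" as raw UTF-8 bytes (6 bytes).
my $raw = "Str\xC3\xB6m";

# Decoded to a Perl character string (5 characters).
my $text = Encode::decode('UTF-8', $raw);

my $char_len = length $text;                           # counts characters
my $byte_len = length Encode::encode('UTF-8', $text);  # counts encoded bytes

# Under "use bytes", length() reports the size of Perl's internal buffer,
# which for this decoded string is its UTF-8 encoding.
my $bytes_pragma_len = do { use bytes; length $text };

print "characters: $char_len, bytes: $byte_len, use bytes: $bytes_pragma_len\n";
```

A length computed in characters would come out one short for this value, and 
the gap grows with every non-ASCII character in the record.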

Almost all postings to this forum since 2.0.0 have been concerned with one of 
these problems.
(exaggerating a little, but not much)

Putting in "use bytes" is a shortcut. The alternative is rewriting a whole bunch 
of code, which would probably be aesthetically more pleasing.
But it is obviously much more work...

And by the way, the second problem can be dealt with by changing this sub to:
sub MARC::File::Encode::marc_to_utf8 {
    # do NOT check if UTF-8 is valid!
    return Encode::decode( 'UTF-8', $_[0], 0 );
}

Yes, that is also a hack!
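
A minimal sketch of what that third argument changes, using only the core 
Encode module. The sample bytes are invented for the example: a lone 0xE9 
(Latin-1 "é") is not valid UTF-8. With CHECK set to 0, Encode substitutes 
U+FFFD for the malformed byte and carries on; with Encode::FB_CROAK it dies, 
which is what an application would otherwise have to trap:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical bad input: 0xE9 on its own is not a valid UTF-8 sequence.
my $bad = "abc\xE9def";

# CHECK = 0: malformed bytes are replaced with U+FFFD, no exception.
my $lenient = Encode::decode('UTF-8', $bad, 0);
my $has_replacement = ($lenient =~ /\x{FFFD}/) ? 1 : 0;

# CHECK = FB_CROAK: decoding dies on the malformed byte.
# Decode a copy, because a non-zero CHECK may modify its source argument.
my $copy = $bad;
my $strict_ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;

print "lenient decode has U+FFFD: $has_replacement\n";
print "strict decode succeeded:   $strict_ok\n";
```

Note that the replaced byte is lost, so this trades a crash for silent data 
loss at that position.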

To sum up.
I think it is a good idea to make the MARC blob a binary object, so to speak.
I don't know if you should just apply my simple hacks to the CPAN code,
or if a thorough rewrite of some parts of the modules is called for.

Those changes may involve some changes in coding styles in the scripts that use 
MARC::Record.
But probably all you have to do is to remove all that strange code you put in 
there as workarounds to the character "bugs".

And yes, I have been using MARC::Charset in combination with this technique, 
without any problems that I can recall. :-)

/Leif

________________________________________
From: ed.summ...@gmail.com [ed.summ...@gmail.com] on behalf of Ed Summers 
[...@pobox.com]
Sent: 12 October 2010 17:13
To: perl4lib@perl.org
Subject: Re: MARC-perl: different versions yield different results

Hi Leif,

Is the downside to this approach that you are modifying a CPAN module
in place, or is it something to do with the behavior of 'use bytes'?
Would there be any undesirable side effects to adding 'use bytes' to
MARC::File::USMARC::encode on CPAN?

//Ed

On Tue, Oct 12, 2010 at 7:58 AM, Leif Andersson
<leif.anders...@sub.su.se> wrote:
> Myself I have changed one of the modules.
>
> MARC::File::USMARC
> It has a function called encode() around line 315
> I have added a "use bytes;" just before the final return. Like this:
>
> use bytes;
> return join("",$marc->leader, @$directory, END_OF_FIELD, @$fields, 
> END_OF_RECORD);
>
> To change directly in code like this is totally "no-no" to many programmers.
> If you feel uncomfortable with this, there are other methods doing the same 
> stuff.
> You could write a package:
>
> package MARC_Record_hack;
> use MARC::File::USMARC;
> no warnings 'redefine';
> sub MARC::File::USMARC::encode {
>    my $marc = shift;
>    $marc = shift if (ref($marc)||$marc) =~ /^MARC::File/;
>    my ($fields,$directory,$reclen,$baseaddress) = 
> MARC::File::USMARC::_build_tag_directory($marc);
>    $marc->set_leader_lengths( $reclen, $baseaddress );
>    # Glomp it all together
>    use bytes;
>    return join("",$marc->leader, @$directory, "\x1E", @$fields, "\x1D");
> }
> use warnings;
> 1;
> __END__
